Merging Django querysets – redux
On my previous post with the same title, frits commented asking wouldn’t it be possible to use filter(tags__in=l) instead of chaining filters.
I’d initially thought that chaining filters would generate a more efficient SQL query. This morning, I decided to test it.
Post.objects.filter(tags__text='python').filter(tags__text='django')
Generates the following SQL query.
SELECT "main_post"."id", "main_post"."link", "main_post"."title"
FROM "main_post"
INNER JOIN "main_post_tags" ON ("main_post"."id" = "main_post_tags"."post_id")
INNER JOIN "main_tag" ON ("main_post_tags"."tag_id" = "main_tag"."id")
INNER JOIN "main_post_tags" T4 ON ("main_post"."id" = T4."post_id")
INNER JOIN "main_tag" T5 ON (T4."tag_id" = T5."id")
WHERE ("main_tag"."text" = 'python' AND T5."text" = 'django' )
Notice that it generates a total of 4 joins, and I don’t really know why its joining on the M2M join table main_post_tags twice. Joining twice on main_tag is ok, but the 2nd join on main_post_tags could’ve been avoided.
Now, to try frits’ idea.
t1 = Tag.objects.get(text='python') t2 = Tag.objects.get(text='django') Post.objects.filter(tags__in=(t1,t2))
This generates a total of 3 queries, 2 to select the tags and one to select the posts. Now, if we ignore the cost of the first two queries (which are cheap, and the results can and should be cached in memory anyways), the final query has become way simpler and cheaper.
SELECT "main_post"."id", "main_post"."link", "main_post"."title"
FROM "main_post"
INNER JOIN "main_post_tags" ON ("main_post"."id" = "main_post_tags"."post_id")
WHERE "main_post_tags"."tag_id" IN (1, 2)
The number of joins is down from 4 to 1, this is way more efficient in every way.
Thanks for the tip, frits.
