Django sticky queryset filters

Sat 10 May 2014

In Django, Stuff.objects.filter(a=1).filter(b=1) is almost always the same as Stuff.objects.filter(a=1, b=1).

Everyone knows and expects this, and it's very well documented.

However, Stuff.objects.filter(rel__a=1).filter(rel__b=1) might not be the same as Stuff.objects.filter(rel__a=1, rel__b=1).

This is also very well documented, but in my option this behavior is not always intuitive.

Lets take an example:

class Tag(models.Model):
    name = models.CharField(max_length=100)

class Entry(models.Model):
    tags = models.ManyToManyField(Tag)

Now if we run Entry.objects.filter(tags__name='stuff') we'd get roughly something like:

SELECT `app_entry`.`id`
FROM `app_entry`
INNER JOIN `app_entry_tags` ON (`app_entry`.`id` = `app_entry_tags`.`entry_id`)
INNER JOIN `app_tag` ON (`app_entry_tags`.`tag_id` = `app_tag`.`id`)
WHERE `app_tag`.`name` = 'stuff'

If we run Entry.objects.filter(tags__name='stuff').filter(tags__name='other') we'd get roughly something like:

SELECT `app_entry`.`id`
FROM `app_entry`
INNER JOIN `app_entry_tags` ON (`app_entry`.`id` = `app_entry_tags`.`entry_id`)
INNER JOIN `app_tag` ON (`app_entry_tags`.`tag_id` = `app_tag`.`id`)
INNER JOIN `app_entry_tags` T4 ON (`app_entry`.`id` = T4.`entry_id`)
INNER JOIN `app_tag` T5 ON (T4.`tag_id` = T5.`id`)
WHERE (`app_tag`.`name` = 'stuff'
       AND T5.`name` = 'other')

Two JOIN is exactly what we wanted - a WHERE with a single JOIN wouldn't make sense anyway.

A different example*

Suppose we want to model books that have multiple authors:

class Author(models.Model):
    nationality = models.CharField(max_length=100)
    sex = models.CharField(max_length=1)
    birth = models.DateField(max_length=100)
    alive = models.BooleanField(default=True)

class Book(models.Model):
    authors = models.ManyToManyField(Author)

What if we want to get all the books that have a female French Author ?

If we run Book.objects.filter(authors__nationality='french', authors__sex='f') then we'd get roughly this query:

SELECT `app_book`.`id`
FROM `app_book`
INNER JOIN `app_book_authors` ON (`app_book`.`id` = `app_book_authors`.`book_id`)
INNER JOIN `app_author` ON (`app_book_authors`.`author_id` = `app_author`.`id`)
WHERE (`app_author`.`nationality` = 'french'
       AND `app_author`.`sex` = 'f')

This is fine. But if we run Book.objects.filter(authors__nationality='french').filter(authors__sex='f') then we get this horrible query:

SELECT `app_book`.`id`
FROM `app_book`
INNER JOIN `app_book_authors` ON (`app_book`.`id` = `app_book_authors`.`book_id`)
INNER JOIN `app_author` ON (`app_book_authors`.`author_id` = `app_author`.`id`)
INNER JOIN `app_book_authors` T4 ON (`app_book`.`id` = T4.`book_id`)
INNER JOIN `app_author` T5 ON (T4.`author_id` = T5.`id`)
WHERE (`app_author`.`nationality` = 'french'
       AND T5.`sex` = 'f')

Not to mention it's wrong it can get quite slow when you have large datasets. People have complained about this but it's not something changeable - there are large amounts of code relying on the current behavior and in simple scenarios it's fine (as it was outlined above).

However, there are some scenarios where you cannot afford using a single filter. Imagine you're using django-filter. Quite hard to collate everything in the same filter call.

Turns out there's a undocumented QuerySet method (_next_is_sticky) that can help us. It has been there since Django 1.0 and it turns out few people are actually using it.

The method mutates the QuerySet to have a sticky flag so that the next two filters combine. You need to call it before every pair of filters you want to be merged in (the sticky flag doesn't persist). E.g.:

Book.objects.all()._next_is_sticky().filter(
    authors__nationality='french'
).filter(
    authors__sex='f'
)

Note the all() call - the Manager (Book.objects) doesn't have the _next_is_sticky method - only the QuerySet does.

Also, note that it works for pairs. Example, this:

Book.objects.all()._next_is_sticky().filter(
    authors__nationality='french'
).filter(
    authors__birth__year=date.today().year - 50
).filter(
    alive=True
).filter(
    sex='f'
)

would generate:

SELECT `app_book`.`id`
FROM `app_book`
INNER JOIN `app_book_authors` ON (`app_book`.`id` = `app_book_authors`.`book_id`)
INNER JOIN `app_author` ON (`app_book_authors`.`author_id` = `app_author`.`id`)
INNER JOIN `app_book_authors` T4 ON (`app_book`.`id` = T4.`book_id`)
INNER JOIN `app_author` T5 ON (T4.`author_id` = T5.`id`)
INNER JOIN `app_book_authors` T6 ON (`app_book`.`id` = T6.`book_id`)
INNER JOIN `app_author` T7 ON (T6.`author_id` = T7.`id`)
WHERE (`app_author`.`nationality` = 'french'
       AND `app_author`.`birth` BETWEEN '1964-01-01' and '1964-12-31'
       AND T5.`alive` = 1
       AND T7.`sex` = 'f')

Note that neither this is correct (it would leave us with two joins):

Book.objects.all()._next_is_sticky().filter(
    authors__nationality='french'
).filter(
    authors__birth__year=date.today().year - 50
)._next_is_sticky().filter(
    authors__alive=True
).filter(
    authors__sex='f'
)

The best practice is to have _next_is_sticky before every filter, like this:

Book.objects.all()._next_is_sticky().filter(
    authors__nationality='french'
)._next_is_sticky().filter(
    authors__birth__year=date.today().year - 50
)._next_is_sticky().filter(
    authors__alive=True
)._next_is_sticky().filter(
    authors__sex='f'
)

When you need the normal behavior you can clear the flag with .all() (e.g.: after you finish up the chain).

This entry was tagged as django python

blog comments powered by Disqus