Suppose you have a complex Django QuerySet query that is somewhat costly (in other words slow). And suppose you want to return:

  1. The first N results
  2. A count of the total possible results

So your implementation might be something like this:


def get_results(queryset, fields, size):
    count = queryset.count()
    results = []
    for record in queryset.values(*fields)[:size]
        results.append(record)
    return {"count": count, "results": results}

That'll work. If there are 1,234 rows in your database table that match those specific filters, what you might get back from this is:


>>> results = get_results(my_queryset, ("name", "age"), 5)
>>> results["count"]
1234
>>> len(results["results"])
5

Or, if the filters would only match 3 rows in your database table:


>>> results = get_results(my_queryset, ("name", "age"), 5)
>>> results["count"]
3
>>> len(results["results"])
3

Between your Python application and your database you'll see:

query 1: SELECT COUNT(*) FROM my_database WHERE ...
query 2: SELECT name, age FROM my_database WHERE ... LIMIT 5

The problem with this is that, in the latter case, you had to send two database queries when all you needed was one.
If you knew it would only match a tiny amount of records, you could do this:


def get_results(queryset, fields, size):
-   count = queryset.count()
    results = []
    for record in queryset.values(*fields)[:size]:
        results.append(record)
+   count = len(results)
    return {"count": count, "results": results}

But that is wrong. The count would max out at whatever the size is.

The solution is to try to avoid the potentially unnecessary .count() query.


def get_results(queryset, fields, size):
    count = 0
    results = []
    for i, record in enumerate(queryset.values(*fields)[: size + 1]):
        if i == size:
            # Alas, there are more records than the pagination
            count = queryset.count()
            break
        count = i + 1
        results.append(record)
    return {"count": count, "results": results}

This way, you only incur one database query when there wasn't that much to find, but if there was more than what the pagination called for, you have to incur that extra database query.

Comments

Brandon Rhodes

I think the line "found = i + 1" was perhaps intended to read "count = i + 1"?

A question about style: is it common to run explicit "for" loops against Django queries? I might have expected an approach that looks at the whole result instead of building it gradually:

def get_results(queryset, fields, size):
    results = list(queryset.values(*fields)[: size + 1])
    count = len(results)
    if count > size:
        results.pop()
        count = queryset.count()
    return {"count": count, "results": results}

But that's because I personally find it easier to reason about a whole list at the same time, rather than about a loop that gradually accumulates results. Oh — but I suppose my approach founders on the fact that it creates a result object that it doesn't need, which is a waste of effort?

Your email will never ever be published.

Previous:
How to restore all unstaged files in with git February 8, 2024 Linux, GitHub, macOS
Next:
Notes on porting a Next.js v14 app from Pages to App Router March 2, 2024 React, JavaScript
Related by category:
How I run standalone Python in 2025 January 14, 2025 Python
get in JavaScript is the same as property in Python February 13, 2025 Python
How to resolve a git conflict in poetry.lock February 7, 2020 Python
Best practice with retries with requests April 19, 2017 Python
Related by keyword:
Django ORM optimization story on selecting the least possible February 22, 2019 Python, Web development, Django, PostgreSQL
Show size of every PostgreSQL database you have February 7, 2018 PostgreSQL
Weight of your PostgreSQL tables "lumped together" October 31, 2015 PostgreSQL
Fastest "boolean SQL queries" possible with Django January 14, 2011 Django