Django ORM optimization story on selecting the least possible
February 22, 2019
16 comments Web development, Django, Python, PostgreSQL
This an optimization story that should not surprise anyone using the Django ORM. But I thought I'd share because I have numbers now! The origin of this came from a real requirement. For a given parent model, I'd like to extract the value of the name
column of all its child models, and the turn all these name
strings into 1 MD5 checksum string.
Variants
The first attempted looked like this:
artist = Artist.objects.get(name="Bad Religion")
names = []
for song in Song.objects.filter(artist=artist):
names.append(song.name)
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()
The SQL used to generate this is as follows:
SELECT "main_song"."id", "main_song"."artist_id", "main_song"."name",
"main_song"."text", "main_song"."language", "main_song"."key_phrases",
"main_song"."popularity", "main_song"."text_length", "main_song"."metadata",
"main_song"."created", "main_song"."modified",
"main_song"."has_lastfm_listeners", "main_song"."has_spotify_popularity"
FROM "main_song" WHERE "main_song"."artist_id" = 22729;
Clearly, I don't need anything but just the name
column, version 2:
artist = Artist.objects.get(name="Bad Religion")
names = []
for song in Song.objects.filter(artist=artist).only("name"):
names.append(song.name)
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()
Now, the SQL used is:
SELECT "main_song"."id", "main_song"."name"
FROM "main_song" WHERE "main_song"."artist_id" = 22729;
But still, since I don't really need instances of model class Song
I can use the .values()
method which gives back a list of dictionaries. This is version 3:
names = []
for song in Song.objects.filter(artist=a).values("name"):
names.append(song["name"])
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()
This time Django figures it doesn't even need the primary key value so it looks like this:
SELECT "main_song"."name" FROM "main_song" WHERE "main_song"."artist_id" = 22729;
Last but not least; there is an even faster one. values_list()
. This time it doesn't even bother to map the column name to the value in a dictionary. And since I only need 1 column's value, I can set flat=True
. Version 4 looks like this:
names = []
for name in Song.objects.filter(artist=a).values_list("name", flat=True):
names.append(name)
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()
Same SQL gets used this time as in version 3.
The benchmark
Hopefully this little benchmark script speaks for itself:
from songsearch.main.models import *
import hashlib
def f1(a):
names = []
for song in Song.objects.filter(artist=a):
names.append(song.name)
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()
def f2(a):
names = []
for song in Song.objects.filter(artist=a).only("name"):
names.append(song.name)
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()
def f3(a):
names = []
for song in Song.objects.filter(artist=a).values("name"):
names.append(song["name"])
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()
def f4(a):
names = []
for name in Song.objects.filter(artist=a).values_list("name", flat=True):
names.append(name)
return hashlib.md5("".join(names).encode("utf-8")).hexdigest()
artist = Artist.objects.get(name="Bad Religion")
print(Song.objects.filter(artist=artist).count())
print(f1(artist) == f2(artist))
print(f2(artist) == f3(artist))
print(f3(artist) == f4(artist))
# Reporting
import time
import random
import statistics
functions = f1, f2, f3, f4
times = {f.__name__: [] for f in functions}
for i in range(500):
func = random.choice(functions)
t0 = time.time()
func(artist)
t1 = time.time()
times[func.__name__].append((t1 - t0) * 1000)
for name in sorted(times):
numbers = times[name]
print("FUNCTION:", name, "Used", len(numbers), "times")
print("\tBEST", min(numbers))
print("\tMEDIAN", statistics.median(numbers))
print("\tMEAN ", statistics.mean(numbers))
print("\tSTDEV ", statistics.stdev(numbers))
I ran this on my PostgreSQL 11.1 on my MacBook Pro with Django 2.1.7. So the database is on localhost
.
The results
276 True True True FUNCTION: f1 Used 135 times BEST 6.309986114501953 MEDIAN 7.531881332397461 MEAN 7.834429211086697 STDEV 2.03779968066591 FUNCTION: f2 Used 135 times BEST 3.039121627807617 MEDIAN 3.7298202514648438 MEAN 4.012803678159361 STDEV 1.8498943539073027 FUNCTION: f3 Used 110 times BEST 0.9920597076416016 MEDIAN 1.4405250549316406 MEAN 1.5053835782137783 STDEV 0.3523240470133114 FUNCTION: f4 Used 120 times BEST 0.9369850158691406 MEDIAN 1.3251304626464844 MEAN 1.4017681280771892 STDEV 0.3391019435930447
Discussion
I guess the hashlib.md5("".join(names).encode("utf-8")).hexdigest()
stuff is a bit "off-topic" but I checked and it's roughly 300 times faster than building up the names
list.
It's clearly better to ask less of Python and PostgreSQL to get a better total time. No surprise there. What was interesting was the proportion of these differences. Memorize that and you'll be better equipped if it's worth the hassle of not using the Django ORM in the most basic form.
Also, do take note that this is only relevant in when dealing with many records. The slowest variant (f1
) takes, on average, 7 milliseconds.
Summarizing the difference with percentages compared to the fastest variant:
f1
- 573% slowerf2
- 225% slowerf3
- 6% slowerf4
- 0% slower
UPDATE Feb 25 2019
James suggested, although a bit "missing the point", that it could be even faster if all the aggregation is pushed into the PostgreSQL server and then the only thing that needs to transfer from PostgreSQL to Python is the final result.
By the way, name
column in this particular benchmark, when concatenated into one big string, is ~4KB. So, with variant f5
it only needs to transfer 32 bytes which will/would make a bigger difference if the network latency is higher.
Here's the whole script: https://gist.github.com/peterbe/b2b7ed95d422ab25a65639cb8412e75e
And the results:
276 True True True False False FUNCTION: f1 Used 92 times BEST 5.928993225097656 MEDIAN 7.311463356018066 MEAN 7.594626882801885 STDEV 2.2027017044658423 FUNCTION: f2 Used 75 times BEST 2.878904342651367 MEDIAN 3.3979415893554688 MEAN 3.4774907430013022 STDEV 0.5120246550765524 FUNCTION: f3 Used 88 times BEST 0.9310245513916016 MEDIAN 1.1944770812988281 MEAN 1.3105544176968662 STDEV 0.35922655625999383 FUNCTION: f4 Used 71 times BEST 0.7879734039306641 MEDIAN 1.1661052703857422 MEAN 1.2262606284987758 STDEV 0.3561764250427344 FUNCTION: f5 Used 90 times BEST 0.7929801940917969 MEDIAN 1.0334253311157227 MEAN 1.1836051940917969 STDEV 0.4001442703048186 FUNCTION: f6 Used 84 times BEST 0.80108642578125 MEDIAN 1.1119842529296875 MEAN 1.2281338373819988 STDEV 0.37146893005516973
Result: f5
is takes 0.793ms and (the previous "winner") f4
takes 0.788ms.
I'm not entirely sure why f5
isn't faster but I suspect it's because the dataset is too small for it all to matter.
Compare:
songsearch=# explain analyze SELECT "main_song"."name" FROM "main_song" WHERE "main_song"."artist_id" = 22729; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Index Scan using main_song_ca949605 on main_song (cost=0.43..229.33 rows=56 width=16) (actual time=0.014..0.208 rows=276 loops=1) Index Cond: (artist_id = 22729) Planning Time: 0.113 ms Execution Time: 0.242 ms (4 rows)
with...
songsearch=# explain analyze SELECT md5(STRING_AGG("main_song"."name", '')) AS "names_hash" FROM "main_song" WHERE "main_song"."artist_id" = 22729; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=229.47..229.48 rows=1 width=32) (actual time=0.278..0.278 rows=1 loops=1) -> Index Scan using main_song_ca949605 on main_song (cost=0.43..229.33 rows=56 width=16) (actual time=0.019..0.204 rows=276 loops=1) Index Cond: (artist_id = 22729) Planning Time: 0.115 ms Execution Time: 0.315 ms (5 rows)
I ran these two SQL statements about 100 times each and recorded their best possible execution times:
1) The plain SELECT
- 0.99ms
2) The STRING_AGG
- 1.06ms
So that accounts from ~0.1ms difference only! Which kinda matches the results seen above. All in all, I think the dataset is too small to demonstrate this technique. But, considering the chance that the complexity might not be linear with the performance benefit, it's still interesting.
Even though this tangent is a big off-topic, it is often a great idea to push as much work into the database as you can if applicable. Especially if it means you can transfer a lot less data eventually.
variable_cache_control - Django view decorator to set max_age in runtime
January 22, 2019
0 comments Django, Python
tl;dr; If you use the django.views.decorators.cache.cache_control
decorator, consider this one instead to change the max_age
depending on the request.
I had/have a Django view function that looks something like this:
@cache_control(public=True, max_age=60 * 60)
def home(request, oc=None, page=1):
...
But, that number 60 * 60
I really needed it to be different depending on the request parameters. For example, that oc=None
, if that's not None
I know the page's Cache-Control
header can and should be different.
So I wrote this decorator:
from django.utils.cache import patch_cache_control
def variable_cache_control(**kwargs):
"""Same as django.views.decorators.cache.cache_control except this one will
allow the `max_age` parameter be a callable.
"""
def _cache_controller(viewfunc):
@functools.wraps(viewfunc)
def _cache_controlled(request, *args, **kw):
response = viewfunc(request, *args, **kw)
copied = kwargs
if kwargs.get("max_age") and callable(kwargs["max_age"]):
max_age = kwargs["max_age"](request, *args, **kw)
# Can't re-use, have to create a shallow clone.
copied = dict(kwargs, max_age=max_age)
patch_cache_control(response, **copied)
return response
return _cache_controlled
return _cache_controller
Now, I can do this instead:
def _best_max_age(req, oc=None, **kwargs):
max_age = 60 * 60
if oc:
max_age *= 10
return max_age
@variable_cache_control(public=True, max_age=_best_max_age)
def home(request, oc=None, page=1):
...
I hope it inspires.
django-pipeline and Zopfli
August 15, 2018
0 comments Python, Web development, Django
tl;dr; I wrote my own extension to django-pipeline
that uses Zopfli to create .gz
files from static assets collected in Django. Here's the code.
Nginx and Gzip
What I wanted was to continue to use django-pipeline
which does a great job of reading a settings.BUNDLES
setting and generating things like /static/js/myapp.min.a206ec6bd8c7.js
. It has configurable options to not just make those files but also generate /static/js/myapp.min.a206ec6bd8c7.js.gz
which means that with gzip_static
in Nginx, Nginx doesn't have to Gzip compress static files on-the-fly but can basically just read it from disk. Nginx doesn't care how the file got there but an immediate advantage of preparing the file on disk is that the compression can be higher (smaller .gz
files). That means smaller responses to be sent to the client and less CPU work needed from Nginx. Your job is to set gzip_static on;
in your Nginx config (per location
) and make sure every compressable file exists on disk with the same name but with the .gz
suffix.
In other words, when the client does GET https://example.com/static/foo.js
Nginx quickly does a read on the file system to see if there exists a ROOT/static/foo.js.gz
and if so, return that. If the files doesn't exist, and you have gzip on;
in your config, Nginx will read the ROOT/static/foo.js
into memory, compress it (usually with a lower compression level) and return that. Nginx takes care of figuring out whether to do this, at all, dynamically by reading the Accept-Encoding
header from the request.
Zopfli
The best solution today to generate these .gz
files is Zopfli. Zopfli is slower than good old regular gzip
but the files get smaller. To manually compress a file you can install the zopfli
executable (e.g. brew install zopfli
or apt install zopfli
) and then run zopfli $ROOT/static/foo.js
which creates a $ROOT/static/foo.js.gz
file.
So your task is to build some pipelining code that generates .gz
version of every static file your Django server creates.
At first I tried django-static-compress
which has an extension to regular Django staticfiles storage. The default staticfiles storage is django.contrib.staticfiles.storage.StaticFilesStorage
and that's what django-static-compress
extends.
But I wanted more. I wanted all the good bits from django-pipeline
(minification, hashes in filenames, concatenation, etc.) Also, in django-static-compress
you can't control the parameters to zopfli
such as the number of iterations. And with django-static-compress
you have to install Brotli
which I can't use because I don't want to compile my own Nginx.
Solution
So I wrote my own little mashup. I took some ideas from how django-pipeline
does regular gzip
compression as a post-process step. And in my case, I never want to bother with any of the other files that are put into the settings.STATIC_ROOT
directory from the collectstatic
command.
Here's my implementation: peterbecom.storage.ZopfliPipelineCachedStorage. Check it out. It's very tailored to my personal preferences and usecase but it works great. To use it, I have this in my settings.py
: STATICFILES_STORAGE = "peterbecom.storage.ZopfliPipelineCachedStorage"
I know what you're thinking
Why not try to get this into django-pipeline
or into django-compress-static
. The answer is frankly laziness. Hopefully someone else can pick up this task. I have fewer and fewer projects where I use Django to handle static files. These days most of my projects are single-page-apps that are 100% static and using Django for XHR requests to get the data.
Django lock decorator with django-redis
August 14, 2018
4 comments Python, Web development, Django, Redis
Here's the code. It's quick-n-dirty but it works wonderfully:
import functools
import hashlib
from django.core.cache import cache
from django.utils.encoding import force_bytes
def lock_decorator(key_maker=None):
"""
When you want to lock a function from more than 1 call at a time.
"""
def decorator(func):
@functools.wraps(func)
def inner(*args, **kwargs):
if key_maker:
key = key_maker(*args, **kwargs)
else:
key = str(args) + str(kwargs)
lock_key = hashlib.md5(force_bytes(key)).hexdigest()
with cache.lock(lock_key):
return func(*args, **kwargs)
return inner
return decorator
How To Use It
This has saved my bacon more than once. I use it on functions that really need to be made synchronous. For example, suppose you have a function like this:
def fetch_remote_thing(name):
try:
return Thing.objects.get(name=name).result
except Thing.DoesNotExist:
# Need to go out and fetch this
result = some_internet_fetching(name) # Assume this is sloooow
Thing.objects.create(name=name, result=result)
return result
That function is quite dangerous because if executed by two concurrent web requests for example, they will trigger
two "identical" calls to some_internet_fetching
and if the database didn't have the name
already, it will most likely trigger two calls to Thing.objects.create(name=name, ...)
which could lead to integrity errors or if it doesn't the whole function breaks down because it assumes that there is only 1 or 0 of these Thing
records.
Easy to solve, just add the lock_decorator
:
@lock_decorator()
def fetch_remote_thing(name):
try:
return Thing.objects.get(name=name).result
except Thing.DoesNotExist:
# Need to go out and fetch this
result = some_internet_fetching(name) # Assume this is sloooow
Thing.objects.create(name=name, result=result)
return result
Now, thanks to Redis distributed locks, the function is always allowed to finish before it starts another one. All the hairy locking (in particular, the waiting) is implemented deep down in Redis which is rock solid.
Bonus Usage
Another use that has also saved my bacon is functions that aren't necessarily called with the same input argument but each call is so resource intensive that you want to make sure it only does one of these at a time. Suppose you have a Django view function that does some resource intensive work and you want to stagger the calls so that it only runs it one at a time. Like this for example:
def api_stats_calculations(request, part):
if part == 'users-per-month':
data = _calculate_users_per_month() # expensive
elif part == 'pageviews-per-week':
data = _calculate_pageviews_per_week() # intensive
elif part == 'downloads-per-day':
data = _calculate_download_per_day() # slow
elif you == 'get' and the == 'idea':
...
return http.JsonResponse({'data': data})
If you just put @lock_decorator()
on this Django view function, and you have some (almost) concurrent calls to this function, for example from a uWSGI
server running with threads and multiple processes, then it will not synchronize the calls.
The solution to this is to write your own function for generating the lock key, like this for example:
@lock_decorator(
key_maker=lamnbda request, part: 'api_stats_calculations'
)
def api_stats_calculations(request, part):
if part == 'users-per-month':
data = _calculate_users_per_month() # expensive
elif part == 'pageviews-per-week':
data = _calculate_pageviews_per_week() # intensive
elif part == 'downloads-per-day':
data = _calculate_download_per_day() # slow
elif you == 'get' and the == 'idea':
...
return http.JsonResponse({'data': data})
Now it works.
How Time-Expensive Is It?
Perhaps you worry that 99% of your calls to the function don't have the problem of calling the function concurrently. How much is this overhead of this lock costing you? I wondered that too and set up a simple stress test where I wrote a really simple Django view function. It looked something like this:
@lock_decorator(key_maker=lambda request: 'samekey')
def sample_view_function(request):
return http.HttpResponse('Ok\n')
I started a Django server with uWSGI
with multiple processors and threads enabled. Then I bombarded this function with a simple concurrent stress test and observed the requests per minute. The cost was extremely tiny and almost negligable (compared to not using the lock decorator). Granted, in this test I used Redis on redis://localhost:6379/0
but generally the conclusion was that the call is extremely fast and not something to worry too much about. But your mileage may vary so do your own experiments for your context.
What's Needed
You need to use django-redis as your Django cache backend. I've blogged before about using django-redis
, for example Fastest cache backend possible for Django and Fastest Redis configuration for Django.
django-html-validator now supports Django 2.x
August 13, 2018
0 comments Python, Web development, Django
django-html-validator is a Django project that can validate your generated HTML. It does so by sending the HTML to https://html5.validator.nu/ or you can start your own Java server locally with vnu.jar
from here.
The output is that you can have validation errors printed to stdout
or you can have them put as .txt
files in a temporary directory. You can also include it in your test suite and make it so that tests fail if invalid HTML is generated during rendering in Django unit tests.
The project seems to have become a lot more popular than I thought it would. It started as a one-evening-hack and because there was interest I wrapped it up in a proper project with "docs" and set up CI for future contributions.
I kinda of forgot the project since almost all my current projects generate JSON on the server and generates the DOM on-the-fly with client-side JavaScript but apparently a lot of issues and PRs were filed related to making it work in Django 2.x. So I took the time last night to tidy up the tox.ini
etc. and the necessary compatibility fixes to make it work with but Django 1.8 up to Django 2.1. Pull request here.
Thank you all who contributed! I'll try to make a better job noticing filed issues in the future.
A good Django view function cache decorator for http.JsonResponse
June 20, 2018
0 comments Python, Web development, Django
I use this a lot. It has served me very well. The code:
import hashlib
import functools
import markus # optional
from django.core.cache import cache
from django import http
from django.utils.encoding import force_bytes, iri_to_uri
metrics = markus.get_metrics(__name__) # optional
def json_response_cache_page_decorator(seconds):
"""Cache only when there's a healthy http.JsonResponse response."""
def decorator(func):
@functools.wraps(func)
def inner(request, *args, **kwargs):
cache_key = 'json_response_cache:{}:{}'.format(
func.__name__,
hashlib.md5(force_bytes(iri_to_uri(
request.build_absolute_uri()
))).hexdigest()
)
content = cache.get(cache_key)
if content is not None:
# metrics is optional
metrics.incr(
'json_response_cache_hit',
tags=['view:{}'.format(func.__name__)]
)
return http.HttpResponse(
content,
content_type='application/json'
)
response = func(request, *args, **kwargs)
if (
isinstance(response, http.JsonResponse) and
response.status_code in (200, 304)
):
cache.set(cache_key, response.content, seconds)
return response
return inner
return decorator
To use it simply add to Django view functions that might return a http.JsonResponse
. For example, something like this:
@json_response_cache_page_decorator(60)
def search(request):
q = request.GET.get('q')
if not q:
return http.HttpResponseBadRequest('no q')
results = search_database(q)
return http.JsonResponse({
'results': results,
})
The reasons I use this instead of django.views.decorators.cache.cache_page()
is because of a couple of reasons.
cache_page
generates cache keys that don't contain the view function name.cache_page
tries to cache the wholehttp.HttpResponse
instance which can't be serialized if you use themsgpack
serializer.cache_page
also sendsCache-Control
headers which is not always what you want.- Not possible to inject your own custom code such as my usage of
metrics
.
Disclaimer: This snippet of code comes from a side-project that has a very specific set of requirements. They're rather unique to that project and I have a full picture of the needs. E.g. I know what specific headers matter and don't matter. Your project might be different. For example, perhaps you don't have markus
to handle your metrics. Or perhaps you need to re-write the query string for something to normalize the cache key differently. Point being, take the snippet of code as inspiration when you too find that django.views.decorators.cache.cache_page()
isn't good enough for your Django view functions.