Last week I released django-cache-memoize, a library for Django developers to more conveniently use caching in function calls. This is a quick blog post to demonstrate that with an example.
The verbose traditional way to do it
Suppose you have a view function that takes in a request and returns an HttpResponse. Within, it does some expensive calculation that you know could be cached. Something like this:
No caching
def blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)
    related_posts = BlogPost.objects.exclude(
        id=post.id
    ).filter(
        # BlogPost.keywords is an ArrayField
        keywords__overlap=post.keywords
    ).order_by('-publish_date')
    context = {
        'post': post,
        'related_posts': related_posts,
    }
    return render(request, 'blogpost.html', context)
So far so good. Perhaps you know that lookup of related posts is slowish and can be cached for at least one hour. So you add this:
Caching
from django.core.cache import cache

def blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)
    cache_key = 'related_posts:{}'.format(post.id)
    related_posts = cache.get(cache_key)
    if related_posts is None:  # was not cached
        related_posts = BlogPost.objects.exclude(
            id=post.id
        ).filter(
            # BlogPost.keywords is an ArrayField
            keywords__overlap=post.keywords
        ).order_by('-publish_date')
        cache.set(cache_key, related_posts, 60 * 60)
    context = {
        'post': post,
        'related_posts': related_posts,
    }
    return render(request, 'blogpost.html', context)
Great progress. But now you want that cache to reset immediately as soon as a blog post changes.
@login_required
def update_blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)
    if request.method == 'POST':
        # BlogPostForm is a forms.ModelForm class for BlogPost
        form = BlogPostForm(request.POST, instance=post)
        if form.is_valid():
            form.save()
            cache_key = 'related_posts:{}'.format(post.id)
            cache.delete(cache_key)
            return redirect(reverse('blog_post', args=(post.slug,)))
    else:
        form = BlogPostForm(instance=post)
    context = {
        'post': post,
        'form': form,
    }
    return render(request, 'edit_blogpost.html', context)
Awesome. Now the cache is cleared as soon as the BlogPost is updated.
Problem: you have repeated the code that generates the cache key in two places.
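You could of course reduce that duplication with a tiny helper that builds the key. A hypothetical sketch (the function name is made up, it's not part of any library):

def related_posts_cache_key(post_id):
    # The one place that knows how the key is built
    return 'related_posts:{}'.format(post_id)

But you would still have to remember to call it in both views, and you would still be writing the cache.get/cache.set/cache.delete dance by hand every time.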
Use django-cache-memoize
First extract out the getting of related posts into its own function and then decorate it.
from cache_memoize import cache_memoize

@cache_memoize(60 * 60)
def get_related_posts(id):
    # Look the post up by id so the cache key only depends on the id
    post = BlogPost.objects.get(id=id)
    return BlogPost.objects.exclude(
        id=post.id
    ).filter(
        # BlogPost.keywords is an ArrayField
        keywords__overlap=post.keywords
    ).order_by('-publish_date')
def blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)
    related_posts = get_related_posts(post.id)
    context = {
        'post': post,
        'related_posts': related_posts,
    }
    return render(request, 'blogpost.html', context)
Now, to do the cache invalidation you reach for that same function get_related_posts one more time, this time via its invalidate method:
def update_blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)
    if request.method == 'POST':
        # BlogPostForm is a forms.ModelForm class for BlogPost
        form = BlogPostForm(request.POST, instance=post)
        if form.is_valid():
            form.save()
            # NOTE!
            get_related_posts.invalidate(post.id)
            return redirect(reverse('blog_post', args=(post.slug,)))
    else:
        form = BlogPostForm(instance=post)
    context = {
        'post': post,
        'form': form,
    }
    return render(request, 'edit_blogpost.html', context)
Now you're not repeating the code that constructs the cache key.
Getting fancy; hot cache
The above pattern, with or without django-cache-memoize, clears the cache when the blog post changes; then you basically wait until the next time the blog post is rendered, at which point the cache gets populated again.
A more "aggressive" pattern is to "heat the cache up" right after we've cleared it. A simple change is to call get_related_posts()
again and let it cache. But to make sure it gets a fresh set of results we pass in the extra _refresh=True
argument.
def update_blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)
    if request.method == 'POST':
        # BlogPostForm is a forms.ModelForm class for BlogPost
        form = BlogPostForm(request.POST, instance=post)
        if form.is_valid():
            form.save()
            # NOTE!
            # Refresh the cache here and now
            get_related_posts(post.id, _refresh=True)
            return redirect(reverse('blog_post', args=(post.slug,)))
    else:
        form = BlogPostForm(instance=post)
    context = {
        'post': post,
        'form': form,
    }
    return render(request, 'edit_blogpost.html', context)
What was the point of that?
The above example doesn't do a great job of demonstrating how convenient it can be to use django-cache-memoize compared to "doing it manually". If your code base is peppered with lots of little blocks where you construct a cache key, check the cache, fall back on re-generating the value and write it back to the cache, then it can really add up to take all of that mess away and just use a decorator on anything that can be memoized.
Probably the biggest benefit of moving the cacheable functionality into its own function and decorating it is that all the hassle code for creating safe and unique cache keys is in one place. You won't be violating the Don't Repeat Yourself principle. This becomes especially important once the cache keys that need to be constructed get complex and need care.
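To make that concrete, here is roughly the idea such a decorator wraps up. This is a deliberately simplified sketch of my own, not django-cache-memoize's actual implementation (which also handles _refresh, .invalidate(), prefixes, functions that return None, and more):

import hashlib
from functools import wraps

from django.core.cache import cache


def simple_cache_memoize(timeout):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # All the "safe and unique cache key" hassle lives here,
            # in exactly one place.
            raw = '{}:{}:{}'.format(
                func.__name__, args, sorted(kwargs.items())
            )
            cache_key = 'memoize:' + hashlib.md5(
                raw.encode('utf-8')
            ).hexdigest()
            result = cache.get(cache_key)
            if result is None:  # naive: assumes func never returns None
                result = func(*args, **kwargs)
                cache.set(cache_key, result, timeout)
            return result
        return wrapper
    return decorator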
Ultimately, if you're able, your code will be free of scattered cache.set and cache.get calls, and yet a bunch of cacheable stuff gets cached nicely.
Why not use a regular @memoize or @functools.lru_cache?
The major difference between something like https://pypi.python.org/pypi/memoize/ and django-cache-memoize is that django-cache-memoize uses django.core.cache.cache, which is a global store (most likely backed by Redis or Memcached). If you use one of the other memoization solutions, the cache is in-memory. Meaning, if your production code runs Gunicorn or uWSGI with, say, 8 workers, then you'll have 8 copies of the same cache store. So if you're trying to protect an expensive function with @functools.lru_cache it will, worst case, be a cache miss 8 times on 8 different requests.
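To make the contrast concrete, here is a sketch. The function names are made up for illustration and the bodies are elided:

from functools import lru_cache

from cache_memoize import cache_memoize


@lru_cache(maxsize=128)
def get_related_posts_local(id):
    # Cached per process: with 8 Gunicorn/uWSGI workers you get
    # 8 independent caches, so up to 8 misses for the same id.
    ...


@cache_memoize(60 * 60)
def get_related_posts_shared(id):
    # Cached in django.core.cache (e.g. Redis or Memcached),
    # shared by every worker and every server, and it can be
    # invalidated from anywhere.
    ...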