Last week I released django-memoize-function which is a library for Django developers to more conveniently use caching in function calls. This is a quick blog post to demonstrate that with an example.

The verbose traditional way to do it

Suppose you have a view function that takes in a request and returns a HttpResponse. Within, it does some expensive calculation that you know could be cached. Something like this:

No caching


def blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)

    related_posts = BlogPost.objects.exclude(
        id=post.id
    ).filter(
        # BlogPost.keywords is an ArrayField
        keywords__overlap=post.keywords
    ).order_by('-publish_date')

    context = {
        'post': post,
        'related_posts': related_posts,
    }
    return render(request, 'blogpost.html', context)

So far so good. Perhaps you know that lookup of related posts is slowish and can be cached for at least one hour. So you add this:

Caching


from django.core.cache import cache

def blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)

    cache_key = b'related_posts:{}'.format(post.id)
    related_posts = cache.get(cache_key)
    if related_posts is None:  # was not cached
        related_posts = BlogPost.objects.exclude(
            id=post.id
        ).filter(
            # BlogPost.keywords is an ArrayField
            keywords__overlap=post.keywords
        ).order_by('-publish_date')
        cache.set(cache_key, related_posts, 60 * 60)

    context = {
        'post': post,
        'related_posts': related_posts,
    }
    return render(request, 'blogpost.html', context)

Great progress. But now you want that cache to immediate reset as soon as the blog posts change.


@login_required
def update_blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)
    if request.method == 'POST':
        # BlogPostForm is a forms.ModelForm class for BlogPost
        form = BlogPostForm(request.POST, instance=post)
        if form.is_valid():
            form.save()
            cache_key = b'related_posts:{}'.format(post.id)
            cache.delete(cache_key)
            return redirect(reverse('blog_post', args=(post.id,))
    else:
        form = BlogPostForm(instance=post)

    context = {
        'post': post,
        'form': form,
    }
    return render(request, 'edit_blogpost.html', context)

Awesome. Now the cache is cleared as soon as the BlogPost is updated.

Problem; you have repeated the code generating the cache key in two places.

Use django-cache-memoize

First extract out the getting of related posts into its own function and then decorate it.


from cache_memoize import cache_memoize

@cache_memoize(60 * 60)
def get_related_posts(id):
    return BlogPost.objects.exclude(
        id=post.id
    ).filter(
        # BlogPost.keywords is an ArrayField
        keywords__overlap=post.keywords
    ).order_by('-publish_date')


def blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)

    related_posts = get_related_posts(post.id)

    context = {
        'post': post,
        'related_posts': related_posts,
    }
    return render(request, 'blogpost.html', context) 

Now, to do the cache invalidation you need to call that function get_related_posts one more time:


def update_blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)
    if request.method == 'POST':
        # BlogPostForm is a forms.ModelForm class for BlogPost
        form = BlogPostForm(request.POST, instance=post)
        if form.is_valid():
            form.save()

            # NOTE!
            get_related_posts.invalidate(post.id)

            return redirect(reverse('blog_post', args=(post.id,))
    else:
        form = BlogPostForm(instance=post)

    context = {
        'post': post,
        'form': form,
    }
    return render(request, 'edit_blogpost.html', context)

Now you're not repeating the code that constructs the cache key.

Getting fancy; hot cache

The above pattern, with or without django-cache-memoize, clears the cache when the blog post changes and then you basically wait till the next time the blog post is rendered, then the cache will be populated again.

A more "aggressive" pattern is to "heat the cache up" right after we've cleared it. A simple change is to call get_related_posts() again and let it cache. But to make sure it gets a fresh set of results we pass in the extra _refresh=True argument.


def update_blog_post(request, slug):
    post = BlogPost.objects.get(slug=slug)
    if request.method == 'POST':
        # BlogPostForm is a forms.ModelForm class for BlogPost
        form = BlogPostForm(request.POST, instance=post)
        if form.is_valid():
            form.save()

            # NOTE!
            # Refresh the cache here and now
            get_related_posts(post.id, _refresh=True)

            return redirect(reverse('blog_post', args=(post.id,))
    else:
        form = BlogPostForm(instance=post)

    context = {
        'post': post,
        'form': form,
    }
    return render(request, 'edit_blogpost.html', context)

What was the point of that?

The above example doesn't do a great job demonstrating how convenient it can be to use django-cache-memoize compared to "doing it manually". If your code base is peppered with lots of little blocks where you construct a cache key, check the cache, fall back on re-generation and write to cache again; then it can really add up to take away all of that mess and just use a decorator on anything that can be memoized.

Probably the biggest benefit with moving the cacheable functionality into its own function and decorating it is that all that hassle code with creating safe and unique cache keys is all in one place. You won't be violating the Don't Repeat Yourself principle. This becomes especially important once the cache keys that need to be constructed are getting complex and needs care.

Ultimately if you're able, your code will be free of various cache.set and cache.get code and yet a bunch of cacheable stuff gets cached nicely.

Why not use a regular @memoize or @functools.lru_cache?

The major difference between something like https://pypi.python.org/pypi/memoize/ and django-cache-memoize is that django-cache-memoize uses django.core.cache.cache which is a global state store (most likely backed by Redis or Memcached). If you use one of the other memoization solutions they'll be in-memory. Meaning, if your production code runs Gunicorn or uWSGI with, say, 8 workers then you'll have 8 copies of the same cache store. So if you're trying to protect an expensive function with @functools.lru_cache it will, worst case, be a cache miss 8 times on 8 different requests.

Comments

Your email will never ever be published.

Related posts