Here's the code. It's quick-n-dirty but it works wonderfully:
import functools
import hashlib

from django.core.cache import cache
from django.utils.encoding import force_bytes


def lock_decorator(key_maker=None):
    """
    When you want to lock a function from more than 1 call at a time.
    """

    def decorator(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            if key_maker:
                key = key_maker(*args, **kwargs)
            else:
                key = str(args) + str(kwargs)
            lock_key = hashlib.md5(force_bytes(key)).hexdigest()
            with cache.lock(lock_key):
                return func(*args, **kwargs)

        return inner

    return decorator
How To Use It
This has saved my bacon more than once. I use it on functions that really need to be made synchronous. For example, suppose you have a function like this:
def fetch_remote_thing(name):
    try:
        return Thing.objects.get(name=name).result
    except Thing.DoesNotExist:
        # Need to go out and fetch this
        result = some_internet_fetching(name)  # Assume this is sloooow
        Thing.objects.create(name=name, result=result)
        return result
That function is quite dangerous. If it's executed by, say, two concurrent web requests, they will trigger two "identical" calls to some_internet_fetching, and if the database didn't have the name already, it will most likely trigger two calls to Thing.objects.create(name=name, ...). That can lead to integrity errors, or, if it doesn't, the rest of the code breaks down because it assumes there is only ever 1 or 0 of these Thing records.
Easy to solve: just add the lock_decorator:
@lock_decorator()
def fetch_remote_thing(name):
    try:
        return Thing.objects.get(name=name).result
    except Thing.DoesNotExist:
        # Need to go out and fetch this
        result = some_internet_fetching(name)  # Assume this is sloooow
        Thing.objects.create(name=name, result=result)
        return result
Now, thanks to Redis distributed locks, each call to the function is allowed to finish before the next one starts. All the hairy locking (in particular, the waiting) is implemented deep down in Redis, which is rock solid.
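Under the hood, django-redis's cache.lock is a thin wrapper around redis-py's Lock object, so you can pass its options through. For example, a lock left behind by a crashed worker would otherwise block everyone forever, so it can be worth giving the lock an expiry. A minimal sketch; the exact parameters depend on your django-redis and redis-py versions, so check before relying on them:

from django.core.cache import cache

# timeout: auto-release the lock after 60s even if the holder dies.
# blocking_timeout: give up waiting after 10s instead of forever.
# Both are redis-py Lock options assumed to be passed through here.
with cache.lock('my-lock-key', timeout=60, blocking_timeout=10):
    do_something_expensive()  # hypothetical function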
Bonus Usage
Another use that has also saved my bacon is functions that aren't necessarily called with the same input arguments, but where each call is so resource intensive that you only want one of them running at a time. Suppose you have a Django view function that does some resource-intensive work and you want to stagger the calls so that only one runs at a time. Like this, for example:
def api_stats_calculations(request, part):
    if part == 'users-per-month':
        data = _calculate_users_per_month()  # expensive
    elif part == 'pageviews-per-week':
        data = _calculate_pageviews_per_week()  # intensive
    elif part == 'downloads-per-day':
        data = _calculate_download_per_day()  # slow
    elif you == 'get' and the == 'idea':
        ...
    return http.JsonResponse({'data': data})
If you just put @lock_decorator() on this Django view function, and you have some (almost) concurrent calls to it, for example from a uWSGI server running with threads and multiple processes, it will not synchronize the calls. The reason is that the default key maker builds the lock key out of the call arguments, and since the request and part differ from call to call, every call gets its own lock key and nothing actually blocks. The sketch below illustrates this.
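Here is a hedged demonstration of what the default key recipe produces, using plain strings to stand in for the real WSGIRequest objects:

import hashlib

from django.utils.encoding import force_bytes

def default_lock_key(*args, **kwargs):
    # The same recipe inner() falls back to when no key_maker is given
    return hashlib.md5(force_bytes(str(args) + str(kwargs))).hexdigest()

# Plain strings stand in for the real request objects here
print(default_lock_key('<request 1>', 'users-per-month'))
print(default_lock_key('<request 2>', 'pageviews-per-week'))
# Two different digests, hence two different locks, hence no blocking.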
The solution to this is to write your own function for generating the lock key, like this for example:
@lock_decorator(
    key_maker=lambda request, part: 'api_stats_calculations'
)
def api_stats_calculations(request, part):
    if part == 'users-per-month':
        data = _calculate_users_per_month()  # expensive
    elif part == 'pageviews-per-week':
        data = _calculate_pageviews_per_week()  # intensive
    elif part == 'downloads-per-day':
        data = _calculate_download_per_day()  # slow
    elif you == 'get' and the == 'idea':
        ...
    return http.JsonResponse({'data': data})
Now it works.
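As a side note, if you instead wanted to serialize per part (so two 'users-per-month' calls can't overlap, but different parts can still run side by side), the key_maker can fold the argument into the key. This variant is my own sketch, not something from the code above:

@lock_decorator(
    key_maker=lambda request, part: 'api_stats_calculations:{}'.format(part)
)
def api_stats_calculations(request, part):
    ...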
How Time-Expensive Is It?
Perhaps you worry that 99% of your calls to the function never overlap with another call. How much is the overhead of this lock costing you? I wondered that too, so I set up a simple stress test where I wrote a really simple Django view function. It looked something like this:
@lock_decorator(key_maker=lambda request: 'samekey')
def sample_view_function(request):
    return http.HttpResponse('Ok\n')
I started a Django server with uWSGI with multiple processes and threads enabled. Then I bombarded this function with a simple concurrent stress test and observed the requests per minute. The cost was extremely tiny and almost negligible (compared to not using the lock decorator). Granted, in this test I used Redis on redis://localhost:6379/0, but generally the conclusion was that the call is extremely fast and not something to worry too much about. But your mileage may vary, so do your own experiments for your context.
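For reference, the stress test was along these lines. This is a hedged reconstruction, not the original script; it assumes the requests library is installed and that the view above is served at a hypothetical /sample URL:

import time
from concurrent.futures import ThreadPoolExecutor

import requests  # assumed to be installed; any HTTP client would do

URL = 'http://localhost:8000/sample'  # hypothetical URL for the view above

def hit(_):
    t0 = time.time()
    requests.get(URL)
    return time.time() - t0

# Fire 200 requests from 10 threads and report the average response time
with ThreadPoolExecutor(max_workers=10) as executor:
    timings = list(executor.map(hit, range(200)))

print('mean response time: {:.1f}ms'.format(
    1000 * sum(timings) / len(timings)
))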
What's Needed
You need to use django-redis as your Django cache backend. I've blogged before about using django-redis, for example in Fastest cache backend possible for Django and Fastest Redis configuration for Django.
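For completeness, the relevant settings look something like this. It's a standard django-redis configuration; adjust the LOCATION to point at your own Redis instance:

# settings.py
CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        'LOCATION': 'redis://localhost:6379/0',
        'OPTIONS': {
            'CLIENT_CLASS': 'django_redis.client.DefaultClient',
        },
    },
}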