tl;dr: Redis is twice as fast as memcached as a Django cache backend when run on AWS ElastiCache. Only tested for reads.
Django has a wonderful caching framework. I think I say "wonderful" because it's so simple. Not because it has a hundred different bells and whistles. Each cache gets a name (e.g. "mymemcache" or "redis append only"). The only configuration you generally have to worry about is 1) which backend and 2) what location.
For example, to set up a memcached backend:
# this goes in settings.py
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'KEY_PREFIX': 'myapp',
        'LOCATION': config('MEMCACHED_LOCATION', '127.0.0.1:11211'),
    },
}
With that in play you can now do:
>>> from django.core.cache import caches
>>> caches['default'].set('key', 'value', 60) # 60 seconds
>>> caches['default'].get('key')
'value'
Django comes with a built-in backend called django.core.cache.backends.locmem.LocMemCache, which is basically a simple Python object in memory with no persistence between Python processes. This one is of course super fast because it involves no network (local or remote) beyond the process itself. But if you care about performance (which you probably do if you're here because of the blog post title) it's not really useful, because it can't be shared amongst processes.
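For reference, the local-memory cache needs barely any configuration at all. A minimal sketch (the LOCATION is just an arbitrary name that keeps separate locmem caches apart):

# settings.py -- local-memory cache, a minimal sketch
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        # LOCATION is optional for locmem; it only matters if you define
        # more than one locmem cache and want to keep them separate.
        'LOCATION': 'benchmarking',
    },
}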
Anyway, the most common backends to use are:
- Memcached
- Redis
These are semi-persistent and built for extremely fast key lookups. They can both be reached over TCP or via a socket.
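The memcached configuration was shown above. A Redis cache is wired up similarly, here sketched with the django-redis package (which is what I use later in this post), extending the CACHES dict from above; the LOCATION is a placeholder for wherever your Redis server lives:

# settings.py -- Redis via the django-redis package (a sketch)
CACHES['redis'] = {
    'BACKEND': 'django_redis.cache.RedisCache',
    'LOCATION': config('REDIS_LOCATION', 'redis://127.0.0.1:6379/0'),
    'OPTIONS': {
        'CLIENT_CLASS': 'django_redis.client.DefaultClient',
    },
}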
What I wanted to see is which one is fastest.
The Experiment
First of all, in this blog post I'm only measuring the read times of the various cache backends.
Here's the Django view function that is the experiment:
import random
import time

from django import http
from django.conf import settings
from django.core.cache import caches


def run(request, cache_name):
    if cache_name == 'random':
        cache_name = random.choice(settings.CACHE_NAMES)
    cache = caches[cache_name]
    # Time only the read
    t0 = time.time()
    data = cache.get('benchmarking', [])
    t1 = time.time()
    if random.random() < settings.WRITE_CHANCE:
        data.append(t1 - t0)
        cache.set('benchmarking', data, 60)
    if data:
        avg = 1000 * sum(data) / len(data)
    else:
        avg = 'notyet'
    # print(cache_name, '#', len(data), 'avg:', avg, ' size:', len(str(data)))
    return http.HttpResponse('{}\n'.format(avg))
It records the time it takes to make a cache.get read and, depending on settings.WRITE_CHANCE, it also does a write (but doesn't record that).
What it records is a list of floats. The content of that piece of data stored in the cache looks something like this:
[0.0007331371307373047]
[0.0007331371307373047, 0.0002570152282714844]
[0.0007331371307373047, 0.0002570152282714844, 0.0002200603485107422]
So the data grows from being really small to something really large. If you run this 1,000 times with settings.WRITE_CHANCE set to 1.0, by the last request it has to fetch a list of 999 floats out of the cache backend.
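The backend name comes straight from the URL, so the routing is roughly this (a hypothetical urls.py sketch; the real wiring is in the repo linked below):

# urls.py -- hypothetical routing for the experiment
from django.conf.urls import url

from . import views

urlpatterns = [
    url(r'^summary$', views.summary),
    # /default, /memcached, /redis or /random all hit the same view
    url(r'^(?P<cache_name>[\w-]+)$', views.run),
]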
You can test it with one specific backend in mind and see how fast Django can do, say, 10,000 of these. Here's one such example:
$ wrk -t10 -c400 -d10s http://127.0.0.1:8000/default
Running 10s test @ http://127.0.0.1:8000/default
  10 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    76.28ms  155.26ms   1.41s    92.70%
    Req/Sec   349.92    193.36     1.51k    79.30%
  34107 requests in 10.10s, 2.56MB read
  Socket errors: connect 0, read 0, write 0, timeout 59
Requests/sec:   3378.26
Transfer/sec:    259.78KB

$ wrk -t10 -c400 -d10s http://127.0.0.1:8000/memcached
Running 10s test @ http://127.0.0.1:8000/memcached
  10 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    96.87ms  183.16ms   1.81s    95.10%
    Req/Sec   213.42     82.47     0.91k    76.08%
  21315 requests in 10.09s, 1.57MB read
  Socket errors: connect 0, read 0, write 0, timeout 32
Requests/sec:   2111.68
Transfer/sec:    159.27KB

$ wrk -t10 -c400 -d10s http://127.0.0.1:8000/redis
Running 10s test @ http://127.0.0.1:8000/redis
  10 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    84.93ms  148.62ms   1.66s    92.20%
    Req/Sec   262.96    138.72     1.10k    81.20%
  25271 requests in 10.09s, 1.87MB read
  Socket errors: connect 0, read 0, write 0, timeout 15
Requests/sec:   2503.55
Transfer/sec:    189.96KB
But an immediate disadvantage with this is that the "total final rate" (i.e. requests/sec) is likely to include many other factors. However, you can see that LocMemCache got 3378.26 req/s, MemcachedCache got 2111.68 req/s and RedisCache got 2503.55 req/s.
The code for the experiment is available here: https://github.com/peterbe/django-fastest-cache
The Infra Setup
I created an AWS m3.xlarge EC2 Ubuntu node and two clusters in AWS ElastiCache: one 2-node memcached cluster based on cache.m3.xlarge and one 2-node, 1-replica Redis cluster also based on cache.m3.xlarge.
The Django server was run with uWSGI like this:
uwsgi --http :8000 --wsgi-file fastestcache/wsgi.py --master --processes 6 --threads 10
The Results
Instead of hitting one backend repeatedly and reporting the requests per second, I hit the "random" endpoint for 30 seconds and let it randomly select a cache backend on each request. Once that's done, I read each cache via a /summary endpoint (sketched after the notes below) and look at the final, massive list of timings it took to make all the reads. I run it like this:
wrk -t10 -c400 -d30s http://127.0.0.1:8000/random && curl http://127.0.0.1:8000/summary
...wrk output redacted...

            TIMES    AVERAGE   MEDIAN    STDDEV
memcached   5738     7.523ms   4.828ms   8.195ms
default     3362     0.305ms   0.187ms   1.204ms
redis       4958     3.502ms   1.707ms   5.591ms

Best Averages (shorter better)
###############################################################################
███████████████████████████████████████████████████████████████ 7.523 memcached
██ 0.305 default
█████████████████████████████ 3.502 redis
Things to note:
- Redis is twice as fast as memcached.
- The pure-Python LocMemCache is 10 times faster than Redis.
- The table reports both average and median. The ASCII bar chart shows only the averages.
- All three backends report huge standard deviations. The median is very different from the average.
- The average is probably the more interesting number since it better reflects the ups and downs of reality.
- If you compare the medians, Redis is 3 times faster than memcached.
- It's just chance that Redis got fewer data points than memcached (4958 vs 5738), but it's expected that the LocMemCache backend only got 3362, because uWSGI spreads the requests over multiple processes, each with its own local cache.
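For reference, the /summary endpoint that produced the table above reads each cache's accumulated list and computes the statistics. Roughly like this (a sketch, not the exact code from the repo):

import statistics

from django import http
from django.conf import settings
from django.core.cache import caches


def summary(request):
    lines = []
    for cache_name in settings.CACHE_NAMES:
        times = caches[cache_name].get('benchmarking', [])
        if len(times) < 2:
            continue
        lines.append('{}\t{}\t{:.3f}ms\t{:.3f}ms\t{:.3f}ms'.format(
            cache_name,
            len(times),
            1000 * statistics.mean(times),    # AVERAGE
            1000 * statistics.median(times),  # MEDIAN
            1000 * statistics.stdev(times),   # STDDEV
        ))
    return http.HttpResponse('\n'.join(lines) + '\n')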
Other Things To Test
Perhaps pylibmc is faster than python-memcached.
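Swapping the client is just a different backend class added to the CACHES dict in settings; a sketch, assuming pylibmc is installed and pointing at the same memcached cluster:

# settings.py -- same memcached server, different Python client (a sketch)
CACHES['pylibmc'] = {
    'BACKEND': 'django.core.cache.backends.memcached.PyLibMCCache',
    'KEY_PREFIX': 'myapp',
    'LOCATION': config('MEMCACHED_LOCATION', '127.0.0.1:11211'),
}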
            TIMES    AVERAGE   MEDIAN    STDDEV
pylibmc     2893     8.803ms   6.080ms   7.844ms
default     3456     0.315ms   0.181ms   1.656ms
redis       4754     3.697ms   1.786ms   5.784ms

Best Averages (shorter better)
###############################################################################
██████████████████████████████████████████████████████████████ 8.803 pylibmc
██ 0.315 default
██████████████████████████ 3.697 redis
Using pylibmc didn't make things much faster. What if we pit memcached against pylibmc?
            TIMES    AVERAGE   MEDIAN    STDDEV
pylibmc     3005     8.653ms   5.734ms   8.339ms
memcached   2868     8.465ms   5.367ms   9.065ms

Best Averages (shorter better)
###############################################################################
█████████████████████████████████████████████████████████████ 8.653 pylibmc
███████████████████████████████████████████████████████████ 8.465 memcached
What about that fancy hiredis Redis Python driver that's supposedly faster?
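With django-redis, hiredis is switched on by pointing the parser option at it; a sketch, assuming the hiredis package is installed:

# settings.py -- same Redis server, parsed with the C-based hiredis (a sketch)
CACHES['hiredis'] = {
    'BACKEND': 'django_redis.cache.RedisCache',
    'LOCATION': config('REDIS_LOCATION', 'redis://127.0.0.1:6379/0'),
    'OPTIONS': {
        'PARSER_CLASS': 'redis.connection.HiredisParser',
    },
}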
            TIMES    AVERAGE   MEDIAN    STDDEV
redis       4074     5.628ms   2.262ms   8.300ms
hiredis     4057     5.566ms   2.296ms   8.471ms

Best Averages (shorter better)
###############################################################################
███████████████████████████████████████████████████████████████ 5.628 redis
██████████████████████████████████████████████████████████████ 5.566 hiredis
These last two results are both surprising and suspicious. Perhaps the whole setup is wrong. Why wouldn't the C-based libraries be faster? Is the client's own work so incredibly dwarfed by the network I/O between my EC2 node and the ElastiCache nodes?
In Conclusion
I personally like Redis. It's not as stable as memcached. On a personal server I've run for years, the Redis server sometimes just dies due to corrupt memory, and I've come to accept that. I don't think I've ever seen memcached do that.
But there are other benefits to Redis as a cache backend. With the django-redis library you have really easy access to the raw Redis connection, so you can use much more advanced data structures. You can also cache certain things indefinitely. Redis also supports storing much larger strings than memcached (1MB for memcached vs 512MB for Redis).
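For example, django-redis exposes the underlying redis-py client, so you can reach for Redis data structures that a plain cache.get/cache.set can't express. A small sketch (the key names here are made up):

from django.core.cache import caches
from django_redis import get_redis_connection

# The raw redis-py client behind the cache named 'redis' in CACHES
connection = get_redis_connection('redis')

# Use a Redis list directly -- something memcached has no equivalent for
connection.lpush('recent-searches', 'django cache benchmarks')
connection.ltrim('recent-searches', 0, 99)  # keep only the 100 most recent
print(connection.lrange('recent-searches', 0, -1))

# And "cache this indefinitely" is just a timeout of None
caches['redis'].set('some-key', 'some-value', timeout=None)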
The conclusion is that Redis is faster than memcached by a factor of 2. Considering the other feature benefits you can get out of having a Redis server available, it's probably a good choice for your next Django project.
Bonus Feature
In big setups you most likely have a whole slew of web heads: servers that do nothing but handle web requests and are configured to talk to databases and caches over the local network. However, many of us have cheap servers on DigitalOcean or Linode where we run web servers, relational databases and cache servers all on the same machine. (I do. This blog is one of those, with Nginx, Redis, memcached and PostgreSQL all on a 4GB DigitalOcean SSD Ubuntu server.)
So here's one last test where I installed a local Redis and a local memcached on the EC2 node itself:
$ cat .env | grep 127.0.0.1
MEMCACHED_LOCATION="127.0.0.1:11211"
REDIS_LOCATION="redis://127.0.0.1:6379/0"
Here are the results:
            TIMES    AVERAGE   MEDIAN    STDDEV
memcached   7366     3.456ms   1.380ms   5.678ms
default     3716     0.263ms   0.189ms   1.002ms
redis       5582     2.334ms   0.639ms   4.965ms

Best Averages (shorter better)
###############################################################################
█████████████████████████████████████████████████████████████ 3.456 memcached
████ 0.263 default
█████████████████████████████████████████ 2.334 redis
The conclusion of that last benchmark is that Redis is still faster and it's roughly 1.8x faster to run these backends on the web head than to use ElastiCache. Perhaps that just goes to show how amazingly fast the AWS inter-datacenter fiber network is!
Comments
I've run redis with a million users without it crashing for years at a time. Not sure what's up with your install.
My problem was that the server ran out of memory. And instead of grinding to a halt, it crashes.
Great to see more benchmarks out there. In my local testing, Memcached often outperforms Redis by a fair margin. Seems like AWS ElastiCache is having a large effect on the results. You may be interested in my DiskCache project: http://www.grantjenks.com/docs/diskcache/ which also benchmarks Django cache backends.
How did you measure memcache outperformed Redis?
Also, disk cache seems dangerously inefficient when you have multiple web heads.
Mind you, this whole blog is served from disk. Django renders HTML which is dumped to disk; Nginx serves that, and falls back to Django (via uWSGI).
Similar setup to yours. Initialize Django with multiple caches, then test the performance of get/set/delete operations. The script is in the GitHub repo at tests/benchmark_djangocache.py. Results are here: http://www.grantjenks.com/docs/diskcache/djangocache-benchmarks.html I'm measuring multiple percentiles: 50th (median), 90th, 99th, and max.
I'm not sure what "multiple web heads" refers to but I will guess you mean a load balancer with multiple servers behind it. That is a scenario where you've probably outgrown DiskCache. Although if you load balance consistently (say by ip-hash) then the issue can be mitigated.
good stuff, i will try out redis and switch out memcached... wanted to do this for a while
I was reading an article on why going over the network/socket is so much slower than doing it natively in Python (https://www.prowesscorp.com/computer-latency-at-a-human-scale/), and was wondering how locMem does it.
I am a newbie, and had just started searching, and almost immediately hit this article. This experiment confirms that the locMem cache is in fact in the process's own memory and not reached through a socket. The socket, howsoever light it is, will presumably always be outperformed by local access. And then, one has to 'protect' port 11211 from infiltrators (https://www.digitalocean.com/community/tutorials/how-to-secure-memcached-by-reducing-exposure) - not difficult, but still ...
The locmem cache's disadvantage is that its data is not shared across processes, and is not persistent (your process goes down, the cache gets busted). To me, given how complicated cache busting is, this looks like an advantage. Cache busting becomes as easy as shutting things down, and bringing them up again.
To 'fill' the cache after a shutdown automatically, I thought of two approaches -
1. Run "ab" with N parallel connections for a few seconds with all the target endpoints, where N is larger than the number of workers (in my case gunicorn sockets), forcing each to fill up their caches.
2. Have a special end-point, that just fills up the caches. I implemented the first, being lazy and not wanting to do the extra coding.
So each time I 'goDown' and 'bringUp' my code base, I just do a 'fillUp' using a script which calls ab. I do have dynamic pages, but large parts of each dynamic page is in fact static, so I cache fragments for repeat users, and full pages for first time visitors to each endpoint (those that come without a cookie).
I am worried I am doing it all wrong! Because the other disadvantage of locMem is that each process will use up memory for its own cache. In my case, the caches are not big. If they do get bigger, I will try a shared memory approach, since that would still be much faster than sockets (http://pages.cs.wisc.edu/~adityav/Evaluation_of_Inter_Process_Communication_Mechanisms.pdf).
(This is my first project with Django, and I am using nginx, postgres, gunicorn, python, django, locMem, on a single CPU droplet on DOcean, with 3 gunicorn workers, static/media files being served from that droplet itself rather than AWS).
If you have a single CPU droplet, doesn't that mean you only have 1 CPU, and thus there's no point using 3 gunicorn workers because two of them don't have distinct CPUs to use?
And it's true: if you do have, say, 4 CPUs and 4 gunicorn workers, and you fill all your caches, you're going to need 4x as much RAM compared to letting it all be handled by one memcached or Redis.
the 3 gunicorn workers is as per the recommendation from gunicorn documentation (2*num_processors + 1).
http://docs.gunicorn.org/en/stable/design.html
My cache is small - max 500 items (static fragments of pages) × max 200KB = 100MB. With 4GB RAM, I thought I could afford the duplication.
Any thoughts on using `LocMemCache` with the Gunicorn preload (http://docs.gunicorn.org/en/stable/settings.html#preload-app) option to share a section of memory across workers? When would this be appropriate to use?
Or you could just use fewer workers and a bunch of threads. Then you could code with global mutables without worrying about thread-safety. All in all, sounds scary but there might be cases where there are performance benefits.
The obvious drawbacks are that you won't be able to reach that cache data from somewhere else, e.g. `./manage.py post-process-cached-things`, and as soon as you destroy the gunicorn worker all that memory is lost and needs to be rebuilt. If you just stuff it in a db like Redis, then you can destroy and create new web workers without any risk of empty caches or stampeding herd problems.