I've blogged before about how this site can easily push out over 2,000 requests/second using only 6 WSGI workers, excluding latency. That's possible because whole pages can be cached server-side: the entire rendered HTML blob is stored in the cache server (Redis in my case), so no database queries are needed at all.
I wanted my site to still "feel" dynamic, in the sense that once you post a comment (and it's published), the page's cache is automatically invalidated, so users don't have to refresh their browser when they know the page should have changed. To accomplish this I used a hacked cache_page decorator that makes the cache key depend on the content the page depends on. Here's the code I actually use today for the home page:
def _home_key_prefixer(request):
    if request.method != 'GET':
        return None
    prefix = urllib.urlencode(request.GET)
    cache_key = 'latest_comment_add_date'
    latest_date = cache.get(cache_key)
    if latest_date is None:
        # when a blog comment is posted, the blog modify_date is incremented
        latest, = (BlogItem.objects
                   .order_by('-modify_date')
                   .values('modify_date')[:1])
        latest_date = latest['modify_date'].strftime('%f')
        cache.set(cache_key, latest_date, 60 * 60)
    prefix += str(latest_date)

    try:
        redis_increment('homepage:hits', request)
    except Exception:
        logging.error('Unable to redis.zincrby', exc_info=True)

    return prefix


@cache_page_with_prefix(60 * 60, _home_key_prefixer)
def home(request, oc=None):
    ...
    try:
        redis_increment('homepage:misses', request)
    except Exception:
        logging.error('Unable to redis.zincrby', exc_info=True)
    ...
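The hacked decorator itself isn't shown in this post. As a rough idea of the technique only, here is a minimal, framework-free sketch of a decorator whose cache key includes whatever a prefixer function returns. Everything in it is a stand-in invented for illustration: `cache_with_prefix`, the dict-based `_page_cache` (which has no timeout, unlike a real cache), the dict-shaped request, and `content_stamp`. It is not the actual `cache_page_with_prefix` implementation.

```python
import functools

# Stand-in for the real cache server; a plain dict with no expiry.
_page_cache = {}


def cache_with_prefix(prefixer):
    """Cache a view's output under a key that includes prefixer(request)."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            prefix = prefixer(request)
            if prefix is None:
                # The prefixer opted out (e.g. a non-GET request): no caching.
                return view(request, *args, **kwargs)
            key = (view.__name__, request["path"], prefix)
            if key not in _page_cache:
                # Miss: render once and store the finished HTML.
                _page_cache[key] = view(request, *args, **kwargs)
            return _page_cache[key]
        return wrapper
    return decorator


# Hypothetical content stamp; in the post it is derived from modify_date.
content_stamp = {"value": "v1"}
renders = []  # one entry per real render, so misses can be counted


def stamp_prefixer(request):
    if request["method"] != "GET":
        return None
    return content_stamp["value"]


@cache_with_prefix(stamp_prefixer)
def home(request):
    renders.append(request["path"])
    return "<html>render #%d</html>" % len(renders)
```

Calling `home({"method": "GET", "path": "/"})` twice renders only once. After `content_stamp["value"]` changes, the next call builds a different key and renders again. Note that the old entry is never deleted; it simply stops being looked up, and in a real cache it would disappear when its timeout expires.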
And in the models I then have this:
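from django.core.cache import cache
from django.db.models.signals import post_save
from django.dispatch import receiver


@receiver(post_save, sender=BlogComment)
@receiver(post_save, sender=BlogItem)
def invalidate_latest_comment_add_dates(sender, instance, **kwargs):
    # dropping the stamp forces the next request to look up the latest modify_date
    cache_key = 'latest_comment_add_date'
    cache.delete(cache_key)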
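To make the interplay between the cached date stamp and the signal handler concrete, here is a small sketch without Django. The dicts standing in for the cache and the database, the `db_queries` list, the function names and the example dates are all assumptions made up for this illustration.

```python
# Stand-ins for the cache server and the database; both are hypothetical.
cache = {}
database = {"latest_modify_date": "2012-05-01 10:00:00"}
db_queries = []  # records each simulated SQL query

STAMP_KEY = "latest_comment_add_date"


def latest_stamp():
    """Return the cached stamp, querying the 'database' only when missing."""
    stamp = cache.get(STAMP_KEY)
    if stamp is None:
        db_queries.append("latest modify_date")
        stamp = database["latest_modify_date"]
        cache[STAMP_KEY] = stamp
    return stamp


def on_save(new_modify_date):
    """Simulate a save followed by the post_save receiver."""
    database["latest_modify_date"] = new_modify_date
    # This is the receiver's whole job: drop the cached stamp.
    cache.pop(STAMP_KEY, None)
```

Repeated calls to `latest_stamp()` cost one simulated query in total. After `on_save(...)`, the next call queries again and returns the new date, which in turn changes the page's cache key.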
So this means:
- whole pages are cached for a long time for fast access
- updates immediately invalidate the cache for the best user experience
- no need to mess with ANY SQL caching
So, the next question is, if posting a comment means that the cache is invalidated and needs to be populated, what's the ratio of hits versus hits where the cache is cleared? Glad you asked. That's why I made this page:
It allows me to monitor how often a new blog comment or a general time-out means poor Django needs to re-create the HTML using SQL.
At the time of writing, one in every 25 hits to the homepage requires the server to re-generate the page. And still the content is always fresh and relevant.
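In the code above, 'homepage:hits' is incremented in the key prefixer, which runs for every GET request, while 'homepage:misses' is incremented inside the view, which only runs when the page has to be rendered. The ratio can then be worked out as in this sketch. The helper names are made up, and it takes plain integer counts; how the counters are actually stored and read back from Redis is not shown in this post.

```python
def miss_ratio(hits, misses):
    """Fraction of homepage requests that had to re-render the page."""
    if hits == 0:
        return 0.0
    return misses / float(hits)


def one_in_n(hits, misses):
    """Express the ratio as 'one miss in every N hits' (None if no misses)."""
    if misses == 0:
        return None
    return hits / float(misses)
```

With made-up counts of 2,500 hits and 100 misses, `one_in_n(2500, 100)` gives 25.0, which matches a rate of one re-render in every 25 hits.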
The next level of optimization would be to figure out whether a particular page update (e.g. a blog comment posted on a page that isn't featured on the home page) should or should not invalidate the home page.
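One possible shape for that check, purely as a sketch of an idea that isn't implemented here: only drop the date stamp when the changed post is among those currently shown on the home page. The function names, the `featured_post_ids` argument and the dict used as a cache are all hypothetical.

```python
def should_invalidate_home(changed_post_id, featured_post_ids):
    """True only when the changed post is currently shown on the home page."""
    return changed_post_id in set(featured_post_ids)


def on_comment_saved(post_id, featured_post_ids, cache,
                     cache_key="latest_comment_add_date"):
    """Drop the home page stamp only for comments on featured posts."""
    if should_invalidate_home(post_id, featured_post_ids):
        cache.pop(cache_key, None)
        return True
    return False
```

A brand new blog post changes which posts are featured, so that case would presumably still need to invalidate unconditionally.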
Comments
I also found out that with fully cached pages, you should make sure the following is set as well:
CACHE_MIDDLEWARE_ANONYMOUS_ONLY = False
Otherwise, Django accesses request.session and the user table, resulting in a DB query for every request. With this set to False, Django can run purely from cache.