About a month ago I added a new feature to django-static that makes it possible to define a function that every file handled by django-static goes through.
First of all, a quick recap: django-static
is a Django plugin that you use from your templates to reference static media. django-static
takes care of giving the file the optimum name for static serving and, if applicable, compresses the file by trimming all whitespace and whatnot. For more info, see The awesomest way possible to serve your static stuff in Django with Nginx.
The new, popular kid on the block for CDNs (Content Delivery Networks) is Amazon Cloudfront. It's a service sitting on top of the already proven Amazon S3 service, which is a cloud file storage solution. What a CDN does is register a domain for your resources such that, with some DNS tricks, users of a resource URL download it from the geographically nearest server. So if you live in Sweden you might download myholiday.jpg
from a server in Frankfurt, and if you live in North Carolina, USA you might download the very same picture from Virginia, USA. That ensures that the distance to the resource is minimized. If you're not convinced or sure about how CDNs work, check out THE best practice guide for faster webpages by Steve Souders (it's number two).
A disadvantage with Amazon Cloudfront is that it's unable to negotiate with the client to compress downloadable resources with GZIP. GZIPping a resource is considered a bigger optimization win than using a CDN. So, I continue to serve my static CSS and Javascript files from my Nginx but put all the images on Amazon Cloudfront. How do you do this with django-static?
Easy: add this to your settings:
DJANGO_STATIC = True
...other DJANGO_STATIC_... settings...
# equivalent of 'from cloudfront import file_proxy' in this PYTHONPATH
DJANGO_STATIC_FILE_PROXY = 'cloudfront.file_proxy'
Then you need to write the function that gets a chance to do something with every static resource that django-static
prepares. Here's a naive first version:
# in cloudfront.py

conversion_map = {}  # global variable

def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    if filepath and (new or changed):
        if filepath.lower().split('.')[-1] in ('jpg', 'gif', 'png'):
            conversion_map[uri] = _upload_to_cloudfront(filepath)
    return conversion_map.get(uri, uri)
The files are only sent through the function _upload_to_cloudfront()
the first time they're "massaged" by django-static.
On consecutive calls nothing is done to the file, since django-static
remembers, and sticks to, the way it dealt with the file the first time. Basically, after you have restarted your Django server the file is prepared and checked for a timestamp, but the second time the template is rendered django-static saves time by not checking the file again and just passes through the resulting file name. If this is all confusing, you can start with a much simpler proxy function that looks like this:
def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    print "Debugging and learning"
    print uri
    print "New", new,
    print "Filepath", filepath,
    print "Changed", changed,
    print "Other arguments:", kwargs
    return uri
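Back to the naive version: its remembering behaviour can be demonstrated in isolation with a stubbed-out uploader. This is just an illustration; the `example.cloudfront.net` URL and the file names are made up, and the stub stands in for the real boto-based uploader shown further down.

```python
conversion_map = {}  # same global cache as in the naive version above

def _upload_to_cloudfront(filepath):
    # stub standing in for the real boto-based uploader below;
    # the domain is made up for the example
    return 'http://example.cloudfront.net/' + filepath.split('/')[-1]

def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    if filepath and (new or changed):
        if filepath.lower().split('.')[-1] in ('jpg', 'gif', 'png'):
            conversion_map[uri] = _upload_to_cloudfront(filepath)
    return conversion_map.get(uri, uri)

# first render after a server restart: new=True, so the image is "uploaded"
first = file_proxy('/img/a.123.png', new=True, filepath='/tmp/a.123.png')
# consecutive renders: the remembered Cloudfront URL is returned
second = file_proxy('/img/a.123.png')
# non-image files pass straight through untouched
third = file_proxy('/css/style.123.css', new=True, filepath='/tmp/style.123.css')
```

Note how only the image gets rewritten to the Cloudfront URL; anything with a non-image extension falls through to the original URI.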
The function to upload to Amazon Cloudfront is pretty straightforward thanks to the boto project. Here's my version:
import os
import re
from django.conf import settings
import boto

_cf_connection = None
_cf_distribution = None

def _upload_to_cloudfront(filepath):
    global _cf_connection
    global _cf_distribution

    if _cf_connection is None:
        _cf_connection = boto.connect_cloudfront(settings.AWS_ACCESS_KEY,
                                                 settings.AWS_ACCESS_SECRET)

    if _cf_distribution is None:
        _cf_distribution = _cf_connection.create_distribution(
            origin='%s.s3.amazonaws.com' % settings.AWS_STORAGE_BUCKET_NAME,
            enabled=True,
            comment=settings.AWS_CLOUDFRONT_DISTRIBUTION_COMMENT)

    # now we can delete any old versions of the same file that have the
    # same name but a different timestamp
    basename = os.path.basename(filepath)
    object_regex = re.compile(r'%s\.(\d+)\.%s' %
                              (re.escape('.'.join(basename.split('.')[:-2])),
                               re.escape(basename.split('.')[-1])))
    for obj in _cf_distribution.get_objects():
        match = object_regex.findall(obj.name)
        if match:
            old_timestamp = int(match[0])
            new_timestamp = int(object_regex.findall(basename)[0])
            if new_timestamp == old_timestamp:
                # an exact copy already exists
                return obj.url()
            elif new_timestamp > old_timestamp:
                # we've come across the same file but with an older timestamp
                #print "DELETE!", obj.name
                obj.delete()
                break

    # Still here? That means the file wasn't already in the distribution
    fp = open(filepath, 'rb')

    # Because the name will always contain a timestamp we can set far-future
    # caching headers. The exact date doesn't matter as long as it's really
    # far in the future.
    headers = {'Cache-Control': 'max-age=315360000, public',
               'Expires': 'Thu, 31 Dec 2037 23:55:55 GMT',
               }
    #print "\t\t\tAWS upload(%s)" % basename
    obj = _cf_distribution.add_object(basename, fp, headers=headers)
    return obj.url()
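The timestamp regex above is the one subtle bit, so here's a small self-contained illustration of what it matches (file names are made up for the example):

```python
import re

def timestamp_regex(basename):
    # Build a pattern that matches the same base name with any timestamp,
    # e.g. "ctw-screenshot.1242930552.png" matches "ctw-screenshot.<digits>.png"
    name = re.escape('.'.join(basename.split('.')[:-2]))
    ext = re.escape(basename.split('.')[-1])
    return re.compile(r'%s\.(\d+)\.%s' % (name, ext))

regex = timestamp_regex('ctw-screenshot.1242930552.png')
same_file = regex.findall('ctw-screenshot.1111111111.png')   # ['1111111111']
other_file = regex.findall('some-other-file.1242930552.png') # []
```

In other words, two uploads of the same source file with different timestamps are recognized as versions of each other, which is what lets the upload function delete stale copies.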
Moving on; unfortunately this isn't good enough. You see, from the moment you issue an upload to Amazon Cloudfront you immediately get a full URL for the resource, but if it's a new distribution it takes a little while for the DNS to propagate and the URL to become globally available. Therefore, the URL you get back will most likely yield a 404 Page Not Found if you try it immediately.
So to solve this problem I wrote a simple alternative to the Python dict()
type that works roughly the same, except that myinstance.get(key)
depends on time: one hour in my setup, ten seconds in this toy example. So it works something like this:
>>> slow_map = SlowMap(10)
>>> slow_map['key'] = "Value"
>>> print slow_map.get('key')
None
>>> from time import sleep
>>> sleep(10)
>>> print slow_map.get('key')
Value
And here's the code for that:
from time import time

class SlowMap(object):
    """
    >>> slow_map = SlowMap(60)
    >>> slow_map['key'] = 'value'
    >>> print slow_map.get('key')
    None

    Then 60 seconds go past:

    >>> slow_map.get('key')
    'value'
    """

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.guard = dict()
        self.data = dict()

    def get(self, key, default=None):
        value = self.data.get(key)
        if value is not None:
            return value

        if key not in self.guard:
            # never set at all
            return default

        value, expires = self.guard[key]
        if expires < time():
            # good to release
            self.data[key] = value
            del self.guard[key]
            return value
        else:
            # held back
            return default

    def __setitem__(self, key, value):
        self.guard[key] = (value, time() + self.timeout)
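Putting the pieces together, the naive file_proxy from earlier can be backed by a SlowMap so that a freshly uploaded image keeps its local URI until the Cloudfront URL has had time to propagate. This is a sketch of how I wire it up: the one-hour hold-back is from above, the `example.cloudfront.net` stub stands in for the real uploader, and SlowMap is repeated here only so the sketch runs on its own.

```python
from time import time

class SlowMap(object):
    # repeated from above so this sketch is self-contained
    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.guard = dict()
        self.data = dict()

    def get(self, key, default=None):
        value = self.data.get(key)
        if value is not None:
            return value
        if key not in self.guard:
            return default
        value, expires = self.guard[key]
        if expires < time():
            # timeout has passed; release the real value
            self.data[key] = value
            del self.guard[key]
            return value
        return default

    def __setitem__(self, key, value):
        self.guard[key] = (value, time() + self.timeout)

def _upload_to_cloudfront(filepath):
    # stand-in for the real boto-based uploader above, for illustration only
    return 'http://example.cloudfront.net/' + filepath.split('/')[-1]

# hold newly uploaded Cloudfront URLs back for one hour
conversion_map = SlowMap(60 * 60)

def file_proxy(uri, new=False, filepath=None, changed=False, **kwargs):
    if filepath and (new or changed):
        if filepath.lower().split('.')[-1] in ('jpg', 'gif', 'png'):
            conversion_map[uri] = _upload_to_cloudfront(filepath)
    # within the first hour get() falls back to the local URI;
    # after that it returns the Cloudfront URL
    return conversion_map.get(uri, uri)
```

During the first hour after an upload the template keeps rendering the locally served URI, and once the hold-back expires the Cloudfront URL takes over transparently.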
With all of that ready, willing and able, you should now be able to serve your images from Amazon Cloudfront simply by doing this in your Django templates:
{% staticfile "/img/mysprite.gif" %}
To test this I've deployed this technique on Crosstips, my money-making site and code guinea pig. Go ahead: visit that site and use Firebug, or view the source, and check out the URLs used for the images. They look something like this: http://dpv9al5z7o7rq.cloudfront.net/ctw-screenshot.1242930552.png
If you want to look at my code used for Crosstips, download this file. It's pretty generic for anybody who wants to achieve the same thing.
Have fun and happy CDN'ing!
Comments
Is there a way to upload all the FileFields and ImageFields to Amazon Cloudfront using this app? That's one thing I need to do in the near future.
There's an app called django-storage which I've used in another project to upload FileFields to Amazon S3. If it doesn't have Cloudfront support yet, that package would be the best place to start.
Thanks, I'll look into it.
Nice post and explanation. I just came across django-queued-storage that has a slightly different approach to storing data on Cloudfront: http://github.com/seanbrant/django-queued-storage
Not sure if the celery scheduled task responsible for pushing to Cloudfront provides all the functionality of your SlowMap though.