Being able to do aggregate functions with Django's QuerySet API is really useful. Not because it's difficult to write your own loop but because the summation is then done inside the SQL database. I had this piece of code:


t = Decimal('0')
for each in some_queryset:
   t += each.cost

Which can be rewritten like this instead:


t = qs.aggregate(Sum('cost'))['cost__sum']

For my 6,000+ records in the database the first one takes about 0.7 seconds. The aggregate takes 0.02 seconds. Blimey! That's over 30 fold difference in speed for practically the same thing.

Granted, when doing the loop you can do some other stuff such as counting or additional function calls but that difference is quite significant. In my current application those 0.7 seconds isn't really a problem but it quickly becomes when it has to be done over and over for multiple sets.

Comments

Dave Arter

What database backend are you using? Have you tried it with a variety of different backends?

lqc

Isn't that mainly the cost of creating model instances ? Can you try:

s = Decimal('0')
for value in some_queryset.values_list('cost', flat=True):
s += value

or even:

s = sum(qs.values_list('cost', flat=True), Decimal('0'))

Should be much faster then the first solution.

PS. Is there a way to indent code in comments ?

Peter Bengtsson

Ok. Benchmarked that too quickly. Using `values_list` is about 10 times slower than doing the aggregate in the database.

Faster than doing the full ORM but still not as fast as letting the database do it.

slafs

I'm kind of disappointed with that post. I like your blog and your entries about various things referring Django, benchmarks etc. and I enjoy reading it. But this?
lqc wrote some nice examples of how it should be done to have more reliable results. Nevertheless, in my opinion it's a quite odd comparison in general.

Regards

Peter Bengtsson

I'm sorry about that. This was not a scientific blog. Anybody knows that pushing the work into the database is going to be quicker. I just wanted to share some rough numbers of how much quicker.

The conclusion to take away is that if you want some performance you need to be ready to step out of the abstraction that something like an ORM offers.

Your email will never ever be published.

Related posts