Django test optimization with no-op PIL engine

Thursday, Oct 27, 2016
6 comments Python, Django

The Air Mozilla project is a regular Django webapp. It's reasonably big for a more or less one-man project: ~200K lines of Python and ~100K lines of JavaScript. There are 816 "unit tests" at the time of writing. Most of them are kinda typical Django tests. Like:


def test_some_feature(self):
    thing = MyModel.objects.create(key='value')
    url = reverse('namespace:name', args=(thing.id,))
    response = self.client.get(url)
    ....

Also, the site uses sorl.thumbnail to automatically generate thumbnails from uploaded images. It's a great library.

However, when running tests, you almost never actually care about the image itself. Your eyes will never feast on them. All you care about is that there is an image, that it was resized and that nothing broke. You don't write tests that check the new image dimensions of a generated thumbnail. If you need tests that go into that kind of detail, they best belong somewhere else.

So, I thought, why not fake ALL the operations inside sorl.thumbnail that have to do with resizing and cropping images?

Here's the changeset that does it. Note that the trick is to override the default THUMBNAIL_ENGINE that sorl.thumbnail loads. It usually defaults to sorl.thumbnail.engines.pil_engine.Engine, and I just wrote my own that does no-ops in almost every instance.

I admittedly threw it together quite quickly just to see if it was possible. Turns out, it was.


# Depends on setting something like:
#    THUMBNAIL_ENGINE = 'airmozilla.base.tests.testbase.FastSorlEngine'
# in your settings specifically for running tests.


from sorl.thumbnail.engines.base import EngineBase


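class _Image(object):
    """Stand-in for a PIL image: just the attributes this engine touches."""

    def __init__(self):
        self.size = (1000, 1000)
        self.mode = 'RGBA'
        self.data = '\xa0'


class FastSorlEngine(EngineBase):
    """Engine that skips all real image work and only tracks sizes."""

    def get_image(self, source):
        # Ignore the source file entirely; every image is the same fake.
        return _Image()

    def get_image_size(self, image):
        return image.size

    def _colorspace(self, image, colorspace):
        return image

    def _scale(self, image, width, height):
        # Record the requested size instead of resampling pixels.
        image.size = (width, height)
        return image

    def _crop(self, image, width, height, x_offset, y_offset):
        image.size = (width, height)
        return image

    def _get_raw_data(self, image, *args, **kwargs):
        # A fixed placeholder payload stands in for the encoded image.
        return image.data

    def is_valid_image(self, raw_data):
        return bool(raw_data)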
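
The comment at the top of that listing asks for a THUMBNAIL_ENGINE setting that only applies when running tests. One way to get that, sketched here as an assumption and not necessarily how Air Mozilla does it, is to look at the management command in sys.argv. The helper name pick_thumbnail_engine is made up for illustration; the two dotted paths are the ones mentioned in this post.

```python
import sys

# Default engine path, as named earlier in the post.
DEFAULT_ENGINE = 'sorl.thumbnail.engines.pil_engine.Engine'
# The no-op engine from the listing above.
FAST_ENGINE = 'airmozilla.base.tests.testbase.FastSorlEngine'


def pick_thumbnail_engine(argv):
    """Return the fast engine only for `manage.py test` style invocations.

    Hypothetical helper: it assumes the test suite is started with
    `test` as the first command-line argument.
    """
    running_tests = len(argv) > 1 and argv[1] == 'test'
    return FAST_ENGINE if running_tests else DEFAULT_ENGINE


# In a settings module, this is the line sorl.thumbnail would end up reading.
THUMBNAIL_ENGINE = pick_thumbnail_engine(sys.argv)
```

A separate settings module used only for tests would work just as well; this version simply keeps everything in one file.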

So, was it much faster?

It's hard to measure because the time it takes to run the whole test suite depends on other stuff going on on my laptop during the long time it takes to run the tests. So I ran them 8 times with the old code and 8 times with this new hack.

Iteration	Before	After
1	82.789s	73.519s
2	82.869s	67.009s
3	77.100s	60.008s
4	74.642s	58.995s
5	109.063s	80.333s
6	100.452s	81.736s
7	85.992s	61.119s
8	82.014s	73.557s
Average	86.865s	69.535s
Median	82.829s	70.264s
Std Dev	11.826s	9.076s

So roughly 15% faster going by the median, or about 20% going by the average. Not a lot, but it adds up when you're doing test-driven development or debugging, where you run a suite or a single test over and over as you save the files/tests you're working on.
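
If you want to redo the summary rows yourself, here is a small sketch using only the standard library's statistics module. The timings are copied from the table above; the function names summarize and speedup_percent are just for illustration. Note that statistics.median averages the two middle values when the number of runs is even, and statistics.stdev is the sample standard deviation.

```python
import statistics

# Wall-clock seconds for the eight runs, copied from the table.
before = [82.789, 82.869, 77.100, 74.642, 109.063, 100.452, 85.992, 82.014]
after = [73.519, 67.009, 60.008, 58.995, 80.333, 81.736, 61.119, 73.557]


def summarize(timings):
    """Return (mean, median, sample standard deviation) in seconds."""
    return (
        statistics.mean(timings),
        statistics.median(timings),
        statistics.stdev(timings),
    )


def speedup_percent(old, new):
    """Percentage by which `new` is smaller than `old`."""
    return 100.0 * (old - new) / old


for label, timings in (('Before', before), ('After', after)):
    mean, median, stdev = summarize(timings)
    print('%s\tmean %.3fs\tmedian %.3fs\tstdev %.3fs' % (label, mean, median, stdev))

print('Faster by mean:   %.1f%%' % speedup_percent(
    statistics.mean(before), statistics.mean(after)))
print('Faster by median: %.1f%%' % speedup_percent(
    statistics.median(before), statistics.median(after)))
```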

Room for improvement

In my case, it just worked with this simple solution. Your site might do fancier things with the thumbnails. Perhaps we can combine forces on this and finalize a working solution into a standalone package.

Comments


Dane Hillard October 28, 2016

Could you speak to the benefits of using this approach over something like unittest.mock.Mock?

Peter Bengtsson October 28, 2016

First of all, I didn't even know that mock was part of unittest now. I thought you still had to install it separately.

Generally, I suspect both will work. Maybe more a matter of taste. I'm generally pessimistic towards mocking unless it's the only way possible. Mocking is a clever but equally nasty hack and the code often becomes hard to read (once it's escaped your short-term memory) and it's so easy to "overmock" and accidentally make everything a mock object that doesn't help you check your sanity.

Dane Hillard October 28, 2016

Done well, I believe mocking can be incredibly insightful and readable. I'll admit that doing it well is often less trivial than it sounds! I also see the value in creating objects that are essentially test harnesses, so I'm not necessarily saying I'd never follow your approach. Just wanted to get your thoughts. Thanks for the input!

Israel Fruchter October 31, 2016

Yeah, you could over-mock things, but I will always prefer using the same approach in all of my tests. Writing a specific mock for each thing seems weird, and not all code lends itself to the pattern you demonstrated (i.e. having pluggable engines).

Mocking is a very valid approach, as you demonstrated.
I think that unit tests should be very specific, and anything beyond the limits of your process should be avoided (mocked).

There are other types of tests, like component/integration tests, where the opposite is advised (but still, for a lot of reasons, it's perfectly valid to use pretenders/simulators for some parts of your system).

For example, I recently started testing every component I write in a Docker Compose setup, which gives me control over the connections to the DB or other services, i.e. you can stop the database container and test reconnectivity.

Israel Fruchter October 28, 2016

Why not use unittest.mock? Then you can also check if it was called.

Peter Bengtsson October 28, 2016

See response above to Dane.
