I'm currently working on a Django library that uses mozjpeg to optimize thumbnails that are generated from stored images. I first wanted to get a feel for how good mozjpeg
really is.
In my ~/Downloads directory I have all sorts of "junk" from all sorts of saves and experiments. It'll work as a good testbed of relatively random JPEG images of all sorts of sizes and qualities. Without further ado, here are the results:
FILENAME                                          OPTIMIZE   ORIGINAL    SAVING  PERCENT
----------------------------------------------------------------------------------------
180697_1836563311933_3364808_n.jpg                  45.2Kb     50.4Kb     5.1Kb    10.2%
2014-03-20 17.35.39.jpg                           2040.1Kb   2207.8Kb   167.7Kb     7.6%
2015-03-04 21.18.16.jpg                           1521.5Kb   1629.2Kb   107.7Kb     6.6%
2015-03-04 21.19.16.jpg                           1602.4Kb   1720.0Kb   117.6Kb     6.8%
2015-03-04 21.23.16.jpg                           1181.7Kb   1272.1Kb    90.4Kb     7.1%
2015-03-05 06.03.00.jpg                           1426.7Kb   1557.7Kb   131.0Kb     8.4%
20150626_200629_001.jpg                           1566.4Kb   1717.3Kb   151.0Kb     8.8%
20150626_200631.jpg                               2157.6Kb   2319.6Kb   162.0Kb     7.0%
Boba_Fett_by_RobD4E.jpg                             96.2Kb    104.3Kb     8.1Kb     7.8%
Horse_Play.jpg                                      170.4Kb    185.2Kb    14.9Kb     8.0%
Image (107).jpg                                     344.9Kb    390.6Kb    45.7Kb    11.7%
Misc Candle Holder NECA FOTR Balrog Dec2002.jpg     37.1Kb     37.7Kb     0.6Kb     1.5%
Mozilla_Lightbeam.jpg                               55.1Kb     79.7Kb    24.6Kb    30.8%
Photo on 12-17-14 at 5.55 PM.jpg                   168.5Kb    187.7Kb    19.2Kb    10.2%
dev.jpg                                             17.5Kb     30.8Kb    13.3Kb    43.2%
dev2.jpg                                            41.1Kb     54.3Kb    13.3Kb    24.4%
dev3.jpg                                            35.3Kb     49.0Kb    13.7Kb    28.0%
dev4.jpg                                            42.0Kb     56.0Kb    14.0Kb    25.0%
dev5.jpg                                            24.6Kb     37.9Kb    13.2Kb    35.0%
dev6.jpg                                            28.9Kb     42.8Kb    13.9Kb    32.4%
hr_0570_220_135__0570220135006.jpg                3124.3Kb   3467.8Kb   343.5Kb     9.9%
hr_0570_220_158__0570220158006.jpg                3010.0Kb   3319.1Kb   309.1Kb     9.3%
hr_0570_220_175__0570220175006.jpg                2245.5Kb   2442.6Kb   197.0Kb     8.1%
hr_0570_227_599__0570227599006.jpg                2561.7Kb   2809.8Kb   248.1Kb     8.8%
hr_0596_622_701__0596622701006.jpg                3238.8Kb   3453.6Kb   214.7Kb     6.2%
hr_0596_623_849__0596623849006.jpg                2902.9Kb   3102.1Kb   199.3Kb     6.4%
hr_0622_219_873__0622219873006.jpg                 985.3Kb   1066.9Kb    81.7Kb     7.7%
logo.jpg                                            43.5Kb     51.2Kb     7.7Kb    15.1%
mvm-header.jpg                                       8.5Kb     12.4Kb     3.9Kb    31.6%
mvm-postcard-picture.jpg                            72.2Kb     73.4Kb     1.3Kb     1.7%
overhang_pixels.jpg                               3014.3Kb   3370.8Kb   356.4Kb    10.6%
peterbe copy.jpg                                     4.2Kb     10.4Kb     6.2Kb    59.7%
peterbe.jpg                                         36.7Kb     44.3Kb     7.5Kb    17.0%
pjt-mcguinty-2.jpg                                  96.8Kb    101.6Kb     4.8Kb     4.8%
sl1.jpg                                             28.7Kb     35.4Kb     6.7Kb    18.9%
That's a median saving of 9.3% (and an average of 15.3%).
It's not very fast though. Some of the large files take more than a second each, and in total it took 23.7 seconds to create all of those optimized files. Do what you want with that fact, but bear in mind that these are hopefully "once in a lifetime" operations (depending on the ephemerality of your thumbnail storage). Mind you, the really large JPEGs skew the average: the median is 72.1 milliseconds per file and the average is 527.0 milliseconds. Also, looking through the numbers, the large JPEGs take the longest but benefit the least, percentage-wise, in byte savings.
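The script that produced those numbers isn't included here, but a minimal sketch of the approach, assuming mozjpeg's jpegtran build is the one on your PATH (the directories and output format are my own picks), could look like this:

    import os
    import subprocess
    import time

    SOURCE_DIR = os.path.expanduser("~/Downloads")  # hypothetical source of test JPEGs
    DEST_DIR = "/tmp/mozjpeg-output"                # hypothetical output directory

    os.makedirs(DEST_DIR, exist_ok=True)

    total_before = total_after = 0
    t0 = time.time()
    for name in sorted(os.listdir(SOURCE_DIR)):
        if not name.lower().endswith((".jpg", ".jpeg")):
            continue
        src = os.path.join(SOURCE_DIR, name)
        dst = os.path.join(DEST_DIR, name)
        # Lossless re-compression with mozjpeg's jpegtran:
        # -copy none drops metadata, -optimize recomputes the Huffman tables.
        subprocess.check_call(
            ["jpegtran", "-copy", "none", "-optimize", "-outfile", dst, src]
        )
        before, after = os.path.getsize(src), os.path.getsize(dst)
        total_before += before
        total_after += after
        print(f"{name:<48} {after / 1024:8.1f}Kb {before / 1024:8.1f}Kb "
              f"{(before - after) / 1024:7.1f}Kb {100 * (before - after) / before:5.1f}%")

    print(f"Total saving {(total_before - total_after) / 1024:.1f}Kb "
          f"in {time.time() - t0:.1f} seconds")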
UPDATE
Chris Adams, in the comment below, inspired me to compare my trials with jpegoptim and jpegrescan. So, I took my script that generated a directory of 45 JPEGs and changed it to use jpegoptim and jpegrescan.
With mozjpeg the total size of that output directory is 34.1Mb and it took a total of 23.3 seconds (median 76.4 milliseconds per file).
With jpegoptim & jpegrescan the total size of that output directory is 35.6Mb and it took a total of 4.6 seconds (median 32.1 milliseconds per file).
In other words, roughly speaking, mozjpeg is 4.2% more space effective, while jpegoptim & jpegrescan takes about 58% less time per file (comparing medians).
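For reference, the jpegoptim & jpegrescan variant simply swaps out the mozjpeg call for something along these lines (a sketch; it assumes both tools are installed and on the PATH, and the helper name is made up):

    import os
    import shutil
    import subprocess

    def jpegoptim_and_jpegrescan(src, dst):
        # Hypothetical helper: copy the original, optimize the Huffman tables
        # and strip metadata in place with jpegoptim, then let jpegrescan
        # search for the smallest progressive scan layout.
        shutil.copyfile(src, dst)
        subprocess.check_call(["jpegoptim", "--strip-all", dst])
        subprocess.check_call(["jpegrescan", dst, dst + ".tmp"])
        os.replace(dst + ".tmp", dst)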
Comments
Have you compared mozjpeg's lossless optimization to other tools like jpegoptim / jpegrescan? I was recently testing that (https://gist.github.com/acdha/d85c927d35ee6df2c57d#file-optimize-images-sh) and found similar results (big savings for smaller files, an uncompelling time/benefit trade-off for larger ones).
One optimization which I've considered but have not yet implemented is basically a hybrid approach: generate a thumbnail quickly with limited optimization, queue a task to run the optimizer, and serve it with a lower TTL until the optimized version is available.
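A rough sketch of that hybrid idea, assuming Celery as the task queue and mozjpeg's jpegtran for the heavier pass (the task name and file handling are hypothetical):

    import os
    import subprocess
    import tempfile

    from celery import shared_task

    @shared_task
    def optimize_thumbnail(path):
        # Runs after the quickly-generated thumbnail has already been served;
        # replaces the file with a losslessly optimized copy if that helps.
        fd, tmp = tempfile.mkstemp(suffix=".jpg", dir=os.path.dirname(path))
        os.close(fd)
        try:
            subprocess.check_call(
                ["jpegtran", "-copy", "none", "-optimize", "-outfile", tmp, path]
            )
            if os.path.getsize(tmp) < os.path.getsize(path):
                os.replace(tmp, path)
            else:
                os.remove(tmp)
        except Exception:
            if os.path.exists(tmp):
                os.remove(tmp)
            raise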
See update above. Thanks for the comment.
I would advise against doing the optimization in a background thread because the complexity of that is enormous. Also, you probably have a CDN that works like CloudFront in that it picks up the thumbnails once and caches them under that filename for a long time. So unless you change the thumbnail file name after the background task is done, the CDN will have a stale copy.
If speed is a concern, and you can't wait 76 milliseconds (or less!) for each thumbnail, you could perhaps create the thumbnails as part of the CMS or some other script that doesn't mind waiting.
Yeah, I wouldn't recommend that as a general strategy, but my rationale was roughly that, while I have a CDN, it has a hit rate somewhere around 60%, so all but the consistently most popular content refreshes more often than the TTLs. Having a queued worker look at the more popular images and optimize them really aggressively would be a nice way of gradually improving things over time.
I agree, though, that this is an edge case. The first thing I was looking into was much easier: optimizing the DZI files we use with OpenSeadragon (http://openseadragon.github.io), since those are already generated off-line and stored durably.
The other thing I wanted to look into was the latency of calling a binary vs. doing this in process to see if it'd be worth using a cffi binding to optimize the images in memory (Pillow -> mozjpeg -> disk).
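For reference, the "call a binary" baseline I mean would be something like this sketch (it assumes Pillow and mozjpeg's jpegtran; a cffi binding would do the same transform without spawning a process):

    import io
    import subprocess

    from PIL import Image

    def optimized_thumbnail_bytes(source_path, size=(300, 300)):
        # Resize in memory with Pillow, then pipe the JPEG bytes through
        # mozjpeg's jpegtran (stdin -> stdout) for lossless optimization.
        image = Image.open(source_path)
        image = image.convert("RGB")  # JPEG can't store alpha
        image.thumbnail(size)
        buf = io.BytesIO()
        image.save(buf, format="JPEG", quality=85)
        result = subprocess.run(
            ["jpegtran", "-copy", "none", "-optimize"],
            input=buf.getvalue(),
            stdout=subprocess.PIPE,
            check=True,
        )
        return result.stdout  # ready to be written to disk or uploaded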
One more thing to consider, and maybe it's off-topic: if you hand over the optimization work to someone like Kraken.io, then you have all the network overhead to worry about, and that might turn ~70ms into several seconds.
With the UPDATE numbers above in mind, I'm inclined to conclude: whichever is easiest to install on your server, use that.