Quick dog-piling (aka stampeding herd) URL stresstest

August 10, 2018
0 comments Python

Whenever you want to quickly bombard a URL with some concurrent traffic, you can use this:


import random
import time
import requests
import concurrent.futures


def _get_size(url):
    sleep = random.random() / 10
    # print("sleep", sleep)
    time.sleep(sleep)
    r = requests.get(url)
    # print(r.status_code)
    assert len(r.text)
    return len(r.text)


def run(url, times=10):
    sizes = []
    futures = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        for _ in range(times):
            futures.append(executor.submit(_get_size, url))
        for future in concurrent.futures.as_completed(futures):
            sizes.append(future.result())
    return sizes


if __name__ == "__main__":
    import sys

    print(run(sys.argv[1]))

It's really basic but it works wonderfully. It starts 10 concurrent threads that all hit the same URL at almost the same time.
I've been using this to stress test a local Django server while testing some atomic writes with the file system.
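
One caveat: ThreadPoolExecutor's default pool size depends on your Python version and CPU count, so if times exceeds that default, not every request is actually in flight at once. Here's a minimal variation of run() (a sketch, reusing _get_size from above) that pins the pool size to times:


import concurrent.futures


def run(url, times=10):
    # max_workers=times guarantees every request gets its own thread
    with concurrent.futures.ThreadPoolExecutor(max_workers=times) as executor:
        futures = [executor.submit(_get_size, url) for _ in range(times)]
        return [f.result() for f in concurrent.futures.as_completed(futures)]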

HTMLMinifier in use on this blog now

August 7, 2018
3 comments Web development, JavaScript, Web Performance

Last week I enabled HTMLMinifier as a post-build step for server-rendered content here on this blog. Basically, after a page is rendered in Django, it's sent to a Celery queue that does things to the index.html file. The first thing it does is extract the stylesheets and replace them with a block of inline CSS. More details in this blog post. Secondly, the background job sends the index.html file to node_modules/.bin/html-minifier. See the code here.
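
That linked code is the source of truth, but here's a sketch of what that step can look like; the exact flags are assumptions on my part, matching the behaviors described next:


import subprocess


def minify_html(path):
    # Shell out to html-minifier and overwrite the file in place.
    # These flags match the behaviors described below, but the exact
    # set used on this blog lives in the linked code.
    minified = subprocess.check_output([
        "node_modules/.bin/html-minifier",
        "--collapse-whitespace",      # remove whitespace that is not needed
        "--remove-comments",          # remove HTML comments
        "--remove-attribute-quotes",  # <div id=foo> instead of <div id="foo">
        path,
    ])
    with open(path, "wb") as f:
        f.write(minified)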

What the minifier does is remove quotation marks where they're not needed (e.g. <div id=foo> instead of <div id="foo">), remove HTML comments, and lastly remove whitespace that is not needed. The result is that the HTML now looks like this:

View source

I also added a line of logging that spits out a measurement of the HTML size before, before with gzip, after, and after with gzip. Why? Because the gain from HTML minification is usually insignificant once you gzip. See this blog post about how insignificant space optimization is in comparison to gzip. Look at the sample log lines:

...
Minified before: 38,249 bytes (11,150 gzipped), After: 36,098 bytes (10,875 gzipped), Shaving 2,151 bytes (275 gzipped)
Minified before: 37,698 bytes (10,534 gzipped), After: 35,622 bytes (10,243 gzipped), Shaving 2,076 bytes (291 gzipped)
Minified before: 58,846 bytes (14,623 gzipped), After: 55,540 bytes (14,313 gzipped), Shaving 3,306 bytes (310 gzipped)
...

So this last one saved 3.2KB of HTML document, which is nothing to sneeze at, but since 99% of clients support gzip, it actually only saved 310 bytes. As a matter of fact, I parsed the log lines and calculated the average and it was saving 338 bytes per page.
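
Producing such a log line is cheap, by the way. Here's a sketch of roughly how the measurement can be computed (not this blog's actual code):


import gzip


def log_savings(before, after):
    # before/after are the raw HTML as bytes
    b, a = len(before), len(after)
    bz, az = len(gzip.compress(before)), len(gzip.compress(after))
    print(
        "Minified before: {:,} bytes ({:,} gzipped), "
        "After: {:,} bytes ({:,} gzipped), "
        "Shaving {:,} bytes ({:,} gzipped)".format(b, bz, a, az, b - a, bz - az)
    )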

Worth it? I doubt it. It's not without risks and now it's slightly harder and weirder to view the source. However, multiplying those 338 bytes by the total number of pages served per month, I estimate a total saving of about 161 MB of data that doesn't have to be sent.

To defer or to async JavaScript tags. That's the question.

June 29, 2018
0 comments Web development, JavaScript, Web Performance

tl;dr; async scores slightly better than defer (on script tags) in this experiment using Webpagetest.

Much has been written about the difference between <script defer src="..."> and <script async src="..."> but nothing beats seeing it visually in Webpagetest.

So I took a page off my own blog. Butchered it and cleaned up the 6 <script> tags. It uses HTTP/2 and some jQuery and some other vanilla JavaScript stuff. See the page here: neither.html
Then I copied that HTML file and replaced all <script src="..."> with <script defer src="...">: defer.html. And lastly, the same with: async.html.

First let's compare all three against each other:

Neither vs defer vs async on Webpagetest.

Clearly, making the JavaScript non-blocking is critical for web performance. That's 1.7 seconds instead of 2.8 seconds.

Second, let's compare just defer vs. async on a 4G connection:

defer vs. async on 4G. Also, if you like, here's defer vs. async on a desktop browser instead.

Conclusions

  1. Don't allow your JavaScript to block rendering unless it's OK to have your users staring at a white screen till everything has landed.

  2. There's not much difference between defer and async. async has a slight advantage as per these experiments. I can only guess, but I suspect it's because async can "spread out" the work better and get some of it done in parallel, whilst defer has constraints that tell it to wait. In particular, with defer the order of the <script> tags is respected. Suppose the file some.jquery.plugin.js downloads before jquery.min.js; then its execution has to be blocked and delayed whilst waiting for jquery.min.js to download, parse, and execute. With async it's more of a wild west of executing whenever you can.

  3. The async.html is actually busted because of the unpredictable order of execution; these .js files depend on the order. That's another reason to use defer if your scripts have that order-dependency problem.

  4. Consider using a mix of async and defer. async has the advantage that some parsing/execution can be done by the main thread whilst waiting for other blocking resources like images.

A good Django view function cache decorator for http.JsonResponse

June 20, 2018
0 comments Python, Web development, Django

I use this a lot. It has served me very well. The code:


import hashlib
import functools

import markus  # optional
from django.core.cache import cache
from django import http
from django.utils.encoding import force_bytes, iri_to_uri

metrics = markus.get_metrics(__name__)  # optional


def json_response_cache_page_decorator(seconds):
    """Cache only when there's a healthy http.JsonResponse response."""

    def decorator(func):

        @functools.wraps(func)
        def inner(request, *args, **kwargs):
            cache_key = 'json_response_cache:{}:{}'.format(
                func.__name__,
                hashlib.md5(force_bytes(iri_to_uri(
                    request.build_absolute_uri()
                ))).hexdigest()
            )
            content = cache.get(cache_key)
            if content is not None:

                # metrics is optional
                metrics.incr(
                    'json_response_cache_hit',
                    tags=['view:{}'.format(func.__name__)]
                )

                return http.HttpResponse(
                    content,
                    content_type='application/json'
                )
            response = func(request, *args, **kwargs)
            if (
                isinstance(response, http.JsonResponse) and
                response.status_code in (200, 304)
            ):
                cache.set(cache_key, response.content, seconds)
            return response

        return inner

    return decorator

To use it, simply add it to Django view functions that might return an http.JsonResponse. For example, something like this:


@json_response_cache_page_decorator(60)
def search(request):
    q = request.GET.get('q')
    if not q:
        return http.HttpResponseBadRequest('no q')
    results = search_database(q)
    return http.JsonResponse({
        'results': results,
    })

I use this instead of django.views.decorators.cache.cache_page() for a couple of reasons:

  • cache_page generates cache keys that don't contain the view function name.
  • cache_page tries to cache the whole http.HttpResponse instance which can't be serialized if you use the msgpack serializer.
  • cache_page also sends Cache-Control headers, which is not always what you want.
  • Not possible to inject your own custom code such as my usage of metrics.

Disclaimer: This snippet of code comes from a side-project that has a very specific set of requirements. They're rather unique to that project and I have a full picture of the needs. E.g. I know what specific headers matter and don't matter. Your project might be different. For example, perhaps you don't have markus to handle your metrics. Or perhaps you need to re-write the query string for something to normalize the cache key differently. Point being, take the snippet of code as inspiration when you too find that django.views.decorators.cache.cache_page() isn't good enough for your Django view functions.

The best grep tool in the world; ripgrep

June 19, 2018
3 comments Linux, Web development, macOS

tl;dr; ripgrep (aka. rg) is the best tool to grep today.

ripgrep is a tool for searching files. Its killer feature is that it's fast. Like, really really fast. Faster than sift, git grep, ack, regular grep etc.

If you don't believe me, either read this detailed blog post from its author or just jump straight to the conclusion:

  • For both searching single files and huge directories of files, no other tool obviously stands above ripgrep in either performance or correctness.

  • ripgrep is the only tool with proper Unicode support that doesn’t make you pay dearly for it.

  • Tools that search many files at once are generally slower if they use memory maps, not faster.

Benchmark

I used to use git grep whenever I was inside a git repo and sift for everything else. That alone was a huge step up from regular grep. Granted, almost all my git repos are small enough that git grep often finishes faster than I can perceive anyway. But with ripgrep I can just add --no-ignore-vcs and it searches the files listed in .gitignore too. That's useful when you want to search in your own source as well as the files in node_modules.

The installation instructions are easy. I installed it with brew install ripgrep and the best way to learn how to use it is rg --help. Remember that it has a lot of cool features that are well worth learning. It's written in Rust and so far I haven't had a single crash, ever. The ability to search by file type takes some getting used to (tip! use: rg --type-list) and remember that you can pipe rg output to another rg. For example, to search for all lines that contain both query and string you can use rg query | rg string.

How to unset aliases set by Oh My Zsh

June 14, 2018
5 comments Linux, macOS

I use Oh My Zsh and I highly recommend it. However, it sets some aliases that I don't want. In particular, there's a plugin called git.plugin.zsh (located in ~/.oh-my-zsh/plugins/git/git.plugin.zsh) that interferes with a global binary I have in $PATH. So when I start a shell, the executable gg becomes...:

which gg
gg: aliased to git gui citool

That overrides /usr/local/bin/gg, which is the one I want executed when I type gg. To unset the alias I can run...:

unalias gg

▶ which gg
/usr/local/bin/gg

To override it "permanently", I added, to the end of ~/.zshrc:


# This unsets ~/.oh-my-zsh/plugins/git/git.plugin.zsh
# So my /usr/local/bin/gg works instead
unalias gg

Now whenever I start a new terminal, gg defaults to the one in /usr/local/bin instead.

How to NOT start two servers on the same port

June 11, 2018
2 comments Linux, Web development

First of all, you can't start two servers on the same port; ultimately it will fail. However, you might not get an early warning that something is off. For example, if you do this:


# In one terminal
$ cd elasticsearch-6.1.0
$ ./bin/elasticsearch
...
$ curl localhost:9200
...
"version" : {
    "number" : "6.1.0",
...

# In *another* terminal
$ cd elasticsearch-6.2.4
$ ./bin/elasticsearch
...
$ curl localhost:9200
...
"version" : {
    "number" : "6.1.0",
...

In other words, what happened to elasticsearch-6.2.4/bin/elasticsearch?? It actually started on port :9201. That's rather scary, because as you jump between projects in different tabs, you might not notice that you already have an Elasticsearch running with docker-compose somewhere.

To remedy this I use this curl one-liner:


$ curl -s localhost:9200 > /dev/null && echo "Already running!" && exit || ./bin/elasticsearch

Now if you try to start a server on a port that's already in use, the command exits early.

To wrap this up in a script, take this:


#!/bin/bash

set -eo pipefail

hostandport=$1
shift
curl -s "$hostandport" >/dev/null && \
  echo "Already running on $hostandport" && \
  exit 1 || exec "$@"

...and make it an executable called unlessalready.sh and now you can do this:


$ unlessalready.sh localhost:9200 ./bin/elasticsearch

GeneratorExit - How to clean up after the last yield in Python

June 7, 2018
7 comments Python

tl;dr; Use except GeneratorExit if your Python generator needs to know the consumer broke out.

Suppose you have a generator that yields things out. After each yield you want to execute some code that does something like logging or cleaning up. Here's one such trivialized example:

The Problem


def pump():
    numbers = [1, 2, 3, 4]
    for number in numbers:
        yield number
        print("Have sent", number)
    print("Last number was sent")


for number in pump():
    print("Got", number)

print("All done")

The output is, as expected:

Got 1
Have sent 1
Got 2
Have sent 2
Got 3
Have sent 3
Got 4
Have sent 4
Last number was sent
All done

In this scenario, the consumer of the generator (the for number in pump() loop in this example) gets everything the generator generates, so after the last yield the generator is free to do any last-minute activities which might be important, such as closing a socket or updating a database.

Suppose the consumer is getting a bit "impatient" and breaks out as soon as it has what it needs.


def pump():
    numbers = [1, 2, 3, 4]
    for number in numbers:
        yield number
        print("Have sent", number)
    print("Last number was sent")


for number in pump():
    print("Got", number)
    # THESE TWO NEW LINES
    if number >= 2:
        break

print("All done")

What do you think the output is now? I'll tell you:

Got 1
Have sent 1
Got 2
All done

In other words, the potentially important lines print("Have sent", number) and print("Last number was sent") never get executed! The author of the generator could tell consumers (through documentation) "Don't break! If you don't want me any more, raise a StopIteration". But that's not a feasible requirement.

The Solution

But! There is a better solution and that's to catch GeneratorExit exceptions.


def pump():
    numbers = [1, 2, 3, 4]
    try:
        for number in numbers:
            yield number
            print("Have sent", number)
    except GeneratorExit:
        print("Exception!")
    print("Last number was sent")


for number in pump():
    print("Got", number)
    if number == 2:
        break
print("All done")

Now you get what you might want:

Got 1
Have sent 1
Got 2
Exception!
Last number was sent
All done
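
For what it's worth, the mechanism behind this: when the consumer breaks out, the generator object gets cleaned up and its close() method is called (promptly, in CPython), and close() is what raises GeneratorExit at the paused yield. A minimal demonstration (a sketch of mine, not from the examples above) calling close() directly:


def pump():
    try:
        yield 1
    except GeneratorExit:
        print("Cleaning up!")
        # Falling through (returning) here is fine. Yielding another
        # value here instead would raise a RuntimeError saying the
        # generator ignored GeneratorExit.


gen = pump()
next(gen)    # advance to the first yield
gen.close()  # raises GeneratorExit inside pump() -> prints "Cleaning up!"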

Next Level Stuff

Note that in the last example's output, it never prints Have sent 2 even though the generator really did send that number. Suppose that's an important piece of information; then you can reach it inside the except GeneratorExit block. Like this, for example:


def pump():
    numbers = [1, 2, 3, 4]
    try:
        for number in numbers:
            yield number
            print("Have sent", number)
    except GeneratorExit:
        print("Have sent*", number)
    print("Last number was sent")


for number in pump():
    print("Got", number)
    if number == 2:
        break
print("All done")

And the output is:

Got 1
Have sent 1
Got 2
Have sent* 2
Last number was sent
All done

The * is just there in case you want to distinguish whether a break happened or not. It depends on your application.

Writing a custom Datadog reporter using the Python API

May 21, 2018
2 comments Python

Datadog is an awesome software-as-a-service where you can aggregate and visualize statsd metrics sent from an application. For visualizing timings you create a time series graph. It can look something like this:

Time series

This time series looks sane because the timings are measured very frequently. But what if the thing you're measuring happens very rarely, like once a day? Then the graph doesn't look very useful. See this example:

"Rare time" series

Not only does it happen rarely, the raw number of milliseconds is quite hard to parse. I.e. what's 2.6 million milliseconds? (Answer: approximately 45 minutes.) So to solve that I used the Datadog API. Now I can get every single point of the metric in milliseconds and make a little data table with human-readable dates and times.
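
Fetching the raw points is the interesting bit. Here's a sketch using the datadog Python client's query endpoint; the API keys, metric name, and tag are made up for illustration (the real ones are hardcoded in metrics.py):


import time

from datadog import initialize, api

initialize(api_key="...", app_key="...")  # your real keys here

now = int(time.time())
result = api.Metric.query(
    start=now - 7 * 24 * 3600,  # the last 7 days
    end=now,
    query="avg:myapp.rare_task.time{env:prod}",  # hypothetical metric
)
for series in result["series"]:
    # each point is a [timestamp in ms, value] pair
    for timestamp, value_ms in series["pointlist"]:
        if value_ms is None:
            continue  # gaps in the data come back as None
        when = time.strftime(
            "%a %Y-%m-%dT%H:%M:%S", time.localtime(timestamp / 1000)
        )
        print(when, "took about", round(value_ms / 1000), "seconds")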

The end result looks something like this:

SCOPE: ENV:PROD
+-------------------------+------------------------+-----------------------+
|          WHEN           |        TIME AGO        |       TIME TOOK       |
+=========================+========================+=======================+
| Mon 2018-05-21T17:00:00 | 2 hours 43 minutes ago | 23 minutes 32 seconds |
+-------------------------+------------------------+-----------------------+
| Sun 2018-05-20T17:00:00 | 1 day 2 hours ago      | 20 seconds            |
+-------------------------+------------------------+-----------------------+
| Sat 2018-05-19T17:00:00 | 2 days 2 hours ago     | 20 seconds            |
+-------------------------+------------------------+-----------------------+
| Fri 2018-05-18T17:00:00 | 3 days 2 hours ago     | 2 minutes 24 seconds  |
+-------------------------+------------------------+-----------------------+
| Wed 2018-05-16T20:00:00 | 4 days 23 hours ago    | 38 minutes 38 seconds |
+-------------------------+------------------------+-----------------------+

It's not gorgeous and there are a lot of caveats but it's at least really easy to read. See the metrics.py code here.

I don't think you can run this code since you don't have the same (hardcoded) metrics but hopefully it can serve as an example to whet your appetite.

What I'm going to do next, if I have time, is to run this as a Flask app instead that outputs an HTML table, on a Heroku app or something.

To CDN assets or just HTTP/2

May 17, 2018
3 comments Web development, JavaScript, Web Performance

tl;dr; I see little benefit in using a CDN at this point.

I took two random pages here on my blog. One and Another. Doesn't matter what they say but it's important to notice that they're extremely similar. No big pictures. Both have 1 banner ad each. Both served with HTTP/2. Neither has any blocking linked assets. I.e. there is no blocking <link rel="stylesheet" href="styles.css"> and the script tags are either async or defer. Both pages reference one little .png that is deliberately not lazy loaded. That's the baseline.

The HTML document, in both URLs, is served with HTTP/2, but it references the lazy-loaded .css and (a bunch of) .js files via a CDN. In other words, it looks like this:


▶ curl -v https://www.peterbe.com/plog/hashin-0.7.0
...
> GET /plog/hashin-0.7.0 HTTP/2
...
< HTTP/2 200
...
<
...
<link rel="preload" href="/static/css/base.min.e8df96d84663.css" 
 as="style" onload="this.onload=null;this.rel='stylesheet'">
...
<script defer src="/static/js/blogitem-post.min.f6c0be691e73.js"></script>
...

So, cdn-2916.kxcdn.com is an awesome CDN, but to a first-time visitor, it is going to require a DNS lookup and the creation of a new TCP connection that can be kept alive. The alternative to this is to not put any of the .png, .css or .js assets on a CDN. Basically, instead of <script src="https://mycdn.example.com/foo.js">, just do <script src="/foo.js">.

CDNs are really important since latency is a killer for web performance, and remember that "Use a CDN" is rule number 2 in the, now dated, YSlow ruleset. However, we're entering an era where HTTP/2 is available in more and more mainstream browsers (hint: nearly 100% of visitors to my site have HTTP/2 support). Buuuuuut, the latency (DNS, connection and SSL negotiation) doesn't matter that much if you have already paid those costs to get to the origin web server (https://www.peterbe.com in this example).

The Experiment

What I'm interested in is seeing if there is a way to gauge/measure when it's best to use a CDN and when it's best to use the origin web server to serve all assets. My friend @stereobooster suggested: "Webpagetest.org is all you need".

Ok. Let's measure that then with Webpagetest.org and see what we can learn.

Here's a visual comparison of the two URLs when they both use CDN for the static assets.

  • They load pretty equally.
  • The Waterfall View looks almost identical.
  • Confirmed, there are no render blocking resources as it starts to paint already at about 1.5s.

Here's a visual comparison of one that uses a CDN for static assets and one that does not.

  • They load pretty equally (diff by 0.1s).
  • The Waterfall View looks very different.
  • The second one does not have a second "dns - connection - ssl - download" bar.
  • Almost all the .js are downloaded at about 1.8s when there's no CDN.
  • Almost all the .js are downloaded at about 3.0s when using a CDN.
  • Use the little "Waterfall opacity" widget to slide left and right to see the difference.

You can see their webpagetests individually here and here.

Assets over CDN
Two connection prices paid. Downloads individual assets faster but ultimately takes a longer time.

One HTTP/2 connection only
Only 1 connection price paid. ALL assets downloaded sooner, albeit individually slower.

Analysis

My site is served from a highly optimized Nginx server in New York, USA. The two Webpagetest visual comparisons above are both done from Virginia, USA. But the killer feature of a CDN is that latency can be so much better thanks to the CDN's edge locations. In particular, KeyCDN has an edge location in Stockholm, Sweden. So what happens when you run the URLs from a Webpagetest machine in Stockholm, Sweden?

They both start to render at the same time (expected, since the HTML document still comes from New York, USA) but (roughly) the total time to download all the .css and .js is (about) 2.6 seconds with a CDN and 1.9 seconds without one. In other words, despite the CDN being geographically so much closer, the static assets are still available sooner without a CDN.

It's pretty clear at this point that it's not a good idea to use a CDN for static assets. Even if they're not critical. The "First Meaningful Paint" and "Time To Interactive" are about the same, but when HTTP/2 can download all the .js files faster, their JavaScript can start doing useful work sooner.

What Else

For my site, it's easiest to host the whole thing on an Nginx server on a DigitalOcean machine. It's easy to invalidate its cache (just delete the file from disk and wait for Django to regenerate it). Another advantage of using plain Nginx is that I serve the HTML with Cache-Control headers and then do some post-processing of the .html file, and since Nginx is disk-based, I don't have to update a CDN.

An alternative would be to put the whole site behind a CDN. That way, the initial HTML document can be served from a CDN edge location, using HTTP/2 and send the rest of the static assets on the same HTTP/2 connection. But this means that every single dynamic URL (e.g. HTTP POSTs or some per-user XHR requests) has to go via a CDN rather than going straight to the Nginx that is connected to the Django web server.

Last but not least, even though my Nginx server is on a decent machine and pretty well tuned, I very much doubt it's as fast and powerful as a KeyCDN or CloudFront or Akamai or Google Cloud CDN. Those servers are beasts! Mind you, the DNS + connection + SSL negotiation, when requesting from Stockholm, Sweden was about 0.75s to my Nginx in New York, USA. For the KeyCDN edge location the DNS + connection + SSL negotiation was about 0.52s. So not a huge difference actually.

Another important aspect is Service Workers. Perhaps I don't know how to hack it, but it doesn't work when you use different domains for the service worker .js file and the URIs it references.

In conclusion; I see little benefit in using a CDN at this point. Perhaps for larger assets like videos, GIFs or high-res images. HTTP/2 changes one of the major web performance rules. End of an era(?)