"machma - Easy parallel execution of commands with live feedback"

This is so cool! https://github.com/fd0/machma

It's a command line program that makes it really easy to run any command line program in parallel. I.e. in separate processes with separate CPUs.

Something network bound

Suppose I have a file like this:

▶ wc -l urls.txt
      30 urls.txt

▶ cat urls.txt | head -n 3
https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/v1/wntdll.pdb/D74F79EB1F8D4A45ABCD2F476CCABACC2/wntdll.sym
https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/v1/firefox.pdb/448794C699914DB8A8F9B9F88B98D7412/firefox.sym
https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/v1/d2d1.pdb/CB8FADE9C48E44DA9A10B438A33114781/d2d1.sym

If I wanted to download all of these files with wget the traditional way would be:

▶ time cat urls.txt | xargs wget -q -P ./downloaded/
cat urls.txt  0.00s user 0.00s system 53% cpu 0.005 total
xargs wget -q -P ./downloaded/  0.07s user 0.24s system 2% cpu 14.913 total

▶ ls downloaded | wc -l
      30

▶ du -sh downloaded
 21M    downloaded

So it took 15 seconds to download 30 files that totals 21MB.

Now, let's do it with machama instead:

▶ time cat urls.txt | machma -- wget -q -P ./downloaded/ {}
cat urls.txt  0.00s user 0.00s system 55% cpu 0.004 total
machma -- wget -q -P ./downloaded/ {}  0.53s user 0.45s system 12% cpu 7.955 total

That uses 8 separate processors (because my laptop has 8 CPUs).
Because 30 / 8 ~= 4, it roughly does 4 iterations.

But note, it took 15 seconds to download 30 files synchronously. That's an average of 0.5s per file. The reason it doesn't take 4x0.5 seconds (instead of 8 seconds) is because it's at the mercy of bad luck and some of those 30 spiking a bit.

Something CPU bound

Now let's do something really CPU intensive; Guetzli compression.

▶ ls images | wc -l
  7

▶ time find images -iname '*.jpg' | xargs -I {} guetzli --quality 85 {} compressed/{}
find images -iname '*.jpg'  0.00s user 0.00s system 40% cpu 0.009 total
xargs -I {} guetzli --quality 85 {} compressed/{}  35.74s user 0.68s system 99% cpu 36.560 total

And now the same but with machma:

▶ time find images -iname '*.jpg' | machma -- guetzli --quality 85 {} compressed/{}

processed 7 items (0 failures) in 0:10
find images -iname '*.jpg'  0.00s user 0.00s system 51% cpu 0.005 total
machma -- guetzli --quality 85 {} compressed/{}  58.47s user 0.91s system 546% cpu 10.857 total

Basically, it took only 11 seconds. This time there were fewer images (7) than there was CPUs (8), so basically the poor computer is doing super intensive CPU (and memory) work across all CPUs at the same time. The average time for each of these files is ~5 seconds so it's really interesting that even if you try to do this in parallel execution instead of taking a total of ~5 seconds, it took almost double that.

In conclusion

Such a handy tool to have around for command line stuff. I haven't looked at its code much but it's almost a shame that the project only has 300+ GitHub stars. Perhaps because it's kinda complete and doesn't need much more work.

Also, if you attempt all the examples above you'll notice that when you use the ... | xargs ... approach the stdout and stderr is a mess. For wget, that's why I used -q to silence it a bit. With machma you get a really pleasant color coded live output that tells you the state of the queue, possible failures and an ETA.

Comments

Your email will never ever be published.

Related posts