We use Poetry in a GitHub project. There's a pyproject.toml file (and a poetry.lock file) which, with the help of the poetry executable, gets you a very reliable Python environment. The only problem is that installing the poetry executable is slow. Like 10+ seconds slow. It might seem silly, but in the project I'm working on, that 10+ second delay is the slowest part of a GitHub Action workflow that needs to be fast, because it's trying to post a comment on a pull request as soon as it possibly can.
First I tried caching $(pip cache dir) so that the underlying python -m pip install virtualenv -t $tmp_dir that install-poetry.py runs would get a boost from avoiding the network. The difference was negligible. I also didn't want to get too weird by overriding how install-poetry.py works, or even make my own hacky copy. I like being able to just rely on the snok/install-poetry GitHub Action to do its thing (and its future thing).
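For reference, a pip-cache attempt like that could look roughly like the sketch below. This is not the actual workflow from the project: the step names, the pip-cache id, and the cache key are all made up for illustration, and the actions/cache version pin is simply the one used elsewhere in this post (newer releases exist).

```yaml
# Hypothetical sketch of caching pip's download cache.
# Step names, the "pip-cache" id, and the key are illustrative only.
- name: Find pip cache dir
  id: pip-cache
  # `pip cache dir` prints the location of pip's cache directory.
  run: echo "dir=$(pip cache dir)" >> "$GITHUB_OUTPUT"

- name: Load cached pip downloads
  uses: actions/cache@v2.1.6
  with:
    path: ${{ steps.pip-cache.outputs.dir }}
    key: pip-${{ runner.os }}-${{ hashFiles('.github/workflows/pr-deployer.yml') }}
```

This only saves the download; the installer still has to create its virtualenv and install into it on every run.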
The solution was to cache the whole $HOME/.local directory. It's as simple as this:
- name: Load cached $HOME/.local
  uses: actions/cache@v2.1.6
  with:
    path: ~/.local
    key: dotlocal-${{ runner.os }}-${{ hashFiles('.github/workflows/pr-deployer.yml') }}
The key is important. If you do copy-n-paste this block of YAML to speed up your GitHub Action, please remember to replace .github/workflows/pr-deployer.yml with the name of your .yml file that uses this. It's important because otherwise the cache might be overzealously hot (i.e. it would still restore the old ~/.local) when you make a change like:
  - name: Install Python poetry
-   uses: snok/install-poetry@v1.1.6
+   uses: snok/install-poetry@v1.1.7
    with:
...for example.
Now, thankfully, install-poetry.py (which is the recommended way to install poetry, by the way) can notice that the installation already exists, so it can skip a bunch of work. The result: from 10+ seconds to 2 seconds. And what's neat is that the optimization is very unintrusive, because it doesn't mess with how the snok/install-poetry action works.
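To make the ordering explicit, here is a minimal sketch of how the two steps might sit next to each other in a job. The checkout and Python setup steps are left out, the options under the install step's with: are omitted because they aren't shown in this post, and the version pins are copied from the snippets above.

```yaml
# Restore ~/.local first, so the installer finds an existing installation.
- name: Load cached $HOME/.local
  uses: actions/cache@v2.1.6
  with:
    path: ~/.local
    key: dotlocal-${{ runner.os }}-${{ hashFiles('.github/workflows/pr-deployer.yml') }}

# Not conditional on a cache hit: it runs on every build,
# and is simply fast when ~/.local was restored from the cache.
- name: Install Python poetry
  uses: snok/install-poetry@v1.1.7
```

The install step stays unconditional on purpose. On a cache miss it does the full install, and actions/cache saves ~/.local at the end of the job for next time.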
But wait, there's more!
If you dig up our code where we use poetry, you might find that it does a bunch of other caching too. In particular, it also caches the .venv directory that the poetry install command creates. That's related, but ultimately a separate optimization. It works like this:
- name: Load cached venv
  id: cached-poetry-dependencies
  uses: actions/cache@v2.1.6
  with:
    path: deployer/.venv
    key: venv-${{ runner.os }}-${{ hashFiles('**/poetry.lock') }}-${{ hashFiles('.github/workflows/pr-deployer.yml') }}

...

- name: Install deployer
  run: |
    cd deployer
    poetry install
  if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
In this example, deployer is just the name of the directory, in the repository root, where we have all the Python code, the pyproject.toml, etc. If you have yours at the root of the project, you can just do run: poetry install and, in the caching step, change it to path: .venv.
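Spelled out, the root-of-the-project variant might look like the sketch below. The step name "Install dependencies" is made up, and the sketch assumes Poetry is configured to create the virtualenv inside the project (its virtualenvs.in-project setting); otherwise there is no .venv directory at that path to cache.

```yaml
# Variant for a project whose pyproject.toml is at the repository root.
# Assumes Poetry's virtualenvs.in-project setting is enabled.
- name: Load cached venv
  id: cached-poetry-dependencies
  uses: actions/cache@v2.1.6
  with:
    path: .venv
    key: venv-${{ runner.os }}-${{ hashFiles('**/poetry.lock') }}-${{ hashFiles('.github/workflows/pr-deployer.yml') }}

# Hypothetical step name; skipped entirely when the venv cache was hit.
- name: Install dependencies
  run: poetry install
  if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
```

Remember to swap the workflow filename in the key for your own here too.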
Now you get a really powerful, complete caching strategy. When the caches are hot (i.e. no changes to the .yml, poetry.lock, or pyproject.toml files) you get the executable (so you can do poetry run ...) and all the project's dependencies in roughly 2 seconds. That'll be hard to beat!