Last week, I landed concurrent downloads in hashin. The example was that you do something like...
$ time hashin -r some/requirements.txt --update-all
...and the whole thing takes ~2 seconds even though that some/requirements.txt
file might contain 50 different packages, and thus 50 different PyPI.org lookups.
Just wanted to point out, this is not unique to use with --update-all. It works for any list of packages. And I want to put some better numbers on that, so here goes...
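The mechanics are the classic thread-pool pattern: instead of fetching the PyPI JSON for one package at a time, all the lookups get handed to a pool of worker threads so the network waits overlap. Here's a minimal sketch of that pattern (illustrative only, not hashin's actual code; fetch_package_data and the example package list are made up, and the "CPUs multiplied by 5" worker math further down suggests concurrent.futures.ThreadPoolExecutor's default pool size):

import concurrent.futures
import json
from urllib.request import urlopen


def fetch_package_data(package):
    # One blocking PyPI JSON lookup; the pool runs many of these in parallel.
    url = "https://pypi.org/pypi/%s/json" % package
    with urlopen(url) as response:
        return package, json.loads(response.read())


packages = ["requests", "Django", "hashin"]
with concurrent.futures.ThreadPoolExecutor() as executor:
    # executor.map preserves input order while the downloads overlap
    for name, data in executor.map(fetch_package_data, packages):
        print(name, data["info"]["version"])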
Suppose you want to create a requirements file for every package in the current virtualenv. You might do it like this:
# the -e filtering removes editable (locally installed) packages from git URLs
$ pip freeze | grep -v '-e ' | xargs hashin -r /tmp/reqs.txt
Before running that I injected a little timer on each pypi.org download. It looked like this:
+import time

def get_package_data(package, verbose=False):
    url = "https://pypi.org/pypi/%s/json" % package
    if verbose:
        print(url)
+   t0 = time.time()
    content = json.loads(_download(url))
    if "releases" not in content:
        raise PackageError("package JSON is not sane")
+   t1 = time.time()
+   print(t1 - t0)
I also put a print around the call to pre_download_packages(lookup_memory, specs, verbose=verbose) to see what the "total time" was.
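Roughly like this (a sketch of the instrumentation; pre_download_packages is the hashin internal that does all the concurrent fetching, and the print format matches the "SUM TOTAL TOOK" line in the output below):

t0 = time.time()
pre_download_packages(lookup_memory, specs, verbose=verbose)
t1 = time.time()
print("SUM TOTAL TOOK", t1 - t0)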
The output looked like this:
▶ pip freeze | grep -v '-e ' | xargs python hashin.py -r /tmp/reqs.txt
0.22896194458007812
0.2900810241699219
0.2814369201660156
0.22658205032348633
0.24882292747497559
0.268247127532959
0.29332590103149414
0.23981380462646484
0.2930259704589844
0.29442572593688965
0.25312376022338867
0.34232664108276367
0.49491214752197266
0.23823285102844238
0.3221290111541748
0.28302812576293945
0.567702054977417
0.3089122772216797
0.5273139476776123
0.31477880477905273
0.6202089786529541
0.28571176528930664
0.24558186531066895
0.5810830593109131
0.5219211578369141
0.23252081871032715
0.4650228023529053
0.6127192974090576
0.6000659465789795
0.30976200103759766
0.44440698623657227
0.3135409355163574
0.638585090637207
0.297544002532959
0.6462509632110596
0.45389699935913086
0.34597206115722656
0.3462028503417969
0.6250648498535156
0.44159507751464844
0.5733060836791992
0.6739277839660645
0.6560370922088623
SUM TOTAL TOOK 0.8481268882751465
If you sum up all the individual times, it would have taken 17.3 seconds. It's 43 individual packages, and with 8 CPUs the pool caps out at 8 × 5 = 40 worker threads, so it had to hold a few of the downloads back until workers freed up.
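That cap barely matters, though. Back-of-the-envelope, using the numbers from the run above, the concurrent version is about 20x faster than downloading serially would have been:

serial_total = 17.3       # the 43 individual download times added up
concurrent_total = 0.848  # the "SUM TOTAL TOOK" number
print(round(serial_total / concurrent_total))  # -> 20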
Clearly, this works nicely.