How I used Parcel to "manually" bundle CSS files in a Remix app

May 31, 2023
0 comments JavaScript

I recently switched from Next.js to Remix for my personal website. One thing I struggled with was getting it to merge individual .css files into one. I solved it with the Parcel CLI, and this blog post demonstrates how.

The problem

Note, first of all, that this is about global CSS. You can, and should, still employ CSS Modules or something equivalent for CSS that is tied directly to a React component.

But global CSS has its place and purpose. The problem is that there's no convenient way to bundle multiple little .css files into one which you can then nest into routes in Remix.

The way you inject CSS into a Remix page is like this:


import highlight from "~/styles/highlight.css";
import blogpost from "~/styles/blogpost.css";

...


export function links() {
  return [
    { rel: "stylesheet", href: highlight },
    { rel: "stylesheet", href: blogpost },
  ];
}

And, for the record, suppose you have a nested route that needs those plus another one. Then you do:


import banner from "~/styles/banner.css";
import { links as rootLinks } from "./_index";

...


export function links() {
  return [
    ...rootLinks().filter((x) => !x.extra),
    { rel: "stylesheet", href: banner },
  ];
}

This will nicely pick up those source .css files, minify them, and produce this in the final SSR HTML output:


<link rel="stylesheet" href="/build/_assets/highlight-KI4AX52K.css"/>
<link rel="stylesheet" href="/build/_assets/blogpost-75V4EYTP.css"/>

Nice. HTTP/2 is famously good at parallel downloads, but even that has its physical limits, especially if you have many little .css files that make up all the CSS you need. Now you have multiple files that can each get stuck on the network. Yes, you might be able to update one and keep the others cached if their fingerprints don't change, but that is likely to be rare.

Parcel to the rescue

I solved it by using the Parcel CLI. In package.json I have:

"parcel:build": "parcel build --dist-dir app/styles/build app/*.css",

And in app/global.css I have this:


/* This is app/global.css */

@import "../node_modules/@picocss/pico/css/pico.css";
@import "./styles/globals.css";
@import "./styles/message.css";
@import "./styles/nav.css";
@import "./styles/comments.css";
@import "./styles/carbonads.css";
@import "./styles/carbonads-outer.css";
@import "./styles/modal-search.css";

That means that Parcel will bundle all of these app/*.css files into one app/styles/build/global.css.
Now, I can refer to that built one in the Remix app:


import global from "~/styles/build/global.css";

...

export function links() {
  return [
    { rel: "stylesheet", href: global },
  ]
}

Build vs. dev

OK, so that explains how to bundle individual CSS files before you actually use the bundled CSS file. Remix doesn't care how it was made (a good thing).
At this point, we've modularized the problem: Parcel does what it does best (CSS bundling, among other things) and Remix does what it does (serving the .css files into the HTML).

But as ergonomically pleasant as it is to bundle CSS files like this, we still don't want to have to manually run a separate step to rebuild the bundle every time an individual source .css file (e.g. app/styles/nav.css) is edited.

Here's how I solved that, split up into build and dev:

Build


  "scripts": {
-   "build": "remix build",
+   "build": "npm run parcel:build && remix build",
+   "parcel:build": "parcel build --dist-dir app/styles/build app/*.css",

Now, npm run build will do both things.

Dev


  "scripts": {
    "dev": "npm-run-all build --parallel \"dev:*\"",
    "dev:node": "cross-env NODE_ENV=development esrun --watch ./server.ts ",
    "dev:remix": "remix watch",
+   "dev:parcel": "parcel watch --dist-dir app/styles/build app/global.css",

In conclusion

I admit, I'm a CSS Modules fan-boy and it saddens me how much global CSS I have. One thing at a time, I guess. Global and modular CSS both have their powers, but my own personal site still relies a bit too much on global CSS. At least little goes to waste, because Remix makes it relatively easy to pick exactly which files you need for individual routes.

Be careful with Date.toLocaleDateString() in JavaScript

May 8, 2023
4 comments Node, macOS, JavaScript

tl;dr: Always pass timeZone: "UTC" when calling Date.toLocaleDateString

The surprise

In my browser's web console:

>>> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {day: "numeric"})
"26"

On my server located in the same time zone:

Welcome to Node.js v16.13.0.
Type ".help" for more information.
> process.env.TZ
undefined
> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {day: "numeric"})
'26'

Here on my laptop:

Welcome to Node.js v16.20.0.
Type ".help" for more information.
> process.env.TZ
undefined
> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {day: "numeric"})
'27'

What! Despite $TZ not being set, it formats according to something else.

02:50 Zulu is, for me in the US Eastern time zone, still the day before.

Why this matters

Web console server React errors
I kept getting this production error from React that the SSR-rendered HTML differed from the client-side rendered HTML. Strangely, I could never reproduce this locally and the error doesn't say what's different. All the Stack Overflow suggestions and Google results speak of the most basic easy things to check. It's not unusual that this happens when dealing with dates because even though the database (PostgreSQL) stores the dates in full UTC, sometimes when data travels via app servers through JSON pipelines, date formatting can drop important bits.
But here, '2014-11-27T02:50:49Z' is specific.

What made this so incredibly hard to debug was that it worked on one page but not on the other even though the two had the same exact component code. I broke it apart thinking there was something nasty in the content of the Markdown-rendered HTML. No. The reason it only happened on some pages was that I had a function that looked like this:


export function formatDateBasic(date: string) {
  return new Date(date).toLocaleDateString("en-us", {
    year: "numeric",
    month: "long",
    day: "numeric",
  });
}

And different pages listed, almost non-deterministically, different dates for the related content they referred to. So one page might contain a single date that formats differently in EDT (Eastern daylight-saving time) than in UTC, while another page only contains dates where it doesn't matter. For example, Apr 1 at 18:00 Zulu is still Apr 1 in EDT.

The explanation

I'm sorry that I don't understand this better, but Node's implementation of Date.toLocaleDateString depends on more than just process.env.TZ. I think $TZ is just a way to gain control.

For example, start the node REPL like this:

On my Ubuntu 20.04 server:

$ TZ=utc node
Welcome to Node.js v16.20.0.
Type ".help" for more information.
> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {day: "numeric"})
'27'

On my MacBook:

❯ TZ=utc node
Welcome to Node.js v16.13.0.
Type ".help" for more information.
> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {day: "numeric"})
'27'
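
By the way, you can also ask JavaScript itself which time zone it has resolved to, via the standard Intl API. On a machine set to America/New_York (like my MacBook, see below), that looks like this:

> Intl.DateTimeFormat().resolvedOptions().timeZone
'America/New_York'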

To find out what timezone your computer has:

On Ubuntu:

$ timedatectl
               Local time: Mon 2023-05-08 12:42:03 UTC
           Universal time: Mon 2023-05-08 12:42:03 UTC
                 RTC time: Mon 2023-05-08 12:42:04
                Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no

On macOS:

❯ sudo systemsetup -gettimezone
Password:
Time Zone: America/New_York

The solution

Setting TZ is probably a good thing. That can get a bit tricky though. Your code needs to run consistently on your laptop, in GitHub Actions, on a VPS server, in an Edge cloud function, etc.

A better way is to force Date.toLocaleDateString to be fed a time zone. Now it's controlled at the highest level:


export function formatDateBasic(date: string) {
  return new Date(date).toLocaleDateString("en-us", {
    year: "numeric",
    month: "long",
    day: "numeric",
+   timeZone: "UTC"
  });
}

Now, it no longer depends on the OS it runs on.

On my Ubuntu server:

Welcome to Node.js v16.20.0.
Type ".help" for more information.
> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {day: "numeric", timeZone: "UTC"})
'27'

On my macOS:

Welcome to Node.js v16.13.0.
Type ".help" for more information.
> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {day: "numeric", timeZone: "UTC"})
'27'

Fun fact

I made things unnecessarily weird for myself in the debugging session when I first figured out the timeZone option. What I ran was this:

Welcome to Node.js v16.13.0.
Type ".help" for more information.
> new Date('2014-11-27T02:50:49Z').toLocaleDateString("en-us", {day: "numeric", zimeZone: "UTC"})
'26'

I expected it to be '27' now, so why did it revert?? Notice the typo? Date.toLocaleDateString won't throw an error if you pass it options it doesn't expect.
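
For what it's worth, TypeScript would have caught this typo, at least when the options are passed as an object literal, because the second argument is typed as Intl.DateTimeFormatOptions. A sketch (the error wording is roughly what the compiler reports):

// TypeScript error: 'zimeZone' does not exist in type 'DateTimeFormatOptions'.
// Did you mean to write 'timeZone'?
new Date("2014-11-27T02:50:49Z").toLocaleDateString("en-us", {
  day: "numeric",
  zimeZone: "UTC",
});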

How to run a GitHub Action workflow step if a file exists

April 24, 2023
2 comments GitHub

Suppose you have a GitHub Actions workflow that does some computation, and a possible outcome is that a file comes into existence. How do you run a follow-up step based on whether that file was created?

tl;dr


- name: Is file created?
  if: ${{ hashFiles('test.txt') != '' }}
  run: echo "File exists"

The "wrong" way

Technically, there's no wrong way, but an alternative might be to rely on exit codes. This would work.


- name: Check if file was created
  run: |
      if [ -f test.txt ]; then
          echo "File exists"
          exit 1
      else
          echo "File does not exist"
      fi
- name: Did the last step fail?
  if: ${{ failure() }}
  run: echo "Last step failed, so file must have maybe been created"

The problem with this is that it not only leaves a red ❌ in the workflow logs, but it can also lead to false positives. For example, if the step that might create a file is non-trivial, you don't want to lump the creation of the file together with a possible bug in your code.

A use case

What I needed this for was a complex script that finds broken links in a web app. Only if there were broken links do I want to file a new issue about it. If the script itself failed for some reason, I want to know that and work on fixing whatever its bug might be. It looked like this:


  - name: Run broken link check
    run: |
      script/check-links.js broken_links.md

  - name: Create issue from file
    if: ${{ hashFiles('broken_links.md') != '' }}
    uses: peter-evans/create-issue-from-file@433e51abf769039ee20ba1293a088ca19d573b7f
    with:
      token: ${{ env.GITHUB_TOKEN }}
      title: More than one zero broken links found
      content-filepath: ./broken_links.md
      repository: ${{ env.REPORT_REPOSITORY }}
      labels: ${{ env.REPORT_LABEL }}

That script/check-links.js script is given an argument: the name of the file to write to if it does indeed find any broken links. If there are any, it generates a snippet of Markdown about them, which becomes the body of the newly filed issue.
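
The real script is more involved, but a minimal sketch of the idea, assuming a hypothetical findBrokenLinks() helper that crawls the app and returns an array of broken URLs, could look like this:

#!/usr/bin/env node
// Sketch only. `findBrokenLinks` is a hypothetical helper that crawls
// the app and returns an array of broken URLs.
import fs from "fs";
import { findBrokenLinks } from "./find-broken-links.js";

async function main() {
  const outputFile = process.argv[2]; // e.g. broken_links.md
  const broken = await findBrokenLinks();
  if (broken.length > 0) {
    // Only create the file if there's something to report
    const markdown = broken.map((url) => `- ${url}`).join("\n");
    fs.writeFileSync(outputFile, `# Broken links\n\n${markdown}\n`);
  }
}

main();

That way, the file only exists when there's something to turn into an issue, which is exactly what the hashFiles() condition checks for.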

Demo

To be confident this works, I created a dummy workflow in a test repo. It looks like this: .github/workflows/maybe-fail.yml

Automatically 'npm install'

April 6, 2023
0 comments Node, JavaScript

I implemented this at work recently and although it felt like a hack, I've come to like it and it's been very helpful to our many contributors.
As (Node) engineers, we know that you should keep your node_modules up-to-date by running npm install periodically or every time you git pull from the upstream. It could be that some package got upgraded last night since you git pulled last time.
But not everyone remembers to run npm install often enough. They might do git pull origin main && npm start and now the code that starts up depends on some latest version that was upgraded in package.json and package-lock.json.

How we solved it was that we added this script:


node script/cmp-files.js package-lock.json .installed.package-lock.json || npm install && cp package-lock.json .installed.package-lock.json

And it's hooked up as a script in package.json called prestart:

"scripts": {
  ...
  "prestart": "node script/cmp-files.js ...",
  ...
}

Now, every time you run npm start to start up the local development server, it will first run that piece of bash. No more having to remember to run npm install after every git pull.

A note on performance

The npm install command is fast when all packages are already installed and up to date. You can see it with:


# First time
$ npm install

# Second time when nothing should happen
$ time npm install
...
2.53s user 0.37s system 134% cpu 2.166 total

So it only takes 2 seconds. Not bad.


$ time node script/cmp-files.js package-lock.json .installed.package-lock.json
...
0.08s user 0.03s system 100% cpu 0.110 total

But 0.08 seconds is better :)

The comparison script

The cmp-files.js script looks like this:


#!/usr/bin/env node

// Given N files. Exit 0 if they all exist and are identical in content.

import fs from 'fs'

import { program } from 'commander'

program.description('Compare N files').arguments('[files...]', '').parse(process.argv)

main(program.args)

function main(files) {
  if (files.length < 2) throw new Error('Must be at least 2 files')
  try {
    const contents = files.map((file) => fs.readFileSync(file, 'utf-8'))
    if (new Set(contents).size > 1) {
      process.exit(1)
    }
  } catch (error) {
    if (error.code === 'ENOENT') {
      process.exit(1)
    } else {
      throw error
    }
  }
}

The .installed.package-lock.json file is added to the repo's .gitignore

Note: given how well this works when run before npm start, we could probably add this to a post-checkout git hook too (see the sketch below).
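
For example, a minimal sketch of such a hook, reusing the exact same one-liner, could go in .git/hooks/post-checkout (and be made executable with chmod +x):

#!/bin/sh
# Sketch: re-run npm install after a checkout if package-lock.json changed
node script/cmp-files.js package-lock.json .installed.package-lock.json || npm install && cp package-lock.json .installed.package-lock.json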

Benchmarking npm install with or without audit

February 23, 2023
1 comment Node, JavaScript

By default, running npm install will do a security audit of your installed packages. That audit is fast but it still takes a bit of time. To disable it you can either add --no-audit or you can...:

cat .npmrc
audit=false

But how much time does the audit add when running npm install? To find out, I wrote this:


import random
import statistics
import subprocess
import time
from collections import defaultdict


def f1():
    subprocess.check_output("npm install".split())


def f2():
    subprocess.check_output("npm install --no-audit".split())


functions = f1, f2

times = defaultdict(list)
for i in range(25):
    f = random.choice(functions)

    t0 = time.time()
    f()
    t1 = time.time()
    times[f.__name__].append(t1 - t0)
    time.sleep(5)


for f_name in sorted(times.keys()):
    print(
        f_name,
        f"mean: {statistics.mean(times[f_name]):.1f}s".ljust(10),
        f"median: {statistics.median(times[f_name]):.1f}s",
    )

Note how it runs many times, in case there are network hiccups, and how it sleeps between each run just to spread the experiment out over a longer period of time. The results:

f1 mean: 2.81s median: 2.57s
f2 mean: 2.25s median: 2.21s

Going by the median time, --no-audit makes the npm install 16% faster. Going by the mean time, skipping the audit makes it 25% faster.

How to intercept and react to non-zero exits in bash

February 23, 2023
2 comments Bash, GitHub

Inside a step in a GitHub Action, I want to run a script, and depending on the outcome of that, maybe do some more things. Essentially, if the script fails, I want to print some extra user-friendly messages, but the whole Action should still fail with the same exit code.

In pseudo-code, this is what I want to achieve:

exit_code = that_other_script()
if exit_code > 0:
    print("Extra message if it failed")
exit(exit_code)

So here's how to do that with bash:


# If it's not the default, make it so that it proceeds even if 
# any one line exits non-zero
set +e

./script/update-internal-links.js --check
exit_code=$?

if [ $exit_code != 0 ]; then
  echo "Extra message here informing that the script failed"
  exit $exit_code
fi

The origin, for me, was a GitHub Action that calls another script that might fail. If it fails, I wanted to print a verbose extra hint to whoever looks at the output. Steps in GitHub Actions run with set -e by default, I think, meaning that if anything goes wrong in the step, it exits the step and only runs any subsequent steps that have if: ${{ failure() }}.

The technology behind You Should Watch

January 28, 2023
0 comments You Should Watch, React, Firebase, JavaScript

I recently launched You Should Watch, a mobile-friendly web app for keeping a to-watch list of movies and TV shows, as well as quickly sharing links when you want to tell someone "you should watch" something.

I'll be honest, much of the motivation of building that web app was to try a couple of newish technologies that I wanted to either improve on or just try for the first time. These are the interesting tech pillars that made it possible to launch this web app in what was maybe 20-30 hours of total time.

All the code for You Should Watch is here: https://github.com/peterbe/youshouldwatch-next

The Movie Database API

The cornerstone that made this app possible in the first place. The API is free for developers who don't intend to earn revenue on whatever project they build with it. More details in their FAQ.

The search functionality is important. The way it works is that you can do a "multisearch", which means it finds movies, TV shows, or people. Then, when you have each search result's id and media_type, you can fetch a lot more specific information. For example, that's how the page for a person displays things differently from the page for a movie.
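
As a rough sketch (the endpoint paths and the id/media_type fields are The Movie Database's; the apiKey handling and function names here are just illustrative), the flow looks something like this:

const API = "https://api.themoviedb.org/3";

async function multiSearch(query: string, apiKey: string) {
  const url = `${API}/search/multi?api_key=${apiKey}&query=${encodeURIComponent(query)}`;
  const response = await fetch(url);
  const { results } = await response.json();
  // Each result has an `id` and a `media_type` ("movie", "tv", or "person")
  return results;
}

async function getDetails(result: { id: number; media_type: string }, apiKey: string) {
  // The media_type decides which detail endpoint to hit: /movie/{id}, /tv/{id}, or /person/{id}
  const response = await fetch(`${API}/${result.media_type}/${result.id}?api_key=${apiKey}`);
  return response.json();
}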

Next.js and the new App dir

In Next.js 13 you have a choice between the regular pages directory or an app directory, where every page (which becomes a URL) has to be called page.tsx.

No judgment here. It took a bit of rewiring of my brain to figure out how this works. In particular, head.tsx is now separate from page.tsx, and since both need some async data during server-side rendering, I have to duplicate the await getMediaData() call instead of being able to fetch it once and share it with prop-drilling or context.
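
In other words, something like this (a sketch; the route segment and the MediaPage component are hypothetical, getMediaData is my own data-fetching function):

// app/share/[id]/page.tsx (sketch)
export default async function Page({ params }: { params: { id: string } }) {
  const media = await getMediaData(params.id); // fetched here...
  return <MediaPage media={media} />;
}

// app/share/[id]/head.tsx (sketch)
export default async function Head({ params }: { params: { id: string } }) {
  const media = await getMediaData(params.id); // ...and fetched again here
  return <title>{media.title}</title>;
}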

Vercel deployment

Wow! This was the most pleasant deployment experience I've had in years. So polished and so much "just works". You sign in with your GitHub auth, click to select the GitHub repo (that has a next.config.js and package.json etc.), and you're done. That's it! Now, not only does every merged PR automatically (and fast!) get deployed, but you also get a preview deployment for every PR (which I didn't use).

I'm still on the free hobby tier, but if, god forbid, this app gets serious traffic, I'd just bump it up to $20/month, which is cheap. Besides, the app is almost entirely CDN-cacheable, so I think only the search XHR backend would see its load increase linearly with traffic.

Well done Vercel!

Playwright and VS Code

Not the first time I used Playwright but it was nice to return and start afresh. It definitely has improved in developer experience.

Previously I used npx and the terminal to run tests, but this time I tried "Playwright Test for VSCode", which was just fantastic! There are some slightly annoying things, in that I had to use the mouse cursor more than I'd hoped, but it genuinely helped me be productive. Playwright also has the ability to generate JS code based on me clicking around in a temporary incognito browser window. You do a couple of things in the browser, then paste the generated source code into tests/basics.spec.ts and do some manual tidying up. To start that code generator, you simply type pnpm dlx playwright codegen

pnpm

It seems hip and a lot of people seem to recommend it. Kinda like yarn was hip and often recommended over npm (me included!).

Sure it works and it installs things fast but is it noticeable? Not really. Perhaps it's 4 seconds when it would have been 5 seconds with npm. Apparently pnpm does clever symlinking to avoid a disk-heavy node_modules/ but does it really matter ...much?
It's still large:

du -sh node_modules
468M    node_modules

A disadvantage with pnpm is that GitHub Dependabot currently doesn't support it :(
An advantage with pnpm is that pnpm up -i --latest is a great interactive CLI that works like yarn upgrade-interactive --latest

just

just is like make but written in Rust. Now I have a justfile in the root of the repo and can type shortcut commands like just dev or just emu[TAB] (to tab autocomplete).

In hindsight, my justfile ended up being just a list of pnpm run ... commands, but the idea is that just would house any and all commands under one roof.

At the end of the day, it becomes a nifty little file of "recipes" of useful commands, and you can easily string them together. For example, just lint is the combination of typing pnpm run prettier:check, pnpm run tsc, and pnpm run lint.
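
A sketch along those lines (recipe names taken from the examples above) can look as simple as this:

# justfile (sketch)

dev:
    pnpm run dev

lint:
    pnpm run prettier:check
    pnpm run tsc
    pnpm run lint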

Pico.css

A gorgeously simple looking pure-CSS framework. Yes, it's very limited in components and I don't know how well it "tree shakes" but it's so small and so neat that it had everything I needed.

My favorite React component library is Mantine, but I definitely love the peace of mind that Pico.css is just CSS, so you're A) not stuck with React forever, and B) not shipping unnecessary JS code that slows things down.

Firebase

Good old Firebase. The bestest and easiest way to get a reliable and realtime database that is dirt cheap, simple, and has great documentation. I do regret not trying Supabase but I knew that getting the OAuth stuff to work with Google on a custom domain would be tricky so I stayed with Firebase.

react-lite-youtube-embed

A port of Paul Irish's Lite YouTube Embed which makes it easy to display YouTube thumbnails in a web performant way. All you have to do is:


import LiteYouTubeEmbed from "react-lite-youtube-embed";

<LiteYouTubeEmbed
   id={youtubeVideo.id}
   title={youtubeVideo.title} />

In conclusion

It's amazing how much time these tools save compared to just a few years ago. I could build a fully working side-project with automation and high quality, entirely thanks to great open source or well-tuned proprietary components, in just about one day if you sum up the hours.

Announcing: You Should Watch

January 27, 2023
3 comments You Should Watch

tl;dr You Should Watch is a mobile-friendly web app for remembering and sharing movies and TV shows you should watch. Like a to-do list to open when you finally sit down with the remote in your hand.

I built this over the holidays, in some spare afternoons and evenings around the 2022 New Year. The idea wasn't new in my head because it's an actual itch I've often had: what on earth should I watch tonight?! Oftentimes you read or hear about great movies or TV shows, but a lot of those memories are long gone by the time the kids have been put to sleep and you have control of the remote.

It's not rocket science. It's a neat little app that works great on your phone browser but also works as a website on your computer. You don't have to sign in but if you do, your list outlives the memory of your device. I.e. your selections get saved in the cloud and you can pick back up whenever you're on a different device. If you do decide to sign in, it currently uses Google for that. Your list is anonymized and even I, the owner of the database, can't tell which movies and TV shows you select.

If you do sign in to save your list, every time you check a movie or TV show off it goes into your archive. That can be useful for your next dinner party or date or cookout when you're scrambling to answer "Seen any good shows recently?".

One feature that I've personally enjoyed is the list of recommendations that each TV show or movie has. It's not a perfect list but it's fun and useful. Suppose you can't even think of what to watch and just want inspiration: start by finding a movie you like (but don't want to watch right now), then click on it and scroll down to the "Recommendations". Even if your next movie or TV show isn't on that list, perhaps clicking on something similar will take you to the next page where the right inspiration might be found. Try it.

It's free and will always be free. It was fun to build and thankfully free to run so it won't go away. I hope you enjoy it and get value from it. Please please share your thoughts and constructive criticism.

Try it on https://ushdwat.ch/

Some pictures

Your Watch list (mobile)

home page (mobile)

Search results

Add/Find

Navigating by "Recommendations"

Related information

"Add to Home Screen" on iOS Safari

Add to Home Screen

First impressions of Meilisearch and how it compares to Elasticsearch

January 26, 2023
1 comment Elasticsearch

tl;dr Meilisearch is like Elasticsearch but simpler. Decent parity in functionality and performance, but definitely intriguing if you don't already know Elasticsearch or want to run with fewer resources.

Meilisearch is a full-text search engine that you can use to power a really good site search. My personal blog uses Elasticsearch, but I wanted to experiment with switching to Meilisearch. This blog post is about some impressions from that experiment.

Here are some of my observations:

Memory usage

When I start Elasticsearch and index all my blog posts and all comments, Activity Monitor shows the java process using 1.3GB. The meilisearch process peaks at 290MB.

Indexing performance

In my case, it doesn't matter. When you constantly update the index (in Elasticsearch), the time is negligibly small when the dataset is small.
If you do a mass-reindexing in Elasticsearch, you do that by creating a new index with a timestamp (e.g. blogpost-20230126134512) and then swapping the alias with which you send your search queries. With that strategy, it doesn't matter how many seconds it takes because nobody's waiting for it to finish fast.

At the moment I don't even know how to append more documents to a Meilisearch index. I only know how to index everything all at once.

Note that with the Elasticsearch SDK (in Python) you can pass a generator to the parallel_bulk helper function, meaning you can kinda "stream" in all the documents without loading them all into memory. With Meilisearch, I can't do queued = index.add_documents(get_docs_iterator()), so I have to do queued = index.add_documents(list(get_docs_iterator())) instead.
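
For reference, a sketch of that streaming pattern with the Elasticsearch Python helpers (get_docs_iterator is the same generator as above; the "blogitems" index name is just an example):

from elasticsearch import Elasticsearch
from elasticsearch.helpers import parallel_bulk

client = Elasticsearch()

def actions():
    # Wrap each document in a bulk "index" action, yielded lazily
    for doc in get_docs_iterator():
        yield {"_index": "blogitems", "_source": doc}

for success, info in parallel_bulk(client, actions()):
    if not success:
        print("Failed to index:", info)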

Complex relevancy is easier, but more magic

Ranking search results by "matchness" is easy. I.e. when the search terms are "more present", the document is likely to be more relevant. Elasticsearch sorts by _score by default. So does Meilisearch. In reality, you want to control that more. In my use case, I want the ranking to be a function of "matchness" and popularity (which is a number I derive from pageview metrics). In Elasticsearch you have the power of writing a "Function score query", which gives you lots of flexibility to control exactly how: e.g. multiply, or sum, or an average, or a combination where you write a log function.
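
As a sketch (not my exact query; the popularity field is the pageview-derived number mentioned above, and log1p is just one possible modifier), a function score query in its raw form can look like this:

# Multiply the full-text relevancy score by a log-dampened popularity value
query = {
    "function_score": {
        "query": {"match": {"title": "javascript"}},
        "field_value_factor": {
            "field": "popularity",
            "modifier": "log1p",
            "missing": 0,
        },
        "boost_mode": "multiply",
    }
}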

With Meilisearch you can only control the sort order of relevancy "algorithms". An example is:


[
  "words",
+ "popularity:desc",
  "typo",
  "proximity",
  "attribute",
  "sort",
  "exactness"
]

I can't exactly phrase how I'd prefer it to work, but it feels a bit like magic. The lack of functionality also speaks to a strength of Meilisearch: it's easy to get something working that incorporates it.

Highlighting is easy

What I want is for the title to be highlighted, in full. For the text, I'd rather have snippets that focus on where the highlights appear within the text. This was a breeze! Here's how you do it:


res = client.index("blogitems").search(
    q,
    {
        "attributesToHighlight": ["title", "text"],
        "highlightPreTag": "<mark>",
        "highlightPostTag": "</mark>",
        "attributesToCrop": ["text"],
        "cropLength": 30,
    },
)

Relevancy between fields is too basic

In Elasticsearch you use a boost multiplier to specify that the title is more important than the body. For example:


from elasticsearch_dsl import Q

...

title_match = Q("match", title={"query": word, "boost": 3.5}) 
body_match = Q("match", body={"query": word, "boost": 1.0})
match = title_match | body_match

That means you can express that the title is 3.5 times more important than the match on body.

With Meilisearch you have no such functionality (as far as I can see), but you can say that title > body, and the way you do that is a bit odd: it's the order in which you define the searchable fields.
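
A sketch of what that looks like with the Python SDK (the index and field names follow the highlight example above; the order is what expresses title > text):

client.index("blogitems").update_searchable_attributes(["title", "text"])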

Search performance

My dataset is small, about 2,000 documents, so I wasn't able to measure a significant difference. In both of my search query implementations, the time it takes is between 10 and 30 milliseconds. Both are fast enough. What matters more is the networking overhead to and from the database, and if that network is localhost, the search time is practically irrelevant.

In conclusion

When you're already comfortable and versed in the more powerful beast that is Elasticsearch, it's less relevant. However, Meilisearch feels like a nicer experience in its simplicity if you're confronted with a choice on your next full-text search project.

You could say that in terms of core search functionality, to me, Meilisearch sits between PostgreSQL's full-text and Elasticsearch.

If the project is a team effort that involves many people and might outlast you, the operational side often matters more: whether you install it yourself or use a proprietary cloud provider (which both Elastic and Meilisearch Cloud are) is what needs the more careful consideration. It's good to know, though, that Meilisearch has most of the core functionality, and great documentation, to build something really great.

Pip-Outdated.py - a script to compare requirements.in with the output of pip list --outdated

December 22, 2022
0 comments Python

Simply by posting this, there's a big chance you'll say "Hey! Didn't you know there's already a well-known script that does this? Better." Or you'll say "Hey! That'll save me hundreds of seconds per year!"

The problem

Suppose you have a requirements.in file that is used by pip-compile to generate the requirements.txt that you actually install in your Dockerfile or whatever server deployment. The requirements.in is meant to be the human-readable file and the requirements.txt is for the computers. You manually edit the version numbers in requirements.in and then run pip-compile --generate-hashes requirements.in to generate a new requirements.txt. But the "first-class" packages in requirements.in aren't the only packages that get installed. For example:

▶ cat requirements.in | rg '==' | wc -l
      54

▶ cat requirements.txt | rg '==' | wc -l
     102

In other words, in this particular example, there are 48 "second-class" packages that get installed. There might actually be even more stuff installed that you didn't describe, which is why pip list | wc -l can be higher still. For example, you might have locally and manually done pip install ipython for a nicer interactive prompt.

The solution

The command pip list --outdated lists outdated packages based on everything installed (i.e. the requirements.txt world), not just those in requirements.in. To mitigate that, I wrote a quick Python CLI script that combines the output of pip list --outdated with the packages mentioned in requirements.in:


#!/usr/bin/env python

import subprocess


def main(*args):
    if not args:
        requirements_in = "requirements.in"
    else:
        requirements_in = args[0]
    required = {}
    with open(requirements_in) as f:
        for line in f:
            if "==" in line:
                package, version = line.strip().split("==")
                package = package.split("[")[0]
                required[package] = version

    res = subprocess.run(["pip", "list", "--outdated"], capture_output=True)
    if res.returncode:
        raise Exception(res.stderr)

    lines = res.stdout.decode("utf-8").splitlines()
    relevant = [line for line in lines if line.split()[0] in required]

    longest_package_name = max([len(x.split()[0]) for x in relevant]) if relevant else 0

    for line in relevant:
        p, installed, possible, *_ = line.split()
        if p in required:
            print(
                p.ljust(longest_package_name + 2),
                "INSTALLED:",
                installed.ljust(9),
                "POSSIBLE:",
                possible,
            )


if __name__ == "__main__":
    import sys

    sys.exit(main(*sys.argv[1:]))

Installation

To install this, you can just download the script and run it in any directory that contains a requirements.in file.

Or you can install it like this:

curl -L https://gist.github.com/peterbe/099ad364657b70a04b1d65aa29087df7/raw/23fb1963b35a2559a8b24058a0a014893c4e7199/Pip-Outdated.py > ~/bin/Pip-Outdated.py
chmod +x ~/bin/Pip-Outdated.py

Pip-Outdated.py