
How to change the current query string URL in NextJS v13 with next/navigation

December 9, 2022
4 comments React, JavaScript

At the time of writing, I don't know if this is the optimal way, but after some trial and error, I got it working.

This example demonstrates a hook that gives you the current value of the ?view=... (or a default) and a function you can call to change it so that ?view=before becomes ?view=after.

In NextJS v13 with the pages directory:


import { useRouter } from "next/router";

type Options = "buttons" | "table";

export function useNamesView() {
    const KEY = "view";
    const DEFAULT_NAMES_VIEW = "buttons";
    const router = useRouter();

    let namesView: Options = DEFAULT_NAMES_VIEW;
    const raw = router.query[KEY];
    const value = Array.isArray(raw) ? raw[0] : raw;
    if (value === "buttons" || value === "table") {
        namesView = value;
    }

    function setNamesView(value: Options) {
        const [asPathRoot, asPathQuery = ""] = router.asPath.split("?");
        const params = new URLSearchParams(asPathQuery);
        params.set(KEY, value);
        const asPath = `${asPathRoot}?${params.toString()}`;
        router.replace(asPath, asPath, { shallow: true });
    }

    return { namesView, setNamesView };
}

In NextJS v13 with the app directory:


import { useRouter, useSearchParams, usePathname } from "next/navigation";

type Options = "buttons" | "table";

export function useNamesView() {
    const KEY = "view";
    const DEFAULT_NAMES_VIEW = "buttons";
    const router = useRouter();
    const searchParams = useSearchParams();
    const pathname = usePathname();

    let namesView: Options = DEFAULT_NAMES_VIEW;
    const value = searchParams.get(KEY);
    if (value === "buttons" || value === "table") {
        namesView = value;
    }

    function setNamesView(value: Options) {
        const params = new URLSearchParams(searchParams);
        params.set(KEY, value);
        router.replace(`${pathname}?${params}`);
    }

    return { namesView, setNamesView };
}

The trick is that you only want to change one query string value while respecting whatever was there before. So if the existing URL was /page?foo=bar and you want that to become /page?foo=bar&and=also, you have to consume the existing query string, which you do with:


const searchParams = useSearchParams();
...
const params = new URLSearchParams(searchParams);
params.set('and', 'also')
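
For completeness, here's what consuming the hook could look like in a component (the component name and markup here are made up for illustration, not taken from the original code):


function NamesViewSwitcher() {
  const { namesView, setNamesView } = useNamesView();
  const nextView = namesView === "table" ? "buttons" : "table";
  return (
    <button onClick={() => setNamesView(nextView)}>
      Switch from {namesView} to {nextView}
    </button>
  );
}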

How much faster is Cheerio at parsing depending on xmlMode?

December 5, 2022
0 comments Node, JavaScript

Cheerio is a fantastic Node library for parsing HTML and then being able to manipulate and serialize it. But you can also just use it for parsing HTML and plucking out what you need. We use that to prepare the text that goes into our search index for our site. It basically works like this:


const body = await getBody('http://localhost:4002' + eachPage.path)
const $ = cheerio.load(body)
const title = $('h1').text()
const intro = $('p.intro').text()
...

But it hit me, can we speed that up? cheerio actually ships with two different parsers:

  1. parse5
  2. htmlparser2

htmlparser2 is the faster of the two, and parse5 is the more spec-compliant one.
But I wanted to see this in a real-world example.

So I made two runs where I used:


const $ = cheerio.load(body)

in one run, and:


const $ = cheerio.load(body, { xmlMode: true })

in another.
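
The exact benchmark script isn't shown here, but a minimal sketch of how you could time the two modes yourself might look something like this (the file name is made up):


import fs from "fs";
import * as cheerio from "cheerio";

// Time a single cheerio.load() call, in milliseconds
function timeLoad(html, options = {}) {
  const t0 = performance.now();
  cheerio.load(html, options);
  return performance.now() - t0;
}

const html = fs.readFileSync("some-page.html", "utf-8");
console.log("default (parse5):", timeLoad(html).toFixed(2), "ms");
console.log("xmlMode (htmlparser2):", timeLoad(html, { xmlMode: true }).toFixed(2), "ms");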

After having parsed 1,635 pages of HTML of various sizes, the results are:

FILE: load.txt
MEAN:   13.19457640586797
MEDIAN: 10.5975

FILE: load-xmlmode.txt
MEAN:   3.9020372860635697
MEDIAN: 3.1020000000000003

So, using {xmlMode:true} leads to roughly a 3x speedup (about 3.4x on both the mean and the median).

I think it pretty much confirms the original benchmark, but now I know based on a real application.

First impressions trying out Rome to format/lint my TypeScript and JavaScript

November 14, 2022
1 comment Node, JavaScript

Rome is a new contender to compete with Prettier and eslint, combined. It's fast and its suggestions are much easier to understand.

I have a project that uses .js, .ts, and .tsx files. At first, I thought I'd just use rome for formatting, but the linter part felt nice as I was experimenting, so I figured I'd kill two birds with one stone.

Things that worked well

It is fast

My little project only has 28 files, but time rome check lib scripts components *.ts consistently takes 0.08 seconds.

The CLI looks great

You get this nice prompt after running npx rome init the first time:

[Screenshot: rome init]

Suggestions just look great

The suggestions are easy to understand and need no explanation: the suggested fix tells a story, so it's immediately clear what the warning is trying to say.

[Screenshot: suggestion]

It is smaller

If I run npx create-next-app@latest, say yes to Eslint, and then run npm i -D prettier, the node_modules becomes 275.3 MiB.
Whereas if I run npx create-next-app@latest, say no to Eslint, and then run npm i -D rome, the node_modules becomes 200.4 MiB.

Editing the rome.json's JSON schema works in VS Code

I don't know how this magically worked, but I'm guessing it just does when you install the Rome VS Code extension. Neat with autocomplete!

[Screenshot: editing the rome.json file]

Things that didn't work so well

Almost all the things I'm going to "complain" about come down to usability. I might look back at this in a year (or tomorrow!) and laugh at myself for being dim, but it was nevertheless part of my experience, so it's worth pointing out.

Lint, check, or format?

It's confusing what is what. If lint means checking without modifying, what is check then? I'm guessing rome format means running the lint but with permission to edit my files.

What is rome format compared to rome check --apply then??

I guess rome check --apply doesn't just complain but actually applies the things it spots. So what is rome check --apply-suggested?? (if you're reading this and feel eager to educate me with a comment, please do, but I'm trying to point out that it's not user-friendly)

How do I specify wildcards?

Unfortunately, in this project, not all files are in one single directory (e.g. rome check src/ is not an option). How do I specify a wildcard expression?


▶ rome check *.ts
Checked 3 files in 942µs

Cool, but how do I do all .ts files throughout the project?


▶ rome check "**/*.ts"
**/*.ts internalError/io ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  ✖ No such file or directory (os error 2)


Checked 0 files in 66µs

Clearly, it's not this:


▶ rome check **/*.ts

...

The number of diagnostics exceeds the number allowed by Rome.
Diagnostics not shown: 1018.
Checked 2534 files in 1387ms
Skipped 1 files
Error: errors where emitted while running checks

...because bash will include all the files from node_modules/**/*.ts.

In the end, I ended up with this (in my package.json):

"scripts": {
    "code:lint": "rome check lib scripts components *.ts",
    ...

There's no documentation about how to ignore certain rules

Yes, I can contribute this back to the documentation, but today's not the day to do that.

It took me a long time to find out how to disable certain rules (in the rome.json file) and finally I landed on this:

{
  "linter": {
    "enabled": true,
    "rules": {
      "recommended": true,
      "style": {
        "recommended": true,
        "noImplicitBoolean": "off"
      },
      "a11y": {
        "useKeyWithClickEvents": "off",
        "useValidAnchor": "warn"
      }
    }
  }
}

Much better than having to write inline code comments within the source files themselves.

However, it's still not clear to me what "recommended": true means. Is it shorthand for listing all the default rules all set to true? If I remove that, are no rules activated?

The rome.json file is JSON

JSON is cool for many things, but writing comments is not one of them.

For example, I don't know what would be better, Yaml or Toml, but it would be nice to write something like:

"a11y": {
    # Disabled because of issue #1234
    # Consider putting this back in December after the refactor launch
    "useKeyWithClickEvents": "off",

Nextjs and rome need to talk

When create-react-app first came onto the scene, the coolest thing was the zero-config webpack. But, if you remember, it also came with a really nice zero-config eslint configuration for React apps. It would even print warnings when the dev server was running. Now, many years later, good linting config is something you depend on in a framework. Like it or not, there are specific things in Nextjs that are exclusive to that framework. It's obviously not an easy people-problem to solve, but it would be nice if Nextjs and rome could be best friends so you get all the good linting ideas from the Nextjs framework but all done using rome instead.

Spot the JavaScript bug with recursion and incrementing

September 28, 2022
0 comments JavaScript

What will this print?


function doSomething(iterations = 0) {
  if (iterations < 10) {
    console.log("Let's do this again!")
    doSomething(iterations++)    
  }
}
doSomething()

The answer is it will print

Let's do this again!
Let's do this again!
Let's do this again!
Let's do this again!
Let's do this again!
Let's do this again!
Let's do this again!
Let's do this again!
...forever...

The bug is the use of a "postfix increment", which is a mistake I had in some production code (almost: it never shipped).

The solution is simple:


     console.log("Let's do this again!")
-    doSomething(iterations++)
+    doSomething(++iterations)    

That's called a "prefix increment", which means it not only changes the variable but also returns what the value became rather than what it was before the increment.
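
To see the difference in isolation (a quick illustration, not from the original post):


let i = 0;
console.log(i++); // prints 0: postfix returns the old value, then increments (i is now 1)

let j = 0;
console.log(++j); // prints 1: prefix increments first, then returns the new value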

The beautiful solution is actually the simplest solution:


     console.log("Let's do this again!")
-    doSomething(iterations++)
+    doSomething(iterations + 1)    

Now, you don't even mutate the value of the iterations variable but create a new one for the recursion call.

All in all, a pretty simple mistake, but it can easily happen. Particularly if you feel inclined to look cool by using the spiffy ++ shorthand because it looks neater or something.

Programmatically render a NextJS page without a server in Node

September 6, 2022
1 comment Web development, Node, JavaScript

If you use getServerSideProps() in Next you can render a page by visiting it. E.g. GET http://localhost:3000/mypages/page1
Or if you use getStaticProps() with getStaticPaths(), you can use npm run build to generate the HTML files (e.g. in the .next/server/pages directory).
But what if you don't want to start a server? What if you have a particular page/URL in mind that you want to generate, but without starting a server and sending an HTTP GET request to it? This blog post shows a way to do this with a plain Node script.

Here's a solution to programmatically render a page:


#!/usr/bin/env node

import http from "http";

import next from "next";

async function main(uris) {
  const nextApp = next({});
  const nextHandleRequest = nextApp.getRequestHandler();
  await nextApp.prepare();

  const htmls = Object.fromEntries(
    await Promise.all(
      uris.map((uri) => {
        try {
          // If it's a fully qualified URL, make it its pathname
          uri = new URL(uri).pathname;
        } catch {}
        return renderPage(nextHandleRequest, uri);
      })
    )
  );
  console.log(htmls);
}

async function renderPage(handler, url) {
  const req = new http.IncomingMessage(null);
  const res = new http.ServerResponse(req);
  req.method = "GET";
  req.url = url;
  req.path = url;
  req.cookies = {};
  req.headers = {};
  await handler(req, res);
  if (res.statusCode !== 200) {
    throw new Error(`${res.statusCode} on rendering ${req.url}`);
  }
  for (const { data } of res.outputData) {
    const [, body] = data.split("\r\n\r\n");
    if (body) return [url, body];
  }
  throw new Error("No output data has a body");
}

main(process.argv.slice(2)).catch((err) => {
  console.error(err);
  process.exit(1);
});

To demonstrate I created this sample repo: https://github.com/peterbe/programmatically-render-next-page

Note that you need to run npm run build first so Next has all the static assets ready.
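
For example, assuming the script above is saved as render-pages.mjs (the file name is just an example; see the sample repo for its actual layout):


▶ npm run build
▶ node render-pages.mjs /aboutus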

In conclusion

The alternative, in automation, would be to run something like this:


▶ npm run build && npm run start &
▶ sleep 5  # give the server a chance to start
▶ xh http://localhost:3000/aboutus
HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
Date: Tue, 06 Sep 2022 12:23:42 GMT
Etag: "m8ff9sdduo1hk"
Keep-Alive: timeout=5
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Powered-By: Next.js

<!DOCTYPE html><html><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width"/><title>About Us page</title><meta name="description" content="We do things. I hope."/><link rel="icon" href="/favicon.ico"/><meta name="next-head-count" content="5"/><link rel="preload" href="/_next/static/css/ab44ce7add5c3d11.css" as="style"/><link rel="stylesheet" href="/_next/static/css/ab44ce7add5c3d11.css" data-n-g=""/><link rel="preload" href="/_next/static/css/ae0e3e027412e072.css" as="style"/><link rel="stylesheet" href="/_next/static/css/ae0e3e027412e072.css" data-n-p=""/><noscript data-n-css=""></noscript><script defer="" nomodule="" src="/_next/static/chunks/polyfills-c67a75d1b6f99dc8.js"></script><script src="/_next/static/chunks/webpack-7ee66019f7f6d30f.js" defer=""></script><script src="/_next/static/chunks/framework-db825bd0b4ae01ef.js" defer=""></script><script src="/_next/static/chunks/main-3123a443c688934f.js" defer=""></script><script src="/_next/static/chunks/pages/_app-deb173bd80cbaa92.js" defer=""></script><script src="/_next/static/chunks/996-f1475101e84cf548.js" defer=""></script><script src="/_next/static/chunks/pages/aboutus-41b1f037d974ef60.js" defer=""></script><script src="/_next/static/REJUWXI26y-lp9JVmzJB5/_buildManifest.js" defer=""></script><script src="/_next/static/REJUWXI26y-lp9JVmzJB5/_ssgManifest.js" defer=""></script></head><body><div id="__next"><div class="Home_container__bCOhY"><main class="Home_main__nLjiQ"><h1 class="Home_title__T09hD">About Use page</h1><p class="Home_description__41Owk"><a href="/">Go to the <b>Home</b> page</a></p></main><footer class="Home_footer____T7K"><a href="/">Home page</a></footer></div></div><script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{}},"page":"/aboutus","query":{},"buildId":"REJUWXI26y-lp9JVmzJB5","nextExport":true,"autoExport":true,"isFallback":false,"scriptLoader":[]}</script></body></html>

There are probably many great ideas that this can be used for. At work we use getServerSideProps() and we have too many pages to build them all statically. We need a solution like this to do custom analysis of the rendered HTML to check for broken links by analyzing every generated <a href> tag.

Join a list with a bitwise or operator in Python

August 22, 2022
0 comments Python

The bitwise OR operator in Python is often convenient when you want to combine multiple things into one thing. For example, with the Django ORM you might do this:


from django.db.models import Q

filter_ = Q(first_name__icontains="peter") | Q(first_name__icontains="ashley")

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

See how it hardcodes the filtering on strings peter and ashley.
But what if that was a bit more complicated:


from django.db.models import Q

filter_ = Q(first_name__icontains="peter")
if include("ashley"):
    filter_ |= Q(first_name__icontains="ashley")

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

So far, same functionality.

But what if the business logic is more complicated? You can't do this:


filter_ = None
if include("peter"):
    filter_ |= Q(first_name__icontains="peter")  # WILL NOT WORK
if include("ashley"):
    filter_ |= Q(first_name__icontains="ashley")

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

What if the things you want to filter on come from a list that's only known at runtime? You'd need to do the |= stuff "dynamically". One way to solve that is with functools.reduce. Suppose the things you want to bitwise-OR together are collected in a list:


from django.db.models import Q
from operator import or_
from functools import reduce


def include(_):
    import random
    return random.random() > 0.5

filters = []
if include("peter"):
    filters.append(Q(first_name__icontains="peter"))
if include("ashley"):
    filters.append(Q(first_name__icontains="ashley"))

assert len(filters), "must have at least one filter"
filter_ = reduce(or_, filters)  # THE MAGIC!

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

And finally, if it's a list already:


from django.db.models import Q
from operator import or_
from functools import reduce

names = ["peter", "ashley"]
qs = [Q(first_name__icontains=x) for x in names]
filter_ = reduce(or_, qs)

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

Side note

Django's django.db.models.Q is actually quite flexible when used with MyModel.objects.filter(...) because this actually works:


from django.db.models import Q

def include(_):
    import random
    return random.random() > 0.5

filter_ = Q()  # MAGIC SAUCE
if include("peter"):
    filter_ |= Q(first_name__icontains="peter")
if include("ashley"):
    filter_ |= Q(first_name__icontains="ashley")

for contact in Contact.objects.filter(filter_):
    print((contact.first_name, contact.last_name))

How to sort case insensitively with empty strings last in Django

April 3, 2022
1 comment Django, Python, PostgreSQL

Imagine you have something like this in Django:


class MyModel(models.Model):
    last_name = models.CharField(max_length=255, blank=True)
    ...

The most basic sorting is either: queryset.order_by('last_name') or queryset.order_by('-last_name'). But what if you want entries with a blank string last? And, you want it to be case insensitive. Here's how you do it:


from django.db.models.functions import Lower, NullIf
from django.db.models import Value


if reverse:
    order_by = Lower("last_name").desc()
else:
    order_by = Lower(NullIf("last_name", Value("")), nulls_last=True)

queryset = queryset.order_by(order_by)
ALL = list(queryset.values_list("last_name", flat=True))
print("FIRST 5:", ALL[:5])
# Will print either...
#   FIRST 5: ['Zuniga', 'Zukauskas', 'Zuccala', 'Zoller', 'ZM']
# or 
#   FIRST 5: ['A', 'aaa', 'Abrams', 'Abro', 'Absher']
print("LAST 5:", ALL[-5:])
# Will print...
#   LAST 5: ['', '', '', '', '']

This is only tested with PostgreSQL but it works nicely.
If you're curious about what the SQL becomes, it's:


SELECT "main_contact"."last_name" FROM "main_contact" 
ORDER BY LOWER(NULLIF("main_contact"."last_name", '')) ASC

or


SELECT "main_contact"."last_name" FROM "main_contact" 
ORDER BY LOWER("main_contact"."last_name") DESC

Note that if your table column can contain a string, an empty string, or null, the reverse needs to be: Lower("last_name", nulls_last=True).desc().

How to close a HTTP GET request in Python before the end

March 30, 2022
0 comments Python

Does your server barf if a client closes the connection before the response is fully downloaded? Well, there's an easy way to find out. You can use this Python script:


import sys
import requests

url = sys.argv[1]
assert '://' in url, url
r = requests.get(url, stream=True)
if r.encoding is None:
    r.encoding = 'utf-8'
for chunk in r.iter_content(1024, decode_unicode=True):
    break

I use the xh CLI tool a lot. It's like curl but better in some things. By default, if you use --headers it will make a regular GET request but close the connection as soon as it has gotten all the headers. E.g.

▶ xh --headers https://www.peterbe.com
HTTP/2.0 200 OK
cache-control: public,max-age=3600
content-type: text/html; charset=utf-8
date: Wed, 30 Mar 2022 12:37:09 GMT
etag: "3f336-Rohm58s5+atf5Qvr04kmrx44iFs"
server: keycdn-engine
strict-transport-security: max-age=63072000; includeSubdomains; preload
vary: Accept-Encoding
x-cache: HIT
x-content-type-options: nosniff
x-edge-location: usat
x-frame-options: SAMEORIGIN
x-middleware-cache: hit
x-powered-by: Express
x-shield: active
x-xss-protection: 1; mode=block

That's not to be confused with doing a HEAD request like curl -I ....

So either with xh or the Python script above, you can get that same effect. It's a useful trick when you want to make sure your (async) server doesn't attempt to do weird stuff with the "Response" object after the connection has closed.

Introducing docsQL

March 28, 2022
0 comments Web development, GitHub, JavaScript

tl;dr; docsQL is a web app for analyzing lots of Markdown content files with SQL queries.

Demo

Sample instance based on MDN's open source content.

Background/Introduction

When I worked on the code for MDN in 2019-2021 I often found that I needed to understand the content better to debug or test or just find a sample page that uses some feature. I ended up writing a lot of one-off Python scripts that would traverse the repository files just to do some quick lookup that was too complex for grep. Eventually, I built a prototype called "Traits DB" which was powered by an in-browser SQL engine called alasql. Then in 2021, I joined GitHub to work on GitHub Docs and here there are lots of Markdown files too that trigger different features based on various front-matter keys.

docsQL does two things:

  1. Analyzes lots of .md files into a docs.json file which can be queried
  2. Provides a static single-page app for executing SQL against this database file

Plugins

The analyzing portion has a killer feature in that you can write your own plugins tailored specifically to your project. Your project might have quirks that are unique to it. In GitHub Docs, for example, we use something called "LiquidJS", which is like a pre-Markdown processing step for things like versioning. So I can write a custom JavaScript plugin that extends the data you get from reading the front-matter.

Here's an example plugin:


const regex = /💩/g;
export default function countCocoIceMentions({ data, content }) {
  const inTitle = (data.title.match(regex) || []).length;
  const inBody = (content.match(regex) || []).length;
  return {
    chocolateIcecreamMentions: inTitle + inBody,
  };
}

Now, if you add that to your project, you'll be able to run:


SELECT title, chocolateIcecreamMentions FROM ? 
WHERE chocolateIcecreamMentions > 0 
ORDER BY 2 DESC LIMIT 15

How you're supposed to use it

It's up to you. One important fact to keep in mind is that not everyone speaks SQL fluently. And even if you're somewhat confident with SQL, it might not be obvious how this particular engine works or what the fields are. (Mind you, there's a "Help" which shows you all fields and a collection of sample queries).
But it's really intuitive to extend an already written SQL query. So if someone shares their query, it's easy to just extend it. For example, your colleague might share a URL with an SQL query in the query string, but you want to change the sort order, so you just change DESC to ASC.

I would recommend that any team that has a project with a bunch of Markdown files, add docsql as a dependency somewhere, have it build with your directory of Markdown files, and then publish the docsql/out/ directory as a static web page which you can host on Netlify or GitHub Pages.
This way, your team gets a centralized place where team members can share URLs with each other that has queries in it. When someone shares one of these, they get added to your "Saved queries" and you can extend them from there to add to your own list.

Behind the scenes

The project is here: github.com/peterbe/docsql and it's MIT licensed. The analyzing part is all Node. It's a CLI that is able to dynamically import other .mjs files based on scanning the directory at runtime.

The front-end is a NextJS static build which uses Mantine for the React UI components.

You can run it with npx like this:


npx docsql /path/to/my/markdown/files

But if you want to control it a bit better you can simply add it to your own Node project with npm install docsql or yarn add docsql.

Let's make it better

First of all, it's a very new project. My initial goal was to get the basics working. A lot of edges have been left rough, especially in the areas of installation, performance, and the SQL editor. Please come and help out if you see something. In particular, if you tried to set it up but found it hard, we can work together to either improve the documentation or fix some scripts that would help the next person.

For feature requests and bug reports use: https://github.com/peterbe/docsql/issues/new
Or just comment here on the blog post.

Make your NextJS site 10-100x faster with Express caching

February 18, 2022
0 comments React, Node, Nginx, JavaScript

UPDATE: Feb 21, 2022: The original blog post didn't mention the caching of custom headers. So warm cache hits would lose Cache-Control from the cold cache misses. Code updated below.

I know, I know. The title sounds ridiculous. But it's not untrue. I managed to make my NextJS site 20x faster by allowing the Express server, which handles NextJS, to cache the output in memory. And cache invalidation is not a problem.

Layers

My personal blog is a stack of layers:

KeyCDN -> Nginx (on my server) -> Express (same server) -> NextJS (inside Express)

And inside the NextJS code, to get the actual data, it uses HTTP to talk to a local Django server to get JSON based on data stored in a PostgreSQL database.

The problems I have are as follows:

  • The CDN sometimes asks for the same URL more than once when in theory you'd think it should be cached by them for a week. And if the traffic is high, my backend might get a stampeding herd of requests until the CDN has warmed up.
  • It's technically possible to bypass the CDN by going straight to the origin server.
  • NextJS is "slow" and the culprit is actually critters which computes the critical CSS inline and lazy-loads the rest.
  • Using Nginx to do in-memory caching (which is powerfully fast by the way) does not allow cache purging at all (unless you buy Nginx Plus)

I really like NextJS and it's a great developer experience. There are definitely many things I don't like about it, but that's more because my site isn't SPA'y enough to benefit from much of what NextJS has to offer. By the way, I blogged about rewriting my site in NextJS last year.

Quick detour about critters

If you're reading my blog right now in a desktop browser, right-click and view source and you'll find this:


<head>
  <style>
  *,:after,:before{box-sizing:inherit}html{box-sizing:border-box}inpu...
  ... about 19k of inline CSS...
  </style>
  <link rel="stylesheet" href="/_next/static/css/fdcd47c7ff7e10df.css" data-n-g="" media="print" onload="this.media='all'">
  <noscript><link rel="stylesheet" href="/_next/static/css/fdcd47c7ff7e10df.css"></noscript>  
  ...
</head>

It's great for web performance because a <link rel="stylesheet" href="css.css"> is render-blocking and makes the site feel slow on first load. I wish I didn't need this, but since I lack the CSS skills to hand-code every bit of CSS, I rely on a bloated CSS framework that comes as a massive kitchen sink.

To add critical CSS optimization in NextJS, you add:


experimental: { optimizeCss: true },

inside your next.config.js. Easy enough, but it slows my site down from ~80ms to ~230ms per rendered page on my Intel Macbook.
So you see, if it weren't for this need for critical CSS inlining, NextJS would be at about ~80ms per page, and that includes getting all the data via HTTP JSON for each page too.

Express caching middleware

My server.mjs looks like this (simplified):


import express from "express";
import next from "next";

import renderCaching from "./middleware/render-caching.mjs";

const app = next({ dev });
const handle = app.getRequestHandler();

app
  .prepare()
  .then(() => {
    const server = express();

    // For Gzip and Brotli compression
    server.use(shrinkRay());

    server.use(renderCaching);

    server.use(handle);

    // Use the rollbar error handler to send exceptions to your rollbar account
    if (rollbar) server.use(rollbar.errorHandler());

    server.listen(port, (err) => {
      if (err) throw err;
      console.log(`> Ready on http://localhost:${port}`);
    });
  })

And the middleware/render-caching.mjs looks like this:


import express from "express";
import QuickLRU from "quick-lru";

const router = express.Router();

const cache = new QuickLRU({ maxSize: 1000 });

router.get("/*", async function renderCaching(req, res, next) {
  if (
    req.path.startsWith("/_next/image") ||
    req.path.startsWith("/_next/static") ||
    req.path.startsWith("/search")
  ) {
    return next();
  }

  const key = req.url;
  if (cache.has(key)) {
    res.setHeader("x-middleware-cache", "hit");
    const [body, headers] = cache.get(key);
    Object.entries(headers).forEach(([key, value]) => {
      if (key !== "x-middleware-cache") res.setHeader(key, value);
    });
    return res.status(200).send(body);
  } else {
    res.setHeader("x-middleware-cache", "miss");
  }

  const originalEndFunc = res.end.bind(res);
  res.end = function (body) {
    if (body && res.statusCode === 200) {
      cache.set(key, [body, res.getHeaders()]);
      // console.log(
      //   `HEAP AFTER CACHING ${(
      //     process.memoryUsage().heapUsed /
      //     1024 /
      //     1024
      //   ).toFixed(1)}MB`
      // );
    }
    return originalEndFunc(body);
  };

  next();
});

export default router;

It's far from perfect and I only just coded this yesterday afternoon. My server runs a single Node process so the max heap memory would theoretically be 1,000 x the average size of those response bodies. If you're worried about bloating your memory, just adjust the QuickLRU to something smaller.
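
For example, a more conservative cache could look like this (the cap of 100 is an arbitrary number for illustration):


import QuickLRU from "quick-lru";

// Cap the cache at 100 entries; the least-recently-used
// responses get evicted first once the cap is reached.
const cache = new QuickLRU({ maxSize: 100 });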

Let's talk about your keys

In my basic version, I chose this cache key:


const key = req.url;

but that means that http://localhost:3000/foo?a=1 is different from http://localhost:3000/foo?b=2 which might be a mistake if you're certain that no rendering ever depends on a query string.

But this is totally up to you! For example, suppose that you know your site depends on the darkmode cookie, you can do something like this:


const key = `${req.path} ${req.cookies['darkmode']==='dark'} ${req.headers['accept-language']}`

Or,


const key = req.path.startsWith('/search') ? req.url : req.path

Purging

As soon as I launched this code, I watched the log files, and voila!:

::ffff:127.0.0.1 [18/Feb/2022:12:59:36 +0000] GET /about HTTP/1.1 200 - - 422.356 ms
::ffff:127.0.0.1 [18/Feb/2022:12:59:43 +0000] GET /about HTTP/1.1 200 - - 1.133 ms

Cool. It works. But the problem with a simple LRU cache is that it's sticky. And it's stored inside a running process's memory. How is the Express server middleware supposed to know that the content has changed and needs a cache purge? It doesn't. It can't know. The only one that knows is my Django server which accepts the various write operations that I know are reasons to purge the cache. For example, if I approve a blog post comment or an edit to the page, it triggers the following (simplified) Python code:


import requests

def cache_purge(url):
    if settings.PURGE_URL:
        print(requests.post(settings.PURGE_URL, json={
           "pathnames": [url]
        }, headers={
           "Authorization": f"Bearer {settings.PURGE_SECRET}"
        }))

    if settings.KEYCDN_API_KEY:
        api = keycdn.Api(settings.KEYCDN_API_KEY)
        print(api.delete(
            f"zones/purgeurl/{settings.KEYCDN_ZONE_ID}.json", 
            {"urls": [url]}
        ))    

Now, let's go back to the simplified middleware/render-caching.mjs and look at how we can purge from the LRU over HTTP POST:


const cache = new QuickLRU({ maxSize: 1000 })

router.get("/*", async function renderCaching(req, res, next) {
// ... Same as above
});


router.post("/__purge__", async function purgeCache(req, res, next) {
  const { body } = req;
  const { pathnames } = body;
  try {
    validatePathnames(pathnames)
  } catch (err) {
    return res.status(400).send(err.toString());
  }

  const bearer = req.headers.authorization;
  const token = bearer.replace("Bearer", "").trim();
  if (token !== PURGE_SECRET) {
    return res.status(403).send("Forbidden");
  }

  const purged = [];

  for (const pathname of pathnames) {
    for (const key of cache.keys()) {
      if (
        key === pathname ||
        (key.startsWith("/_next/data/") && key.includes(`${pathname}.json`))
      ) {
        cache.delete(key);
        purged.push(key);
      }
    }
  }
  res.json({ purged });
});

What's cool about that is that it can purge both the regular HTML URL and those _next/data/ URLs. When NextJS hijacks an <a> click, it just requests the data in JSON form and uses the existing React components to re-render the page with the different data. So, in a sense, GET /_next/data/RzG7kh1I6ZEmOAPWpdA7g/en/plog/nextjs-faster-with-express-caching.json?oid=nextjs-faster-with-express-caching is the same as GET /plog/nextjs-faster-with-express-caching because of how NextJS works. In terms of content they're the same, so it's worth pointing out that the same piece of content can be represented by different URLs.

Another thing to point out is that this caching is specifically about individual pages. In my blog, for example, the homepage is a mix of the 10 latest entries. But I know this within my Django server, so when a particular blog post has been updated for some reason, I actually send out a bunch of different URLs to the purge endpoint where I know its content will be included. It's not perfect but it works pretty well.

Conclusion

The hardest part about caching is cache invalidation. It's usually the inner core of a crux. Sometimes, you're so desperate to survive a stampeding herd problem that you don't care about cache invalidation but as a compromise, you just set the caching time-to-live short.

But I think the most important tenet of good caching is: have full control over it. I.e. don't take it lightly. Build something where you can fully understand and tailor exactly how it works to your specific business needs.

This idea of letting Express cache responses in memory isn't new, but I didn't find any decent third-party solution on NPMJS that I liked or felt fully comfortable with. And I needed to tailor it exactly to my specific setup.

Go forth and try it out on your own site! Not all sites or apps need this at all, but if you do, I hope I have inspired a foundation of a solution.