Peterbe.com

A blog and website by Peter Bengtsson

Filtered home page!
Currently only showing blog entries under the category: Node. Clear filter

Benchmark compare Highlight.js vs. Prism

19 May 2020 0 comments   JavaScript, Node

https://github.com/peterbe/syntax-highlight-node-benchmark


tl;dr; I wanted to see which is fastest, in Node, Highlight.js or Prism. The result is; they're both plenty fast but Prism is 9% faster.

The context is all the thousands of little snippets of CSS, HTML, and JavaScript code on MDN.
I first wrote a script that stored almost 9,000 snippets of code. 60% is Javascript and 22% is CSS and rest is HTML.
The mean snippet size was 400 bytes and the median 300 bytes. All ASCII.

Then I wrote three functions:

  1. f1 - opens the snippet, extracts the payload, and saves it in a different place. This measures the baseline for how long the disk I/O read and the disk I/O write takes.
  2. f2 - same as f1 but uses const html = Prism.highlight(payload, Prism.languages[name], name); before saving.
  3. f3 - same as f1 but uses const html = hljs.highlight(name, payload).value; before saving.

The experiment

You can see the hacky benchmark code here: https://github.com/peterbe/syntax-highlight-node-benchmark/blob/master/index.js

Results

The results are (after running each 12 times each):

f1 0.947s   fastest
f2 1.361s   43.6% slower
f3 1.494s   57.7% slower

Memory

In terms of memory usage, Prism maxes heap memory at 60MB (the f1 baseline was 18MB), and Highlight.js maxes heap memory at 60MB too.

Disk space in HTML

Each library produces different HTML. Examples:

Prism

<span class="token selector">.item::after</span> <span class="token punctuation">{</span>
    <span class="token property">content</span><span class="token punctuation">:</span> <span class="token string">"This is my content."</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

Highlight.js

<span class="hljs-selector-class">.item</span><span class="hljs-selector-pseudo">::after</span> {
    <span class="hljs-attribute">content</span>: <span class="hljs-string">"This is my content."</span>;
}

Yes, not only does it mean they look different, they use up a different amount of disk space when saved. That matters for web performance and also has an impact on build resources.

Conclusion

Prism is plenty fast for Node. If you're already using Prism, don't worry about having to switch to Highlight.js for added performance.

RAM memory consumption is about the same.

Final HTML from Prism is 30% larger than Highlight.js but when the rendered HTML is included in a full HTML page, the HTML compresses very well because of all the repetition so this is not a good comparison. Or rather, not a lot to worry about.

Well, speed is just one dimension. The features differ too. MDN already uses Prism but does so in the browser. The ultimate context for this blog post is; the speed if we were to do all the syntax highlighting in the server as a build step.

Throw JavaScript errors with extra information

12 May 2020 0 comments   JavaScript, Node


Did you know, if you can create your own new Error instance and attach your own custom properties on that? This can come in very handy when you, from the caller, want to get more structured information from the error without relying on the error message.

// WRONG ⛔️

try {
  for (const i of [...Array(10000).keys()]) {
    if (Math.random() > 0.999) {
      throw new Error(`Failed at ${i}`);
    }
  }
} catch (err) {
  const iteration = parseInt(err.toString().match(/Failed at (\d+)/)[1]);
  console.warn(`Made it to ${iteration}`);
}

// RIGHT ✅

try {
  for (const i of [...Array(10000).keys()]) {
    if (Math.random() > 0.999) {
      const failure = new Error(`Failed at ${i}`);
      failure.iteration = i;
      throw failure;
    }
  }
} catch (err) {
  const iteration = err.iteration;
  console.warn(`Made it to ${iteration}`);
}

The above examples are obviously a bit contrived but you have to imagine that whatever code can throw an error might be "far away" from where you deal with errors thrown. For example, imagine you start off a build and you want to get extra information about what the context was. In Python, you use exception classes as a form of natural filtering but JavaScript doesn't have that. Using custom error properties can be a great tool to separate unexpected errors from expected errors.

Bonus - Checking for the custom property

Imagine this refactoring:

try {
  for (const i of [...Array(10000).keys()]) {
    if (Math.random() > 0.999) {
      const failure = new Error(`Failed at ${i}`);
      failure.iteration = i;
      throw failure;
    }
    if (Math.random() < 0.001) {
      throw new Error("something else is wrong");
    }
  }
} catch (err) {
  const iteration = err.iteration;
  console.warn(`Made it to ${iteration}`);
}

With that code it's very possible you'd get Made it to undefined. So here's how you'd make the distinction:

try {
  for (const i of [...Array(10000).keys()]) {
    if (Math.random() > 0.999) {
      const failure = new Error(`Failed at ${i}`);
      failure.iteration = i;
      throw failure;
    }
    if (Math.random() < 0.001) {
      throw new Error("something else is wrong");
    }
  }
} catch (err) {
  if (err.hasOwnProperty("iteration")) {
    const iteration = err.iteration;
    console.warn(`Made it to ${iteration}`);
  } else {
    throw err;
  }
}

```

How to use minimalcss without a server

24 April 2020 0 comments   Web development, JavaScript, Node

https://github.com/peterbe/how-to-use-minimalcss-without-a-server


minimalcss requires that you have your HTML in a serving HTTP web page so that puppeteer can open it to find out the CSS within. Suppose, in your build system, you don't yet really have a server. Well, what you can do is start one on-the-fly and shut it down as soon as you're done.

Suppose you have .html file

First install all the stuff:

yarn add minimalcss http-server

Then run it:

const path = require("path");

const minimalcss = require("minimalcss");
const httpServer = require("http-server");

const HTML_FILE = "index.html";  // THIS IS YOURS

(async () => {
  const server = httpServer.createServer({
    root: path.dirname(path.resolve(HTML_FILE)),
  });
  server.listen(8080);

  let result;
  try {
    result = await minimalcss.minimize({
      urls: ["http://0.0.0.0:8080/" + HTML_FILE],
    });
  } catch (err) {
    console.error(err);
    throw err;
  } finally {
    server.close();
  }

  console.log(result.finalCss);
})();

And the index.html file:

<!DOCTYPE html>
<html>
    <head>
        <link rel="stylesheet" href="styles.css">
    </head>
    <body>
        <p>Hi @peterbe</p>
    </body>
</html>

And the styles.css file:

h1 {
  color: red;
}
p,
li {
  font-weight: bold;
}

And the output from running that Node script:

p{font-weight:700}

It works!

Suppose all you have is the HTML string and the CSS blob(s)

Suppose all you have is a string of HTML and a list of strings of CSS:

const fs = require("fs");
const path = require("path");

const minimalcss = require("minimalcss");
const httpServer = require("http-server");

const HTML_BODY = `
<p>Hi Peterbe</p>
`;

const CSSes = [
  `
h1 {
  color: red;
}
p,
li {
  font-weight: bold;
}
`,
];

(async () => {
  const csses = CSSes.map((css, i) => {
    fs.writeFileSync(`${i}.css`, css);
    return `<link rel="stylesheet" href="${i}.css">`;
  });
  const html = `<!doctype html><html>
  <head>${csses}</head>
  <body>${HTML_BODY}</body>
  </html>`;
  const fp = path.resolve("./index.html");
  fs.writeFileSync(fp, html);
  const server = httpServer.createServer({
    root: path.dirname(fp),
  });
  server.listen(8080);

  let result;
  try {
    result = await minimalcss.minimize({
      urls: ["http://0.0.0.0:8080/" + path.basename(fp)],
    });
  } catch (err) {
    console.error(err);
    throw err;
  } finally {
    server.close();
    fs.unlinkSync(fp);
    CSSes.forEach((_, i) => fs.unlinkSync(`${i}.css`));
  }

  console.log(result.finalCss);
})();

Truth be told, you'll need a good pinch of salt to appreciate that example code. It works but most likely, if you're into web performance so much that you're even doing this, your parameters are likely to be more complex.

Suppose you have your own puppeteer instance

In the first example above, minimalcss will create an instance of puppeteer (e.g. const browser = await puppeteer.launch()) but that means you have less control over which version of puppeteer or which parameters you need. Also, if you have to run minimalcss on a bunch of pages it's costly to have to create and destroy puppeteer browser instances repeatedly.

To modify the original example, here's how you use your own instance of puppeteer:

  const path = require("path");

+ const puppeteer = require("puppeteer");
  const minimalcss = require("minimalcss");
  const httpServer = require("http-server");

  const HTML_FILE = "index.html"; // THIS IS YOURS

  (async () => {
    const server = httpServer.createServer({
      root: path.dirname(path.resolve(HTML_FILE)),
    });
    server.listen(8080);

+   const browser = await puppeteer.launch(/* your special options */);
+
    let result;
    try {
      result = await minimalcss.minimize({
        urls: ["http://0.0.0.0:8080/" + HTML_FILE],
+       browser,
      });
    } catch (err) {
      console.error(err);
      throw err;
    } finally {
+     await browser.close();
      server.close();
    }

    console.log(result.finalCss);
  })();

Note that this doesn't buy us anything in this particular example. But that's where your imagination comes in!

Conclusion

You can see the code here as a git repo if that helps.

The point is that this might solve some of the chicken-and-egg problem you might have is that you're building your perfect HTML + CSS and you want to perfect it before you ship it.

Note also that there are other ways to run minimalcss other than programmatically. For example, minimalcss-server is minimalcss wrapped in a express server.

Another thing that you might have is that you have multiple .html files that you want to process. The same technique applies but you just need to turn it into a loop and make sure you call server.close() (and optionally await browser.close()) when you know you've processed the last file. Exercise left to the reader?

How post JSON with curl to an Express app

15 April 2020 0 comments   JavaScript, Node


tl;dr; No need install or require body-parser and it's important to send the right content-type header.

I know Express has great documentation but I'm still confused about how to receive JSON and/or how to test it from curl. A great deal of confusion comes from the fact that, I think, body-parser used to be a third-party library you had to install yourself to add it to your Express app. You don't. It now gets installed by installing express. E.g.

▶ yarn init -y
▶ yarn add express
▶ ls node_modules/body-parser
HISTORY.md   LICENSE      README.md    index.js     lib          package.json

Let's work backward. This is how you set up the Express handler:

const express = require("express");  // v4.17.x as of Apr 2020
const app = express();

app.use(express.json());

app.post("/echo", (req, res) => {
  res.json(req.body);
}); 

app.listen(5000);

And, this is how you test it:

▶ curl -XPOST -d '{"foo": "bar"}' -H 'content-type: application/json' localhost:5000/echo
{"foo":"bar"}%

That's it. No need to require("body-parser") or anything like that. And make sure you're sending the content-type: application/json in the curl command.

Things that can go wrong

I kept fumbling around on StackOverflow questions and rummaging the Express documentation until I figured out what mistake I kept doing. So, here's a variant of the handler above, but much more verbose:

app.post("/echo", (req, res) => {

  if (req.body === undefined) {
    throw new Error("express.json middleware not installed");
  }
  if (!Object.keys(req.body).length) {
    // E.g curl -v -XPOST http://localhost:5000/echo
    if (!req.get("content-Type")) {
      return res.status(400).send("no content-type header\n");
    }
    // E.g. curl -v -XPOST -d '{"foo": "bar"}' http://localhost:5000/echo
    if (!req.get("content-Type").includes("application/json")) {
      return res.status(400).send("content-type not application/json\n");
    }
    // E.g. curl -XPOST -H 'content-type:application/json' http://localhost:5000/echo
    return res.status(400).send("no data payload included\n");
  }

  // At this point 'req.body' is *something*.
  // For example, you might want to `console.log(req.body.foo)`
  res.json(req.body);
}); 

How you treat these things is up to you. For example, an empty JSON data might be OK in your application.
I.e. perhaps curl -XPOST -d '{}' -H 'content-type:application/json' http://localhost:5000/echo might be fine.

An important option

express.json() is a piece of middleware. By default, it has a simple mechanism for bothering to do put .body into the request object. The default configuration is as if you'd typed:

app.use(express.json({
  type: 'application/json',
}));

(it's actually a bit more complicated than that)

If you're confident that you'll always be sending JSON to this handler, and you don't want to have to force clients to have to specify the application/json Content-Type you can change this to:

app.use(express.json({
  type: '*/*',
}));

Now you'll find that curl -XPOST -d '{"foo": "bar3"}' localhost:5000/ will work fine.

Instead of curl, let's fetch

This code works the same with node-fetch or browser Fetch API.

fetch("http://localhost:5000/echo", {
  method: "post",
  body: JSON.stringify({ foo: "bar" }),
  headers: { "Content-Type": "application/json" },
})
  .then((res) => res.json())
  .then((json) => console.log(json));

How to install Node 12 on Ubuntu (Eoan Ermine) 19.10

08 April 2020 0 comments   Linux, Node

https://github.com/nodesource/distributions/blob/master/README.md#debinstall


I'm setting up a new Ubuntu (Eoan Ermine) 19.10 server and I noticed that apt install nodejs gives you Node v10 which is an LTS (Long Term Support) version that'll last till April 2021. However, I want Node v12 which is the most recent LTS release as of April 2020.

To install it I used these instructions:

curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
sudo apt-get install -y nodejs

That worked great.
When it finished, it spat out this nice little blurb about how to install yarn:

...
Fetched 7454 B in 1s (12.3 kB/s)
Reading package lists... Done

## Run `sudo apt-get install -y nodejs` to install Node.js 12.x and npm
## You may also need development tools to build native addons:
     sudo apt-get install gcc g++ make
## To install the Yarn package manager, run:
     curl -sL https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
     echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
     sudo apt-get update && sudo apt-get install yarn

By the way, I have no idea what nodejs-mozilla but running apt show nodejs-mozilla yields:

Package: nodejs-mozilla
Version: 12.16.1-0ubuntu0.19.10.1
Priority: optional
Section: universe/javascript
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 42.0 MB
Depends: libc6 (>= 2.29), libgcc1 (>= 1:3.4), libstdc++6 (>= 9)
Homepage: http://nodejs.org/
Download-Size: 10.4 MB
APT-Sources: http://mirrors.digitalocean.com/ubuntu eoan-updates/universe amd64 Packages
Description: evented I/O for V8 javascript
 Node.js is a platform built on Chrome's JavaScript runtime for easily
 building fast, scalable network applications. Node.js uses an
 event-driven, non-blocking I/O model that makes it lightweight and
 efficient, perfect for data-intensive real-time applications that run
 across distributed devices.
 .
 Node.js is bundled with several useful libraries to handle server
 tasks:
 .
 System, Events, Standard I/O, Modules, Timers, Child Processes, POSIX,
 HTTP, Multipart Parsing, TCP, DNS, Assert, Path, URL, Query Strings.

Installing it doesn't add a node executable and I can't find a home page for it. apt can be weird sometimes.

Performance of truth checking a JavaScript object

03 February 2020 0 comments   JavaScript, Node


I'm working on a Node project that involves large transformations of large sets of data here and there. For example:

if (!Object.keys(this.allTitles).length) {
  ...

In my case, that this.allTitles is a plain object with about 30,000 key/value pairs. That particular line of code actually only runs 1 single time so if it's hundreds of milliseconds, it's really doesn't matter that much. However, that's not a guarantee! What if you had something like this:

for (const thing of things) {
  if (!Object.keys(someObj).length) {
    // mutate someObj
  }
}

then, you'd potentially have a performance degradation once someObj becomes considerably large. And it gets particularly degraded if the length of things is considerably large as it would do the operation many times.

Actually, consider this:

const obj = {};
[...Array(30000)].forEach((_, i) => {
  obj[i] = i;
});

console.time("Truthcheck obj");
[...Array(100)].forEach((_, i) => {
  return !!Object.keys(obj).length;
});
console.timeEnd("Truthcheck obj");

On my macBook with Node 13.5, this outputs:

Truthcheck obj: 260.564ms

Maps

The MDN page on Map has a nice comparison, in terms of performance, between Map and regular object. Consider this super simple benchmark:

const obj = {};
const map = new Map();

[...Array(30000)].forEach((_, i) => {
  obj[i] = i;
  map.set(i, i);
});

console.time("Truthcheck obj");
[...Array(100)].forEach((_, i) => {
  return !!Object.keys(obj).length;
});
console.timeEnd("Truthcheck obj");

console.time("Truthcheck map");
[...Array(100)].forEach((_, i) => {
  return !!map.size;
});
console.timeEnd("Truthcheck map");

So, fill a Map instance and a plain object with 30,000 keys and values. Then, for each in turn, check if the thing is truthy 100 times. The output I get:

Truthcheck obj: 235.017ms
Truthcheck map: 0.029ms

That's not unexpected. The map instance maintains a size counter, which increments on .set (if the key is new), so doing that "truthy" check just takes O(1) seconds.

Conclusion

Don't run to rewrite everything to Maps!

In fact, I took the above mentioned little benchmark and changed the times to be a 3,000 item map and obj (instead of 30,000) and only did 10 iterations (instead of 100) and then the numbers are:

Truthcheck obj: 0.991ms
Truthcheck map: 0.044ms

These kinds of small numbers are very unlikely to matter in the scope of other things going on.

Anyway, consider using Map if you fear that you might be working with really reeeeally large mappings.