Peterbe.com

A blog and website by Peter Bengtsson

Filtered home page!
Currently only showing blog entries under thecategory: Node. Clear filter

The best and simplest way to parse an RSS feed in Node

Node, JavaScript

There are a lot of 'rss' related NPM packages but I think I've found a combination that is great for parsing RSS feeds. Something that takes up the minimal node_modules and works great. I think the killer combination is

The code impressively simple:

const got = require("got");
const parser = require("fast-xml-parser");

(async function main() {
  const buffer = await got("https://hacks.mozilla.org/feed/", {
    responseType: "buffer",
    resolveBodyOnly: true,
    timeout: 5000,
    retry: 5,
  });
  var feed = parser.parse(buffer.toString());
  for (const item of feed.rss.channel.item) {
    console.log({ title: item.title, url: item.link });
    break;
  }
})();


// Outputs...
// {
//   title: 'MDN localization update, February 2021',
//   url: 'https://hacks.mozilla.org/2021/02/mdn-localization-update-february-2021/'
// }

I like about fast-xml-parser is that it has no dependencies. And it's tiny:

▶ du -sh node_modules/fast-xml-parser
104K    node_modules/fast-xml-parser

The got package is quite a bit larger and has more dependencies. But I still love it. It's proven itself to be very reliable and very pleasant API. Both packages support TypeScript too.

A particular detail I like about fast-xml-parser is that it doesn't try to do the downloading part too. This way, I can use my own preferred library and I could potentially write my own caching code if I want to protect against flaky network.

Please post a comment if you have thoughts or questions.

sharp vs. jimp - Node libraries to make thumbnail images

Node, JavaScript, Firebase

I recently wrote a Google Firebase Cloud function that resizes images on-the-fly and after having published that I discovered that sharp is "better" than jimp. And by better I mean better performance.

To reach this conclusion I wrote a simple trick that loops over a bunch of .png and .jpg files I had lying around and compare how long it took each implementation to do that. Here are the results:

Using jimp

▶ node index.js ~/Downloads
Sum size before: 41.1 MB (27 files)
...
Took: 28.278s
Sum size after: 337 KB

Using sharp

▶ node index.js ~/Downloads
Sum size before: 41.1 MB (27 files)
...
Took: 1.277s
Sum size after: 200 KB

The files are in the region of 100-500KB, a couple that are 1-3MB, and 1 that is 18MB.

So basically: 28 seconds for jimp and 1.3 seconds for sharp

Bonus, the code

Don't ridicule me for my benchmarking code. These are quick hacks. Let's focus on the point.

sharp

function f1(sourcePath, destination) {
  return readFile(sourcePath).then((buffer) => {
    console.log(sourcePath, "is", humanFileSize(buffer.length));
    return sharp(sourcePath)
      .rotate()
      .resize(100)
      .toBuffer()
      .then((data) => {
        const destPath = path.join(destination, path.basename(sourcePath));
        return writeFile(destPath, data).then(() => {
          return stat(destPath).then((s) => s.size);
        });
      });
  });
}

jimp

function f2(sourcePath, destination) {
  return readFile(sourcePath).then((buffer) => {
    console.log(sourcePath, "is", humanFileSize(buffer.length));
    return Jimp.read(sourcePath).then((img) => {
      const destPath = path.join(destination, path.basename(sourcePath));
      img.resize(100, Jimp.AUTO);
      return img.writeAsync(destPath).then(() => {
        return stat(destPath).then((s) => s.size);
      });
    });
  });
}

I test them like this:

console.time("Took");
const res = await Promise.all(files.map((file) => f1(file, destination)));
console.timeEnd("Took");

And just to be absolutely sure, I run them separately so the whole process is dedicated to one implementation.

Please post a comment if you have thoughts or questions.

downloadAndResize - Firebase Cloud Function to serve thumbnails

Web development, That's Groce!, Node, JavaScript

UPDATE 2020-12-30

With sharp after you've loaded the image (sharp(contents)) make sure to add .rotate() so it automatically rotates the image correctly based on EXIF data.

UPDATE 2020-12-13

I discovered that sharp is much better than jimp. It's order of maginitude faster. And it's actually what the Firebase Resize Images extension uses. Code updated below.

I have a Firebase app that uses the Firebase Cloud Storage to upload images. But now I need thumbnails. So I wrote a cloud function that can generate thumbnails on-the-fly.

There's a Firebase Extension called Resize Images which is nicely done but I just don't like that strategy. At least not for my app. Firstly, I'm forced to pick the right size(s) for thumbnails and I can't really go back on that. If I pick 50x50, 1000x1000 as my sizes, and depend on that in the app, and then realize that I actually want it to be 150x150, 500x500 then I'm quite stuck.

Instead, I want to pick any thumbnail sizes dynamically. One option would be a third-party service like imgix, CloudImage, or Cloudinary but these are not free and besides, I'll need to figure out how to upload the images there. There are other Open Source options like picfit which you install yourself but that's not an attractive option with its implicit complexity for a side-project. I want to stay in the Google Cloud. Another option would be this AppEngine function by Albert Chen which looks nice but then I need to figure out the access control between that and my Firebase Cloud Storage. Also, added complexity.

As part of your app initialization in Firebase, it automatically has access to the appropriate storage bucket. If I do:

const storageRef = storage.ref();
uploadTask = storageRef.child('images/photo.jpg').put(file, metadata);
...

...in the Firebase app, it means I can do:

 admin
      .storage()
      .bucket()
      .file('images/photo.jpg')
      .download()
      .then((downloadData) => {
        const contents = downloadData[0];

...in my cloud function and it just works!

And to do the resizing I use Jimp which is TypeScript aware and easy to use. Now, remember this isn't perfect or mature but it works. It solves my needs and perhaps it will solve your needs too. Or, at least it might be a good start for your application that you can build on. Here's the function (in functions/src/index.ts):

interface StorageErrorType extends Error {
  code: number;
}

const codeToErrorMap: Map<number, string> = new Map();
codeToErrorMap.set(404, "not found");
codeToErrorMap.set(403, "forbidden");
codeToErrorMap.set(401, "unauthenticated");

export const downloadAndResize = functions
  .runWith({ memory: "1GB" })
  .https.onRequest(async (req, res) => {
    const imagePath = req.query.image || "";
    if (!imagePath) {
      res.status(400).send("missing 'image'");
      return;
    }
    if (typeof imagePath !== "string") {
      res.status(400).send("can only be one 'image'");
      return;
    }
    const widthString = req.query.width || "";
    if (!widthString || typeof widthString !== "string") {
      res.status(400).send("missing 'width' or not a single string");
      return;
    }
    const extension = imagePath.toLowerCase().split(".").slice(-1)[0];
    if (!["jpg", "png", "jpeg"].includes(extension)) {
      res.status(400).send(`invalid extension (${extension})`);
      return;
    }
    let width = 0;
    try {
      width = parseInt(widthString);
      if (width < 0) {
        throw new Error("too small");
      }
      if (width > 1000) {
        throw new Error("too big");
      }
    } catch (error) {
      res.status(400).send(`width invalid (${error.toString()}`);
      return;
    }

    admin
      .storage()
      .bucket()
      .file(imagePath)
      .download()
      .then((downloadData) => {
        const contents = downloadData[0];
        console.log(
          `downloadAndResize (${JSON.stringify({
            width,
            imagePath,
          })}) downloadData.length=${humanFileSize(contents.length)}\n`
        );

        const contentType = extension === "png" ? "image/png" : "image/jpeg";
        sharp(contents)
          .rotate()
          .resize(width)
          .toBuffer()
          .then((buffer) => {
            res.setHeader("content-type", contentType);
            // TODO increase some day
            res.setHeader("cache-control", `public,max-age=${60 * 60 * 24}`);
            res.send(buffer);
          })
          .catch((error: Error) => {
            console.error(`Error reading in with sharp: ${error.toString()}`);
            res
              .status(500)
              .send(`Unable to read in image: ${error.toString()}`);
          });
      })
      .catch((error: StorageErrorType) => {
        if (error.code && codeToErrorMap.has(error.code)) {
          res.status(error.code).send(codeToErrorMap.get(error.code));
        } else {
          res.status(500).send(error.message);
        }
      });
  });

function humanFileSize(size: number): string {
  if (size < 1024) return `${size} B`;
  const i = Math.floor(Math.log(size) / Math.log(1024));
  const num = size / Math.pow(1024, i);
  const round = Math.round(num);
  const numStr: string | number =
    round < 10 ? num.toFixed(2) : round < 100 ? num.toFixed(1) : round;
  return `${numStr} ${"KMGTPEZY"[i - 1]}B`;
}

Here's what a sample URL looks like.

I hope it helps!

I think the next thing for me to consider is to extend this so it uploads the thumbnail back and uses the getDownloadURL() of the created thumbnail as a redirect instead. It would be transparent to the app but saves on repeated views. That'd be a good optimization.

Please post a comment if you have thoughts or questions.

Quick comparison between sass and node-sass

Node, JavaScript

To transpile .scss (or .sass) in Node you have the choice between sass and node-sass. sass is a JavaScript compilation of Dart Sass which is supposedly "the primary implementation of Sass" which is a pretty powerful statement. node-sass on the other hand is a wrapper on LibSass which is written in C++. Let's break it down a little bit more.

Speed

node-sass is faster. About 7 times faster. I took all the SCSS files behind the current MDN Web Docs which is fairly large. Transformed into CSS it becomes a ~180KB blob of CSS (92KB when optimized with csso).

Here's my ugly benchmark test which I run about 10 times like this:

node-sass took 101ms result 180kb 92kb
node-sass took 99ms result 180kb 92kb
node-sass took 99ms result 180kb 92kb
node-sass took 100ms result 180kb 92kb
node-sass took 100ms result 180kb 92kb
node-sass took 103ms result 180kb 92kb
node-sass took 102ms result 180kb 92kb
node-sass took 113ms result 180kb 92kb
node-sass took 100ms result 180kb 92kb
node-sass took 101ms result 180kb 92kb

And here's the same thing for sass:

sass took 751ms result 173kb 92kb
sass took 728ms result 173kb 92kb
sass took 728ms result 173kb 92kb
sass took 798ms result 173kb 92kb
sass took 854ms result 173kb 92kb
sass took 726ms result 173kb 92kb
sass took 727ms result 173kb 92kb
sass took 782ms result 173kb 92kb
sass took 834ms result 173kb 92kb

In another example, I ran sass and node-sass on ./node_modules/bootstrap/scss/bootstrap.scss (version 5.0.0-alpha1) and the results are after 5 runs:

node-sass took 269ms result 176kb 139kb
node-sass took 260ms result 176kb 139kb
node-sass took 288ms result 176kb 139kb
node-sass took 261ms result 176kb 139kb
node-sass took 260ms result 176kb 139kb

versus

sass took 1423ms result 176kb 139kb
sass took 1350ms result 176kb 139kb
sass took 1338ms result 176kb 139kb
sass took 1368ms result 176kb 139kb
sass took 1467ms result 176kb 139kb

Output

The unminified CSS difference primarily in the indentation. But you minify both outputs and the pretty print them (with prettier) you get the following difference:

▶ diff /tmp/sass.min.css.pretty /tmp/node-sass.min.css.pretty
152c152
<   letter-spacing: -0.0027777778rem;
---
>   letter-spacing: -0.00278rem;
231c231
<   content: "▼︎";
---
>   content: "\25BC\FE0E";

...snip...


2804c2812
< .external-icon:not([href^="https://mdn.mozillademos.org"]):not(.ignore-external) {
---
> .external-icon:not([href^='https://mdn.mozillademos.org']):not(.ignore-external) {

Basically, sass will use produce things like letter-spacing: -0.0027777778rem; and content: "▼︎";. And node-sass will produce letter-spacing: -0.00278rem; and content: "\25BC\FE0E";.
I also noticed some minor difference just in the order of some selectors but when I look more carefully, they're immaterial order differences meaning they're not cascading each other in any way.

Note! I don't know why the use of ' and " is different or if it matters. I don't know know why prettier (version 2.1.1) didn't pick one over the other consistently.

node_modules

Here's how I created two projects to compare

cd /tmp
mkdir just-sass && cd just-sass && yarn init -y && time yarn add sass && cd ..
mkdir just-node-sass && cd just-node-sass && yarn init -y && time yarn add node-sass && cd ..

Considering that sass is just a JavaScript compilation of a Dart program, all you get is basically a 3.6MB node_modules/sass/sass.dart.js file.

The /tmp/just-sass/node_modules directory is only 113 files and folders weighing a total of 4.1MB.
Whereas /tmp/just-node-sass/node_modules directory is 3,658 files and folders weighing a total of 15.2MB.

I don't know about you but I'm very skeptical that node-gyp ever works. Who even has Python 2.7 installed anymore? Being able to avoid node-gyp seems like a win for sass.

Conclusion

The speed difference may or may not matter. If you're only doing it once, who cares about a couple of hundred milliseconds. But if you're forced to have to wait 1.4 seconds on every Ctrl-S when Webpack or whatever tooling you have starts up sass it might become very painful.

I don't know much about the sass-loader Webpack plugin but it apparently works with either but they do recommend sass in their documentation. And it's the default implementation too.

It's definitely a feather in sass's hat that Dart Sass is the "primary implementation" of Sass. That just has a nice feelin in sass's favor.

Bonus

NPMCompare has a nice comparison of them as projects but you have to study each row of numbers because it's rarely as simple as more (or less) number is better. For example, the number of open issues isn't a measure of bugs.

The new module system launched in October 2019 supposedly only comes to Dart Sass which means sass is definitely going to get it first. If that stuff matters to you. For example, true, the Sass unit-testing tool, now requires Dart Sass and drops support for node-sass.

Please post a comment if you have thoughts or questions.

findMatchesInText - Find line and column of matches in a text, in JavaScript

Node, JavaScript

https://gist.github.com/peterbe/a210437cd282ef0963749520f6b09a1c

I need this function to relate to open-editor which is a Node program that can open your $EDITOR from Node and jump to a specific file, to a specific line, to a specific column.

Here's the code:

function* findMatchesInText(needle, haystack, { inQuotes = false } = {}) {
  const escaped = needle.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  let rex;
  if (inQuotes) {
    rex = new RegExp(`['"](${escaped})['"]`, "g");
  } else {
    rex = new RegExp(`(${escaped})`, "g");
  }
  for (const match of haystack.matchAll(rex)) {
    const left = haystack.slice(0, match.index);
    const line = (left.match(/\n/g) || []).length + 1;
    const lastIndexOf = left.lastIndexOf("\n") + 1;
    const column = match.index - lastIndexOf + 1;
    yield { line, column };
  }
}

And you use it like this:

const text = ` bravo
Abra
cadabra

bravo
`;

console.log(Array.from(findMatchesInText("bra", text)));

Which prints:

[
  { line: 1, column: 2 },
  { line: 2, column: 2 },
  { line: 3, column: 5 },
  { line: 5, column: 1 }
]

The inQuotes option is because a lot of times this function is going to be used for finding the href value in unstructured documents that contain HTML <a> tags.

Please post a comment if you have thoughts or questions.

Benchmark compare Highlight.js vs. Prism

Node, JavaScript

https://github.com/peterbe/syntax-highlight-node-benchmark

tl;dr; I wanted to see which is fastest, in Node, Highlight.js or Prism. The result is; they're both plenty fast but Prism is 9% faster.

The context is all the thousands of little snippets of CSS, HTML, and JavaScript code on MDN.
I first wrote a script that stored almost 9,000 snippets of code. 60% is Javascript and 22% is CSS and rest is HTML.
The mean snippet size was 400 bytes and the median 300 bytes. All ASCII.

Then I wrote three functions:

  1. f1 - opens the snippet, extracts the payload, and saves it in a different place. This measures the baseline for how long the disk I/O read and the disk I/O write takes.
  2. f2 - same as f1 but uses const html = Prism.highlight(payload, Prism.languages[name], name); before saving.
  3. f3 - same as f1 but uses const html = hljs.highlight(name, payload).value; before saving.

The experiment

You can see the hacky benchmark code here: https://github.com/peterbe/syntax-highlight-node-benchmark/blob/master/index.js

Results

The results are (after running each 12 times each):

f1 0.947s   fastest
f2 1.361s   43.6% slower
f3 1.494s   57.7% slower

Memory

In terms of memory usage, Prism maxes heap memory at 60MB (the f1 baseline was 18MB), and Highlight.js maxes heap memory at 60MB too.

Disk space in HTML

Each library produces different HTML. Examples:

Prism

<span class="token selector">.item::after</span> <span class="token punctuation">{</span>
    <span class="token property">content</span><span class="token punctuation">:</span> <span class="token string">"This is my content."</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

Highlight.js

<span class="hljs-selector-class">.item</span><span class="hljs-selector-pseudo">::after</span> {
    <span class="hljs-attribute">content</span>: <span class="hljs-string">"This is my content."</span>;
}

Yes, not only does it mean they look different, they use up a different amount of disk space when saved. That matters for web performance and also has an impact on build resources.

  • f1 - baseline "HTML" files amounts to 11.9MB (across 3,025 files)
  • f2 - Prism: 17.6MB
  • f3 - Highlight.js: 13.6MB

Conclusion

Prism is plenty fast for Node. If you're already using Prism, don't worry about having to switch to Highlight.js for added performance.

RAM memory consumption is about the same.

Final HTML from Prism is 30% larger than Highlight.js but when the rendered HTML is included in a full HTML page, the HTML compresses very well because of all the repetition so this is not a good comparison. Or rather, not a lot to worry about.

Well, speed is just one dimension. The features differ too. MDN already uses Prism but does so in the browser. The ultimate context for this blog post is; the speed if we were to do all the syntax highlighting in the server as a build step.

Please post a comment if you have thoughts or questions.