Cheerio is a fantastic Node library for parsing HTML, manipulating it, and serializing it back out. But you can also use it just to parse HTML and pluck out what you need. That's how we prepare the text that goes into the search index for our site. It basically works like this:


const body = await getBody('http://localhost:4002' + eachPage.path)
const $ = cheerio.load(body)
const title = $('h1').text()
const intro = $('p.intro').text()
...

But it hit me: can we speed that up? Cheerio actually ships with two different parsers:

  1. parse5
  2. htmlparser2

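htmlparser2 is the faster one and parse5 is the stricter one. By default, cheerio.load() uses parse5; passing { xmlMode: true } makes it use htmlparser2 instead.
But I wanted to see the difference in a real-world example.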

So I made two runs where I used:


const $ = cheerio.load(body)

in one run, and:


const $ = cheerio.load(body, { xmlMode: true })

in another.
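
For illustration, here is a minimal sketch of how each run could be timed. The helper names `timeLoad` and `measureAll` are hypothetical and not from the original script, and `fakeLoad` is only a stand-in so the snippet runs without cheerio installed. In a real run you would pass something like `(html) => cheerio.load(html, { xmlMode: true })` instead.

```javascript
// Hypothetical helper: time a single call to any `load` function.
// `load` stands in for e.g. (html) => cheerio.load(html, { xmlMode: true })
function timeLoad(load, html) {
  const t0 = performance.now()
  load(html)
  const t1 = performance.now()
  return t1 - t0 // elapsed time in milliseconds
}

// Collect one measurement per page for a given loader.
function measureAll(load, pages) {
  return pages.map((html) => timeLoad(load, html))
}

// Stand-in loader (an assumption for this sketch, not a real parser).
const fakeLoad = (html) => html.length

const timings = measureAll(fakeLoad, [
  '<h1>Hi</h1>',
  '<p class="intro">Hello</p>',
])
console.log(timings.length) // one timing per page
```

Keeping the loader as a parameter means the exact same timing code is used for both runs, so the only thing that differs between them is the options passed to cheerio.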

After parsing 1,635 HTML pages of various sizes, the results are:

FILE: load.txt
MEAN:   13.19457640586797
MEDIAN: 10.5975

FILE: load-xmlmode.txt
MEAN:   3.9020372860635697
MEDIAN: 3.1020000000000003

So, using { xmlMode: true } leads to roughly a 3.4x speedup, on both the mean and the median.
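
To make the arithmetic concrete, here is a small sketch of how the mean, the median, and the speedup ratio can be computed. The `mean` and `median` helpers and the `sample` values are made up for illustration and are not the script that produced the numbers above. Only the four figures in the ratio calculation come from the results.

```javascript
// Arithmetic mean of a list of numbers.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length
}

// Median: the middle value of the sorted list,
// or the average of the two middle values when the count is even.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid]
}

// Made-up sample timings, only to show the helpers in use.
const sample = [1, 2, 3, 10]
console.log(mean(sample), median(sample)) // 4 2.5

// Ratios computed from the reported figures.
const meanSpeedup = 13.19457640586797 / 3.9020372860635697
const medianSpeedup = 10.5975 / 3.1020000000000003
console.log(meanSpeedup.toFixed(2), medianSpeedup.toFixed(2)) // 3.38 3.42
```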

I think that pretty much confirms the original benchmark, but now I know it holds in a real application.
