Sorry about the weird title of this blog post. Not sure what else to call it.

I have a function that recursively traverses the file system. You can iterate over what it yields to do something with each file found on disk. Silly example:


let count = 0;
for (const filePath of walker("/lots/of/files/here")) {
  count += filePath.length;
}

The implementation looks like this:


const fs = require("fs");
const path = require("path");

function* walker(root) {
  const files = fs.readdirSync(root);
  for (const name of files) {
    const filepath = path.join(root, name);
    const isDirectory = fs.statSync(filepath).isDirectory();
    if (isDirectory) {
      // Recurse and re-yield every path from the subdirectory.
      yield* walker(filepath);
    } else {
      yield filepath;
    }
  }
}

But I wondered: is it faster to not use a generator function, since there might be overhead in switching between the generator and whatever code does something with each yielded value? A pure big-array function looks like this:


function walker(root) {
  const files = fs.readdirSync(root);
  const all = [];
  for (const name of files) {
    const filepath = path.join(root, name);
    const isDirectory = fs.statSync(filepath).isDirectory();
    if (isDirectory) {
      // Flatten the recursive result into the accumulator.
      all.push(...walker(filepath));
    } else {
      all.push(filepath);
    }
  }
  return all;
}

It produces the same result: a flat array of file paths.
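And because arrays and generators are both iterable, the consuming for...of loop is identical either way. If you ever do need the full array from the generator version, you can materialize it with spread syntax. A quick sketch:


// Collect everything the generator yields into one array,
// e.g. when you need .length or random access.
const allFiles = [...walker("/lots/of/files/here")];
console.log(allFiles.length);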

It's hard to measure this precisely, but I pointed both versions at a large directory with many files and did something trivial with each one, just to make sure the loop body does some work:


const label = "generator";
console.time(label);
let count = 0;
for (const filePath of walker(SEARCH_ROOT)) {
  count += filePath.length;
}
console.timeEnd(label);
const heapBytes = process.memoryUsage().heapUsed;
console.log(`HEAP: ${(heapBytes / 1024.0).toFixed(1)}KB`);

I ran it a bunch of times. After a while, the numbers settle and you get:

  • Generator function: (median time) 1.74s
  • Big array function: (median time) 1.73s

In other words, no speed difference.
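For what it's worth, here's roughly how you could automate that kind of repeated timing instead of eyeballing console.time output. The run count of 10 and the median helper are my own additions, not part of the benchmark above:


// Hypothetical harness: time N runs and report the median.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length / 2)];
}

const times = [];
for (let i = 0; i < 10; i++) {
  const start = process.hrtime.bigint();
  let count = 0;
  for (const filePath of walker(SEARCH_ROOT)) {
    count += filePath.length;
  }
  const end = process.hrtime.bigint();
  times.push(Number(end - start) / 1e9); // nanoseconds to seconds
}
console.log(`median: ${median(times).toFixed(2)}s`);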

Obviously, building up a massive array in memory will increase heap usage. Taking a snapshot at the end of each run and printing it, you can see that...

  • Generator function: (median heap memory) 4.9MB
  • Big array function: (median heap memory) 13.9MB
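One caveat: a single heapUsed snapshot at the end depends on when garbage collection last happened, so it can be noisy. A sketch of a slightly more robust approach, sampling during the walk (the every-1000-files interval is an arbitrary choice of mine):


let count = 0;
let seen = 0;
let peakHeap = 0;
for (const filePath of walker(SEARCH_ROOT)) {
  count += filePath.length;
  // Sample the heap every 1000 files; calling
  // process.memoryUsage() on every iteration would be too slow.
  if (++seen % 1000 === 0) {
    peakHeap = Math.max(peakHeap, process.memoryUsage().heapUsed);
  }
}
console.log(`PEAK HEAP: ${(peakHeap / 1024.0 / 1024.0).toFixed(1)}MB`);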

Conclusion

The potential overhead of suspending and resuming a Node generator function is absolutely minuscule, at least in contexts similar to mine.

It's not unexpected that the generator function uses less heap memory, because it never builds up a big array at all.
