Tree-shaking 101

22/10/2024

In a JavaScript module, top-level code that does useful work falls into at least one of three categories: exports, side-effects, or internal logic.

Code that does not fall into one of these categories can be considered dead code.

// Exports
export const greeting = "Hello, World!";
export function sayHello(name) {
  return `${greeting}, ${name}!`;
}

// Side-effects: modifying global state and logging
console.log("Module loaded!");
window.customProperty = "I'm a side-effect!";

// Internal logic: indirectly used within exports or side-effects
const exclamation = "!"; // Used but not exported
function formatMessage(message) {
  // Only used within this module
  return message + exclamation;
}

// Using internal logic in exported function
export function greetWithExclamation(name) {
  return formatMessage(sayHello(name));
}

// Dead code: unused function
function unusedFunction() {
  return "I'm dead code!";
}

Naturally, one would want such unused code to be removed when the program is built. That's where bundlers come in: part of their job is to remove dead code from the sources they are given or, more precisely, not to include it.

Live code inclusion

Tree-shaking is a dead code elimination (DCE) technique popularized by the Rollup bundler project. While common DCE techniques consist of applying optimizations and removing code from a final program, tree-shaking builds the final program by including only live code, which is why it is also described as live code inclusion.

Let's take the following program as an example: it consists of three ES modules, a.js, b.js, and index.js, which is also the entry point.

// a.js
export function foo() {
  console.log("Hello from foo!");
}
window.WORD = "pizza";

// b.js
export function bar() {
  console.log("Hello from bar!");
}
export function baz() {
  console.log("Hello from baz!");
}

// index.js
import { foo } from "./a.js";
import { bar, baz } from "./b.js";

console.log(window.WORD);

foo();
bar();

As you probably noticed, the index.js entry point imports foo from a.js, and bar and baz from b.js, but doesn't use baz. As baz is never used anywhere in the program, it is dead code.

Creating an AST and resolving dependencies

Ok cool, but how does the tree-shaking algorithm come to the conclusion that a piece of code is dead?

It starts by building an abstract syntax tree (AST) of the input program using a parser (Acorn in the case of Webpack and, before version 4, Rollup). Once the AST is created, the tree-shaker can build a dependency graph to identify what each module exports, imports, and uses: this is called dependency resolution.

For example, the (really simplified and inexact) dependency resolution of the previous program could be represented as follows:

            index.js                        a.js    b.js
Imports     foo (a.js); bar, baz (b.js)     -       -
Exports     -                               foo     bar, baz
Usages      foo (a.js); bar (b.js)          -       -

Explore the actual AST of the program here
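
To make the import/export part of dependency resolution more concrete, here is a minimal sketch. It assumes trimmed, hand-written ASTs in the ESTree shape that parsers such as Acorn produce (positions and function bodies are left out), and `collectModuleInfo` is a hypothetical helper written for this illustration, not part of any bundler's API.

```javascript
// Hand-written, trimmed ESTree-style AST for index.js (assumption: a real
// parser would also emit positions and the remaining statements).
const indexAst = {
  type: "Program",
  body: [
    {
      type: "ImportDeclaration",
      source: { type: "Literal", value: "./a.js" },
      specifiers: [
        { type: "ImportSpecifier", imported: { type: "Identifier", name: "foo" } },
      ],
    },
    {
      type: "ImportDeclaration",
      source: { type: "Literal", value: "./b.js" },
      specifiers: [
        { type: "ImportSpecifier", imported: { type: "Identifier", name: "bar" } },
        { type: "ImportSpecifier", imported: { type: "Identifier", name: "baz" } },
      ],
    },
    // console.log(window.WORD), foo() and bar() are omitted for brevity.
  ],
};

// Same idea for b.js: two exported function declarations.
const bAst = {
  type: "Program",
  body: ["bar", "baz"].map((name) => ({
    type: "ExportNamedDeclaration",
    source: null,
    specifiers: [],
    declaration: { type: "FunctionDeclaration", id: { type: "Identifier", name } },
  })),
};

// Hypothetical helper: lists the named imports and exports of one module.
function collectModuleInfo(ast) {
  const imports = [];
  const exports = [];
  for (const node of ast.body) {
    if (node.type === "ImportDeclaration") {
      for (const spec of node.specifiers) {
        if (spec.type === "ImportSpecifier") {
          imports.push({ name: spec.imported.name, from: node.source.value });
        }
      }
    } else if (node.type === "ExportNamedDeclaration" && node.declaration) {
      const decl = node.declaration;
      if (decl.type === "FunctionDeclaration") {
        exports.push(decl.id.name);
      } else if (decl.type === "VariableDeclaration") {
        // Only simple `export const name = ...` patterns are handled here.
        for (const d of decl.declarations) exports.push(d.id.name);
      }
    }
  }
  return { imports, exports };
}

console.log(collectModuleInfo(indexAst));
console.log(collectModuleInfo(bAst));
```

Usages are harder to collect, because they require walking every expression and resolving scopes, which is why this sketch stops at imports and exports.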

Now that it has a dependency graph, identifying live code is pretty straightforward for the tree-shaker:

So the tree-shaken code from our example code would be:

// from a.js
window.WORD = "pizza";

// from index.js
console.log(window.WORD);

console.log("Hello from foo!");
console.log("Hello from bar!");
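
The inclusion step can be sketched as a graph traversal. This is a simplified illustration and not Rollup's actual algorithm: the `modules` object is a hand-written summary standing in for the result of dependency resolution, and `findLiveDeclarations` is a hypothetical name. It only decides which declarations are live; inlining the function bodies, as in the output above, would be a further step.

```javascript
// Assumption: each module is summarized by its top-level side-effects (with
// the bindings they read), its imports and its declarations.
const modules = {
  "index.js": {
    sideEffects: [
      { code: "console.log(window.WORD);", uses: [] },
      { code: "foo();", uses: ["foo"] },
      { code: "bar();", uses: ["bar"] },
    ],
    imports: { foo: "a.js", bar: "b.js", baz: "b.js" },
    declarations: {},
  },
  "a.js": {
    sideEffects: [{ code: 'window.WORD = "pizza";', uses: [] }],
    imports: {},
    declarations: { foo: { uses: [] } },
  },
  "b.js": {
    sideEffects: [],
    imports: {},
    declarations: { bar: { uses: [] }, baz: { uses: [] } },
  },
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function findLiveDeclarations(modules, entry) {
  const live = new Set(); // entries look like "a.js#foo"
  const visited = new Set();

  function markLive(moduleId, name) {
    const mod = modules[moduleId];
    if (has(mod.imports, name)) {
      // The binding comes from another module: follow the import.
      markLive(mod.imports[name], name);
      return;
    }
    const key = `${moduleId}#${name}`;
    if (live.has(key) || !has(mod.declarations, name)) return;
    live.add(key);
    // Whatever a live declaration uses is live too (internal logic).
    for (const used of mod.declarations[name].uses) markLive(moduleId, used);
  }

  function visitModule(moduleId) {
    if (visited.has(moduleId)) return;
    visited.add(moduleId);
    const mod = modules[moduleId];
    // Imported modules are evaluated too, so their side-effects are kept.
    for (const dep of new Set(Object.values(mod.imports))) visitModule(dep);
    for (const statement of mod.sideEffects) {
      for (const name of statement.uses) markLive(moduleId, name);
    }
  }

  visitModule(entry);
  return live;
}

console.log([...findLiveDeclarations(modules, "index.js")]);
```

Starting from index.js, only foo and bar are ever reached, so baz is never marked as live and is simply not included.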

Easy, right?

In this case, yes. But what is a side-effect really? Can all side-effects be identified by the tree-shaker?

Yes... kind of... but not really.

Maintaining side-effects

Fundamentally, side-effects are the reason why a program exists: accepting inputs from a user, writing to a console or to a disk, making network calls, adding elements to the DOM... Without them, programs are useless. For this reason, tree-shakers must be absolutely sure not to accidentally remove them, which could lead to broken programs. They do so in two different ways:

Tree-shaking is an optimization performed statically, which means the tree-shaker can rely only on the AST to detect side-effects. This is sufficient most of the time, but let's not forget that JavaScript is a dynamic language, so side-effects can hide in code that is not statically analyzable.

Let's take the following program for example:

const sum = "4" + two;

console.log("todo: do something");

As you can see, the value assigned to sum isn't used anywhere, and so should be removed by tree-shaking... but it can't be:

In such cases, tree-shaking algorithms have no choice but to be conservative and keep the code in order to preserve every potential side-effect, even though it may be unused. The tree-shaken code would then be:

// only the `sum` assignment can be removed safely
"4" + two;

console.log("todo: do something");
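
To see why the bare expression has to stay, here is a small sketch of three situations in which evaluating "4" + two is observable even though its result is thrown away. The setups are assumptions made up for the demonstration (the `log` array and the way `two` is defined are not from the example above); the point is that none of them is visible from the AST of the two-line program alone.

```javascript
const log = [];

// Case 1: `two` is a global defined with a getter, so reading it runs code.
Object.defineProperty(globalThis, "two", {
  configurable: true,
  get() {
    log.push("getter ran");
    return 2;
  },
});
"4" + two; // the result is discarded, yet the getter ran

// Case 2: `two` is an object, and `+` converts it to a primitive by
// calling its valueOf method.
delete globalThis.two;
globalThis.two = {
  valueOf() {
    log.push("valueOf ran");
    return 2;
  },
};
"4" + two;

// Case 3: `two` does not exist at all, so reading it throws.
delete globalThis.two;
try {
  "4" + two;
} catch (error) {
  log.push(error.name);
}

console.log(log);
```

Each of the three evaluations leaves a trace in `log` (a getter call, a valueOf call, and a ReferenceError), so dropping the expression would change what the program does.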

You can experiment with different scenarios online using the Rollup REPL.

In a "final" application, such unused exports or useless top-level side-effects are pretty rare, and various well-known tools exist to catch them statically (ESLint, TypeScript, ts-unused-exports, ...). But in the case of a library, every export or top-level side-effect is potential dead code: it depends on the program using it.

// famous-helpers-lib/index.js
if (process.env.MODE === "development") {
  initCustomDevtools();
}
