Programme 02 · Superposition

See n features pack into d=2 dimensions.

A live, in-browser reproduction of the toy model of superposition from Elhage et al. (2022): a tiny two-layer network trains on sparse synthetic features through a bottleneck of d=2, and we watch the columns of the bottleneck weight matrix arrange themselves into regular-polygon configurations as sparsity varies. No install, no Python: just gradient descent in your tab.
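For readers who want to see the moving parts outside the browser, here is a minimal pure-Python sketch of the setup described above. It is not the demo's actual code. It assumes the tied-weight form from Elhage et al. (2022), x_hat = ReLU(W^T W x + b), uniform feature importance, feature values drawn uniformly from [0, 1) when active, and plain SGD. The names sample_batch, forward, loss_and_grads, train and p_active are illustrative, as are the default hyperparameters.

```python
import random

def sample_batch(rng, n, p_active, batch):
    """Bernoulli-sparse inputs: each feature is on with probability p_active,
    and an active feature takes a value uniform in [0, 1)."""
    return [[rng.random() if rng.random() < p_active else 0.0 for _ in range(n)]
            for _ in range(batch)]

def forward(W, b, x):
    """W is d x n. h = W x is the bottleneck; x_hat = ReLU(W^T h + b)."""
    d, n = len(W), len(b)
    h = [sum(W[k][j] * x[j] for j in range(n)) for k in range(d)]
    pre = [sum(W[k][i] * h[k] for k in range(d)) + b[i] for i in range(n)]
    return h, pre, [max(0.0, v) for v in pre]

def loss_and_grads(W, b, xs):
    """Mean squared reconstruction error over the batch, plus hand-derived
    gradients with respect to W and b (uniform importance is an assumption)."""
    d, n = len(W), len(b)
    gW = [[0.0] * n for _ in range(d)]
    gb = [0.0] * n
    total = 0.0
    for x in xs:
        h, pre, out = forward(W, b, x)
        total += sum((out[i] - x[i]) ** 2 for i in range(n))
        # Gradient at the pre-activation; ReLU blocks it where pre <= 0.
        g = [2.0 * (out[i] - x[i]) if pre[i] > 0 else 0.0 for i in range(n)]
        # Gradient flowing back into the bottleneck h.
        gh = [sum(g[i] * W[k][i] for i in range(n)) for k in range(d)]
        for i in range(n):
            gb[i] += g[i]
            for k in range(d):
                # W appears twice (decode and encode), so two terms.
                gW[k][i] += g[i] * h[k] + gh[k] * x[i]
    m = len(xs)
    return total / m, [[v / m for v in row] for row in gW], [v / m for v in gb]

def train(n=5, d=2, p_active=0.05, steps=2000, batch=64, lr=0.05, seed=0):
    """Plain SGD on fresh batches; returns the final W, b and last batch loss."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.3) for _ in range(n)] for _ in range(d)]
    b = [0.0] * n
    loss = None
    for _ in range(steps):
        xs = sample_batch(rng, n, p_active, batch)
        loss, gW, gb = loss_and_grads(W, b, xs)
        for k in range(d):
            for i in range(n):
                W[k][i] -= lr * gW[k][i]
        for i in range(n):
            b[i] -= lr * gb[i]
    return W, b, loss
```

Calling train(n=5, p_active=0.02) returns a 2 x 5 matrix whose columns can be plotted as arrows. With plain SGD and these untuned defaults the columns may need more steps, or a different optimizer, before they settle into a clean polygon.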

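[Interactive demo: controls for the number of features n and the number of training steps, a sparsity slider running from sparse (left) to dense (right) with the current value displayed (shown here as p = 0.020), and live Loss and Step readouts.]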

Each arrow is one column of the encoder W, that is, the direction the model uses to represent one feature. At high sparsity (slider left) the model packs all n features into the d=2 bottleneck as a regular polygon. At low sparsity (slider right) it gives up on superposition: only the top two features get represented, and the remaining columns collapse to zero.
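One way to check the picture numerically, rather than by eye, is to measure the angular gaps between the arrows. The sketch below assumes W is stored as a 2 x n list of lists. The helper names column_geometry and is_regular_polygon, and the thresholds min_norm and tol_deg, are illustrative choices rather than anything the demo defines.

```python
import math

def column_geometry(W, min_norm=0.1):
    """W is 2 x n; column i is the 2-D direction for feature i.
    Returns (norms, gaps_deg): the norm of every column, and the angular gaps
    in degrees between consecutive non-collapsed columns around the circle."""
    n = len(W[0])
    norms = [math.hypot(W[0][i], W[1][i]) for i in range(n)]
    # Columns shorter than min_norm count as collapsed and are skipped.
    angles = sorted(math.atan2(W[1][i], W[0][i])
                    for i in range(n) if norms[i] >= min_norm)
    k = len(angles)
    if k == 1:
        return norms, [360.0]
    gaps = [math.degrees((angles[(j + 1) % k] - angles[j]) % (2 * math.pi))
            for j in range(k)]
    return norms, gaps

def is_regular_polygon(gaps, tol_deg=5.0):
    """True when at least two columns survive and all gaps are near 360/k."""
    if len(gaps) < 2:
        return False
    target = 360.0 / len(gaps)
    return all(abs(g - target) <= tol_deg for g in gaps)

# A hand-built pentagon: five unit columns spaced evenly around the circle.
pentagon = [[math.cos(2 * math.pi * i / 5) for i in range(5)],
            [math.sin(2 * math.pi * i / 5) for i in range(5)]]

# A hand-built dense-regime solution: two orthogonal columns, the rest zero.
dense = [[1.0, 0.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0, 0.0]]
```

Passing pentagon through column_geometry gives five equal gaps, so is_regular_polygon accepts it. The dense example leaves only two surviving columns at a right angle, which the check rejects. A digon (two antipodal columns) counts as regular because both gaps are 180 degrees.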


What you should see

The geometry — digon, triangle, square, pentagon, hexagon — is what Elhage et al. (2022) reported and analytically explained. The demo is a teaching artifact; see notebook 02 for the Python version and the programme 02 file for the full evidence ledger.

What this does not show

A synthetic, i.i.d., Bernoulli-sparse task is the cleanest possible setting for superposition. Real LMs face correlated features, hierarchical structure, and orders of magnitude more features per dimension. What survives at LM scale is the phenomenon of overcomplete representation — not the specific polygon geometry. The empirical evidence for that at LM scale is the SAE literature (Cunningham et al. 2023; Bricken et al. 2023; Gao et al. 2024), not this demo.
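To make "i.i.d., Bernoulli-sparse" concrete, the arithmetic below works out how often features are on at the same time. It assumes the slider's p is the per-feature activation probability and that features are independent. The function name coactivation_stats and the example value n = 6 are illustrative.

```python
def coactivation_stats(n, p):
    """For n independent features, each active with probability p, return
    (expected number active, P(none active), P(exactly one), P(two or more))."""
    expected_active = n * p
    p_none = (1.0 - p) ** n
    p_one = n * p * (1.0 - p) ** (n - 1)
    p_two_plus = 1.0 - p_none - p_one
    return expected_active, p_none, p_one, p_two_plus

# Example: n = 6 features at p = 0.02.
stats = coactivation_stats(6, 0.02)
```

At these settings most samples have no active feature at all, and samples with two or more active features are rarer still. Correlated features, as in real LMs, break the independence this calculation relies on.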