Superposition demo — Why do LLMs work?

What you should see

p ≈ 0.5 (dense, each feature active about half the time): only 2 directions remain non-zero. The model represents the 2 most important features and drops the rest.
p ≈ 0.1 (moderate): about 3 directions arranged in a triangle. As features co-occur less often, interference between them costs less, so the model superposes more of them.
p ≈ 0.02 (very sparse): up to n directions arranged in a regular polygon. The 2-D bottleneck packs in all n features.
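
To make the polygon picture concrete, here is a minimal sketch (not the demo's own code) that builds n evenly spaced unit directions in 2-D and measures how much each pair overlaps. The function names polygon_directions and interference are hypothetical, chosen for illustration. It assumes the idealized case where all n features get equal-length directions, which a trained model only approximates.

```python
import numpy as np


def polygon_directions(n: int) -> np.ndarray:
    """Return a (2, n) matrix whose columns are n unit vectors evenly spaced on the circle."""
    angles = 2.0 * np.pi * np.arange(n) / n
    return np.stack([np.cos(angles), np.sin(angles)])


def interference(W: np.ndarray) -> np.ndarray:
    """Gram matrix W^T W: entry (i, j) is the dot product of feature directions i and j."""
    return W.T @ W


if __name__ == "__main__":
    for n in (2, 3, 5):
        G = interference(polygon_directions(n))
        off_diag = G[~np.eye(n, dtype=bool)]  # overlaps between distinct features
        print(n, off_diag.round(3))
```

The diagonal of the Gram matrix is 1 (each feature reads itself back fully) and the off-diagonal entries are the cosines of the angles between directions. Those off-diagonal terms are the interference the model has to tolerate when it packs more than 2 features into 2 dimensions.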

The polygon geometry (digon, triangle, square, pentagon, hexagon) is what Elhage et al. (2022) reported and explained analytically. The demo is a teaching artifact; see notebook 02 for the Python version and the programme 02 file for the full evidence ledger.

What this does not show

A synthetic, i.i.d., Bernoulli-sparse task is the cleanest possible setting for superposition. Real LMs face correlated features, hierarchical structure, and orders of magnitude more features per dimension. What survives at LM scale is the phenomenon of overcomplete representation, not the specific polygon geometry. The empirical evidence for that phenomenon at LM scale comes from the sparse autoencoder (SAE) literature (Cunningham et al. 2023; Bricken et al. 2023; Gao et al. 2024), not from this demo.
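
For reference, the kind of synthetic input described above can be sketched as follows. This is an illustrative sampler, not the demo's actual data pipeline: the function name sample_sparse_features is hypothetical, and drawing active values uniformly from [0, 1) is an assumption modeled on the toy setup in Elhage et al. (2022).

```python
import numpy as np


def sample_sparse_features(
    batch: int, n: int, p: float, rng: np.random.Generator
) -> np.ndarray:
    """Sample a (batch, n) array of i.i.d. Bernoulli-sparse features.

    Each feature is active independently with probability p. Active values
    are uniform on [0, 1) (an assumption); inactive features are exactly 0.
    """
    active = rng.random((batch, n)) < p
    values = rng.random((batch, n))
    return np.where(active, values, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for p in (0.5, 0.1, 0.02):
        x = sample_sparse_features(10_000, 6, p, rng)
        # Fraction of non-zero entries should be close to p.
        print(p, (x > 0).mean())
```

Every feature here is drawn independently with the same p, which is exactly the simplification that real LM features do not satisfy.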

See n features pack into d=2 dimensions.
