Toy Model of Superposition

Based on Toy Models of Superposition by Anthropic • Original Colab

Presets

Linear w/ High Sparsity
n=20, m=5 • 1−S=0.001 • Linear
ReLU w/ Dense
n=20, m=5 • 1−S=1 • ReLU
ReLU w/ High Sparsity
n=20, m=5 • 1−S=0.001 • ReLU
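
The presets differ only in the output activation and the feature density. As a rough sketch, assuming the model follows the paper's formulation (reconstruction = W^T W x + b for the linear model, with a ReLU applied on top for the ReLU model), the forward pass could look like this. The helper names and the random stand-in weights are illustrative, not taken from this page.

```python
import random

def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def forward(W, b, x, activation="relu"):
    """W is m x n (m hidden dims, n features); returns a length-n reconstruction of x."""
    h = matvec(W, x)                         # project down: h = W x, length m
    out = matvec(transpose(W), h)            # project back up: W^T h, length n
    out = [o + bi for o, bi in zip(out, b)]  # add the bias
    if activation == "relu":
        out = [max(0.0, o) for o in out]
    return out

n, m = 20, 5
rng = random.Random(0)
W = [[rng.gauss(0.0, 0.3) for _ in range(n)] for _ in range(m)]  # untrained stand-in weights
b = [0.0] * n
x = [0.0] * n
x[2] = 1.0                                   # a single active feature

linear_out = forward(W, b, x, activation="linear")
relu_out = forward(W, b, x, activation="relu")
```

The two outputs differ only where the linear reconstruction goes negative, which the ReLU clips to zero.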

Model Parameters

Number of features (n): 20
Hidden dimensions (m): 5
Feature density (1−S): probability that any given input feature is non-zero
Importance decay: 0.70 (controls how quickly importance decreases from feature 1 to n)
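
To make the density and importance settings concrete, here is a minimal sketch of how training data and the importance-weighted loss might be produced. It assumes the paper's setup: each feature is zero with probability S and otherwise uniform on [0, 1), and importance decays geometrically. Starting the exponent at 0 and averaging the loss over the batch are assumptions, as are all function names.

```python
import random

def sample_batch(batch_size, n, density, rng):
    """Each feature is non-zero with probability `density` (= 1 - S);
    non-zero values are drawn uniformly from [0, 1)."""
    return [[rng.random() if rng.random() < density else 0.0 for _ in range(n)]
            for _ in range(batch_size)]

def importances(n, decay):
    """Geometric importance: feature i gets decay**i (exponent starting at 0 is an assumption)."""
    return [decay ** i for i in range(n)]

def weighted_mse(xs, recons, imp):
    """Importance-weighted squared error, averaged over the batch."""
    total = sum(w * (x - r) ** 2
                for xrow, rrow in zip(xs, recons)
                for w, x, r in zip(imp, xrow, rrow))
    return total / len(xs)

rng = random.Random(0)
imp = importances(20, 0.70)
dense_batch = sample_batch(8, 20, 1.0, rng)     # 1 - S = 1: every feature active
sparse_batch = sample_batch(8, 20, 0.001, rng)  # 1 - S = 0.001: almost all zeros
empty_batch = sample_batch(8, 20, 0.0, rng)     # degenerate case: nothing active
```

With a decay of 0.70 the last of 20 features carries roughly a thousandth of the weight of the first, so errors on late features contribute very little to the loss.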
More Settings
Optimizer: AdamW (β₁=0.9, β₂=0.999, weight decay=0.0001)
Weight Initialization: Xavier/Glorot Normal
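
For readers unfamiliar with these two settings, the sketch below shows Xavier/Glorot normal initialization (standard deviation sqrt(2 / (fan_in + fan_out))) and a single AdamW update with decoupled weight decay, using the β and weight-decay values listed above. The learning rate and epsilon are not given on this page, so the defaults here are placeholders, and the function names are hypothetical.

```python
import math
import random

def xavier_normal(m, n, rng):
    """m x n matrix with entries drawn from N(0, 2 / (m + n))."""
    std = math.sqrt(2.0 / (m + n))
    return [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(m)]

def adamw_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
               weight_decay=1e-4, eps=1e-8):
    """One in-place AdamW update on a flat list of parameters.
    lr and eps are assumed values; the page does not state them."""
    state["t"] += 1
    t = state["t"]
    for i, g in enumerate(grads):
        params[i] *= 1.0 - lr * weight_decay              # decoupled weight decay
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)          # bias correction
        v_hat = state["v"][i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

rng = random.Random(0)
W = xavier_normal(5, 20, rng)

params = [1.0]
state = {"t": 0, "m": [0.0], "v": [0.0]}
adamw_step(params, [0.5], state)
```

On the first step the bias-corrected update has magnitude close to the learning rate regardless of the gradient's scale, which is why the single parameter above moves by about 0.001.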

Training Status

Step: 0
Loss: -
Status: Ready

Feature Importances

Visualizations

Weight Matrix W^T W and Bias Terms
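
As a small illustration of what this panel plots, the sketch below computes W^T W for a hand-written weight matrix. The diagonal entries are the squared norms of each feature's embedding and the off-diagonal entries are the dot products between features, which is where interference shows up. The example matrix is made up for illustration and is not output from this page.

```python
def gram(W):
    """Return W^T W for an m x n matrix W (list of m rows), as an n x n list of lists."""
    n = len(W[0])
    cols = [[row[i] for row in W] for i in range(n)]   # one m-dim embedding per feature
    return [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(n)]
            for i in range(n)]

# m = 2 hidden dims, n = 3 features: features 0 and 2 share one direction
# with opposite signs (an antipodal pair); feature 1 has its own direction.
W = [[1.0, 0.0, -1.0],
     [0.0, 1.0,  0.0]]
G = gram(W)
```

Here G has ones on the diagonal (all three features are represented) and a −1 between features 0 and 2, the signature of two features packed into a single dimension.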

Feature Superposition
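
One way to quantify superposition, following the paper, is to report each feature's embedding norm ||W_i|| together with the sum over j ≠ i of (Ŵ_i · W_j)², where Ŵ_i is the unit vector along W_i. Whether this panel uses exactly that measure is an assumption; the sketch below simply implements the paper's definition on a made-up matrix.

```python
import math

def feature_stats(W):
    """For an m x n matrix W, return (norms, overlaps):
    norms[i]    = ||W_i||, the length of feature i's embedding
    overlaps[i] = sum over j != i of (unit(W_i) . W_j)^2
    """
    n = len(W[0])
    cols = [[row[i] for row in W] for i in range(n)]
    norms = [math.sqrt(sum(c * c for c in col)) for col in cols]
    overlaps = []
    for i in range(n):
        if norms[i] == 0.0:
            overlaps.append(0.0)   # an unrepresented feature interferes with nothing
            continue
        unit = [c / norms[i] for c in cols[i]]
        overlaps.append(sum(sum(u * c for u, c in zip(unit, cols[j])) ** 2
                            for j in range(n) if j != i))
    return norms, overlaps

# Same illustrative matrix as an antipodal pair plus one dedicated direction.
W = [[1.0, 0.0, -1.0],
     [0.0, 1.0,  0.0]]
norms, overlaps = feature_stats(W)
```

A norm near 1 with an overlap near 0 indicates a feature with a dedicated direction, while a large overlap indicates a feature that shares its direction with others.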

Training Loss