Toy Model of Superposition
Based on "Toy Models of Superposition" by Anthropic • Original Colab
Presets
Linear w/ High Sparsity
n=20, m=5 • 1−S=0.001 • Linear
ReLU w/ Dense
n=20, m=5 • 1−S=1 • ReLU
ReLU w/ High Sparsity
n=20, m=5 • 1−S=0.001 • ReLU
Model Parameters
Input Dimension (n):
20
Hidden Dimension (m):
5
Feature Probability (1−S):
1
0.3
0.1
0.03
0.01
0.003
0.001
Probability that any given input feature is non-zero
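The feature-probability control above determines how sparse the synthetic inputs are. A minimal sketch of how such data is typically sampled (following the original paper's setup: each feature is independently non-zero with probability 1−S, and uniform in [0, 1] when active — the function name and seed argument are illustrative, not from this tool):

```python
import numpy as np

def sample_batch(batch_size, n, feature_prob, seed=None):
    """Sample synthetic inputs: each of the n features is independently
    non-zero with probability `feature_prob` (= 1 - S), and uniform in
    [0, 1) when active."""
    rng = np.random.default_rng(seed)
    mask = rng.random((batch_size, n)) < feature_prob  # which features are active
    values = rng.random((batch_size, n))               # magnitudes in [0, 1)
    return mask * values

# e.g. the "High Sparsity" presets: almost every feature is zero
x = sample_batch(1024, n=20, feature_prob=0.001)
```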
Feature Importance Decay:
0.70
Controls how quickly importance decreases from feature 1 to n
Output Activation Mode:
Linear
ReLU
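The parameters above define the toy model from the paper: inputs are squeezed through an m-dimensional bottleneck and reconstructed as x′ = act(WᵀWx + b), with a per-feature importance weight on the reconstruction loss. A minimal NumPy sketch, assuming this tool follows the paper's formulation (function names are illustrative):

```python
import numpy as np

def forward(W, b, x, mode="ReLU"):
    """Reconstruct x through the bottleneck: h = W x, then x' = act(W^T h + b).
    W has shape (m, n) with m < n, so features must share hidden directions."""
    h = x @ W.T            # (batch, m): project into the hidden space
    out = h @ W + b        # (batch, n): reconstruct with the same weights
    if mode == "ReLU":
        out = np.maximum(out, 0.0)
    return out

def importance_weighted_loss(x, x_hat, decay=0.70):
    """Importance I_i = decay**i, so earlier features matter more;
    loss = mean over batch of sum_i I_i * (x_i - x'_i)^2."""
    importance = decay ** np.arange(x.shape[1])
    return (importance * (x - x_hat) ** 2).sum(axis=1).mean()
```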
Train Model
Stop Training
More Settings
Learning Rate:
Batch Size:
Training Steps:
Learning Rate Schedule:
Constant
Linear Decay
Cosine Decay
Optimizer: AdamW (β₁=0.9, β₂=0.999, weight decay=0.0001)
Weight Initialization: Xavier/Glorot Normal
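The optimizer and initialization settings listed above can be tied together in a short training sketch. This is an assumed reconstruction of the training loop (in PyTorch, matching the original Colab's ecosystem), not this tool's actual implementation; defaults for steps, batch size, and learning rate are placeholders:

```python
import torch

def train_toy_model(n=20, m=5, feature_prob=0.3, decay=0.70,
                    steps=200, batch_size=256, lr=1e-3, seed=0):
    """Train x' = ReLU(W^T W x + b) with the settings shown in the UI:
    AdamW (betas=(0.9, 0.999), weight_decay=1e-4), Xavier/Glorot normal init."""
    torch.manual_seed(seed)
    W = torch.empty(m, n)
    torch.nn.init.xavier_normal_(W)      # Xavier/Glorot normal initialization
    W.requires_grad_(True)
    b = torch.zeros(n, requires_grad=True)
    importance = decay ** torch.arange(n, dtype=torch.float32)
    opt = torch.optim.AdamW([W, b], lr=lr, betas=(0.9, 0.999),
                            weight_decay=1e-4)
    losses = []
    for _ in range(steps):
        # features non-zero with prob (1 - S), uniform in [0, 1) when active
        mask = (torch.rand(batch_size, n) < feature_prob).float()
        x = mask * torch.rand(batch_size, n)
        x_hat = torch.relu(x @ W.T @ W + b)
        loss = (importance * (x - x_hat) ** 2).sum(dim=1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return W.detach(), b.detach(), losses
```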
Training Status
Step:
0
Loss:
-
Status:
Ready
Feature Importances
Visualizations
Weight Matrix W^T W and Bias Terms
Feature Superposition
Training Loss
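The "Feature Superposition" panel can be reproduced from the learned weights. A sketch of the two quantities the paper uses for this (assuming the panel shows the same: each feature's representation strength ‖W_i‖ and its interference with the other features, Σ_{j≠i} (Ŵ_i · W_j)²):

```python
import numpy as np

def feature_geometry(W):
    """Per-feature diagnostics for W of shape (m, n).

    norms[i]        = ||W_i||, how strongly feature i is represented.
    interference[i] = sum_{j != i} (W_i_hat . W_j)^2, how much feature i's
                      direction overlaps other features' directions (0 means
                      feature i has a dedicated, orthogonal direction)."""
    norms = np.linalg.norm(W, axis=0)            # (n,) column norms
    unit = W / np.maximum(norms, 1e-12)          # unit-normalized columns
    overlaps = unit.T @ W                        # (n, n): W_i_hat . W_j
    np.fill_diagonal(overlaps, 0.0)              # exclude self-overlap
    interference = (overlaps ** 2).sum(axis=1)
    return norms, interference
```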