scaling-monosemanticity-llama
Reproducing "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet" using LLaMA. This project explores monosemantic neurons in large language models, implementing and extending methods to scale and analyze interpretability in LLaMA-based architectures.