December 2025
Robustness Analysis of Residual Steering in Gemma-2B
December 2024
On the slow erosion of meaningful work and our unpreparedness for it
December 2024
Exploring multi-trigger classification in mechanistic interpretability
September 2024
Understanding how LLMs encode multiple concepts in overlapping neural representations