SouthEast Linux Fest 2026

Cracking the Black Box: Using Sparse Autoencoders to Decode What Language Models Really Learn
2026-06-13 , Salon B (Rocky Linux Ballroom)

Sparse autoencoders reveal how large language models encode meaning beyond individual neurons, extracting interpretable features—from DNA and code errors to multilingual concepts—and advancing practical mechanistic interpretability.


An overview of sparse autoencoders as a tool for decoding hidden representations in language models, showing how they uncover meaningful, human-interpretable features and circuits.