ViT-SAE Log 2

Background

Developmental interpretability (devinterp) is the field that seeks to understand what leads to the emergence of model properties during training. I think it would be cool to use linear probes to examine when (at which training step) and where (in which layer) meaningful features emerge in a model.
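A rough sketch of the probing idea, to make it concrete: fit a logistic-regression probe on activations from each saved checkpoint and track held-out accuracy across training. Everything here is an assumption for illustration. `fake_checkpoint_acts` is a hypothetical stand-in that generates synthetic activations with a growing "feature strength"; in a real run it would be replaced by activations pulled from one layer of the ViT at each checkpoint.

```python
import numpy as np

def train_linear_probe(acts, labels, lr=0.5, steps=300):
    """Fit a logistic-regression probe with plain gradient descent."""
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = acts @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels  # gradient of binary cross-entropy w.r.t. logits
        w -= lr * (acts.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(acts, labels, w, b):
    """Fraction of examples the probe classifies correctly."""
    preds = (acts @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())

def fake_checkpoint_acts(signal, n=400, d=16, seed=0):
    """Hypothetical stand-in for activations taken from one checkpoint.

    `signal` controls how strongly the label is encoded along dimension 0.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n).astype(float)
    acts = rng.normal(size=(n, d))
    acts[:, 0] += signal * (2 * labels - 1)
    return acts, labels

def probe_over_checkpoints(signals):
    """Train one probe per checkpoint and return held-out accuracies."""
    accs = []
    for step, signal in enumerate(signals):
        acts, labels = fake_checkpoint_acts(signal, seed=step)
        half = len(labels) // 2
        w, b = train_linear_probe(acts[:half], labels[:half])
        accs.append(probe_accuracy(acts[half:], labels[half:], w, b))
    return accs

# Simulated checkpoints where the feature gets stronger over training.
accs = probe_over_checkpoints([0.0, 0.5, 1.0, 3.0])
```

Plotting `accs` against checkpoint step (and repeating per layer) would give a first picture of when and where a feature becomes linearly decodable.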

Work Done

  • Wrote some basic training code for a ViT; it might take a very long time to run

Confusions

  • What can I measure in devinterp?
    • A ViT seems like a good testbed because I can train it fairly easily, unlike GPT-2

Next Steps

GOAL: Reproduce the ViT-Prisma SAE training and then run a Meta-SAE on it, OR run devinterp experiments to analyze at what point in training a given feature emerges
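For reference, a minimal sketch of what an SAE forward pass and loss look like. This is a generic ReLU sparse autoencoder with an L1 sparsity penalty, not necessarily the architecture or loss that ViT-Prisma uses; the names `init_sae`, `sae_forward`, `sae_loss`, and `l1_coeff` are made up for illustration.

```python
import numpy as np

def init_sae(d_model, d_hidden, seed=0):
    """Initialise SAE parameters; decoder starts as the encoder transpose."""
    rng = np.random.default_rng(seed)
    W_enc = rng.normal(scale=0.1, size=(d_model, d_hidden))
    return {
        "W_enc": W_enc,
        "b_enc": np.zeros(d_hidden),
        "W_dec": W_enc.T.copy(),
        "b_dec": np.zeros(d_model),
    }

def sae_forward(params, x):
    """Encode activations into sparse features, then reconstruct them."""
    pre = (x - params["b_dec"]) @ params["W_enc"] + params["b_enc"]
    f = np.maximum(0.0, pre)  # ReLU keeps feature activations non-negative
    x_hat = f @ params["W_dec"] + params["b_dec"]
    return f, x_hat

def sae_loss(params, x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty on feature activations."""
    f, x_hat = sae_forward(params, x)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(f), axis=-1))
    return recon + l1_coeff * sparsity

# Random vectors standing in for ViT activations (batch of 8, width 32).
params = init_sae(d_model=32, d_hidden=128)
x = np.random.default_rng(1).normal(size=(8, 32))
features, x_hat = sae_forward(params, x)
loss = sae_loss(params, x)
```

In an actual run the parameters would be optimised with a framework that provides autodiff; the NumPy version only pins down the shapes and the objective.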
