Background
Attempting to train Meta-SAEs on a classification ViT. The hope is that this gives more access to fine-grained features.
Work Done
- Wrote some boilerplate ViT code and am training it now (it is based on a Google implementation, so the results may not be 'pure'). Purity here mostly comes down to which dataset was used to train the other model. At the representation level this should not matter, since a dataset is a fixed set of images, but if the two models saw completely different images we will not be able to capture that.
- Installed vit-prisma and attempted to use its ViT training code.
- Gave up on the MNIST SAE since I could not figure out how to interpret the features; the data is too limited (only 10 classes).
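One way to make the feature-parsing step concrete, even on a small-category dataset like MNIST: for each SAE feature, look at the class distribution of its top-activating images. A minimal sketch, where the activation and label arrays are hypothetical stand-ins (real values would come from running the SAE over the dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: SAE activations for 1000 images x 64 features,
# plus digit labels. Real arrays would come from the trained SAE.
acts = rng.random((1000, 64))
labels = rng.integers(0, 10, size=1000)

def feature_class_purity(acts, labels, n_classes=10, top_k=20):
    """For each SAE feature, find the most common class among its
    top_k highest-activating images and the fraction of them it makes up."""
    top_classes, purities = [], []
    for f in range(acts.shape[1]):
        top = np.argsort(acts[:, f])[-top_k:]           # top-activating images
        counts = np.bincount(labels[top], minlength=n_classes)
        top_classes.append(int(counts.argmax()))        # dominant class
        purities.append(counts.max() / top_k)           # how class-pure it is
    return np.array(top_classes), np.array(purities)

classes, purities = feature_class_purity(acts, labels)
# With random activations purity stays low; a feature that cleanly tracks
# one digit class would score near 1.
```

With only 10 classes, many features may just track a digit, which is the coarseness problem above; fine-grained features would show low class purity but consistent visual structure in their top images.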
Confusions
- How do they evaluate these SAEs?
- Which layer is the right one to attach to?
- Do I need better SAEs?
- Will I run into the same data issue with these SAEs?
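On the evaluation question, the metrics I have seen most often in the SAE literature are L0 (average number of active features per input), reconstruction MSE, and fraction of variance explained. A minimal sketch with made-up arrays standing in for ViT activations, SAE reconstructions, and feature codes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a batch of ViT activations `x`, the SAE's
# reconstructions `x_hat`, and its sparse feature codes `z`.
x = rng.normal(size=(256, 768))
x_hat = x + 0.1 * rng.normal(size=x.shape)                  # pretend reconstruction
z = rng.random((256, 4096)) * (rng.random((256, 4096)) < 0.01)

l0 = (z > 0).sum(axis=1).mean()                             # avg active features
mse = ((x - x_hat) ** 2).mean()                             # reconstruction error
var_explained = 1 - ((x - x_hat) ** 2).sum() / ((x - x.mean(0)) ** 2).sum()
```

A good SAE keeps L0 low while keeping variance explained high; sweeping the sparsity penalty trades these off.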
Next Steps
- Read State of Vision Interp + other background resources
- Train a ViT
- Train an SAE on the ViT
- Follow others in evaluating the SAE
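For the SAE-training step, a minimal sketch of the standard recipe: a ReLU autoencoder with an L1 sparsity penalty, trained by gradient descent. Gradients are written out by hand in numpy so it is self-contained; the shapes and the input batch are made up, and real inputs would be ViT activations captured from the chosen layer.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae, batch = 64, 256, 128   # hypothetical sizes; d_sae > d_model
lam, lr = 1e-3, 1e-2                   # sparsity penalty and step size

# Overcomplete dictionary: encode into d_sae sparse features, decode back.
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_enc = np.zeros(d_sae)
b_dec = np.zeros(d_model)

def sae_step(x):
    """One SGD step on L = mean ||x - x_hat||^2 + lam * mean ||z||_1."""
    global W_enc, W_dec, b_enc, b_dec
    z = np.maximum(x @ W_enc + b_enc, 0.0)        # sparse codes (ReLU)
    x_hat = z @ W_dec + b_dec                     # reconstruction
    loss = ((x - x_hat) ** 2).sum(1).mean() + lam * np.abs(z).sum(1).mean()
    B = x.shape[0]
    g_xhat = 2.0 * (x_hat - x) / B                # d loss / d x_hat
    g_z = g_xhat @ W_dec.T + (lam / B) * (z > 0)  # recon + L1 gradients
    g_pre = g_z * (z > 0)                         # back through the ReLU
    W_dec -= lr * (z.T @ g_xhat); b_dec -= lr * g_xhat.sum(0)
    W_enc -= lr * (x.T @ g_pre);  b_enc -= lr * g_pre.sum(0)
    return loss

x = rng.normal(size=(batch, d_model))             # stand-in for ViT activations
losses = [sae_step(x) for _ in range(200)]
# Loss should fall as the SAE learns to reconstruct the batch sparsely.
```

In practice a framework with autograd (and tricks like decoder-norm constraints and dead-feature resampling) replaces the manual gradients, but the objective is the same.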