ViT-SAE Log 1

Background

Attempting to create Meta-SAEs on a classification ViT. The hope is that this gives more access to fine-grained features.

Meta-SAE ViT-SAE Training

State of Vision Interp

Work Done

  • Wrote a boilerplate ViT and am currently training it. It is based on a Google one, so the results may not be 'pure'. By purity I just mean uncertainty about which dataset the other model was trained on. This should not matter at the representation level as long as the datasets share a set of images, but if the images are completely different we will not be able to capture it.
  • Installed vit-prisma and attempted to use its ViT training code.
  • Gave up on the MNIST SAE since I could not figure out how to parse the features. I attribute this to lack of data (MNIST has just 10 categories).

Code

Confusions

  • How do they evaluate their SAEs?
  • Which layer is a good one to train the SAE on?
  • Do I need better SAEs?
  • Will I run into the same data issue with these SAEs?
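
A possible starting point for the evaluation and layer questions above: a minimal sketch of metrics commonly reported for SAEs (reconstruction MSE, explained variance, L0, fraction of dead latents). I have not checked which of these vit-prisma actually reports. The function name `sae_eval_metrics` and the toy arrays are hypothetical, made up for illustration only.

```python
import numpy as np

def sae_eval_metrics(x, x_hat, z):
    """Common SAE sanity metrics (hypothetical helper, not a vit-prisma API).

    x:     (n_samples, d_model) original activations
    x_hat: (n_samples, d_model) SAE reconstructions
    z:     (n_samples, n_latents) SAE latent activations (post-ReLU)
    """
    resid = ((x - x_hat) ** 2).sum()
    total = ((x - x.mean(axis=0)) ** 2).sum()
    active = z > 0
    return {
        # Mean squared reconstruction error per element.
        "mse": float(((x - x_hat) ** 2).mean()),
        # 1 - residual / variance of the data around its mean.
        "explained_variance": float(1.0 - resid / total),
        # Average number of latents active per sample.
        "l0": float(active.sum(axis=1).mean()),
        # Share of latents that never fire on this batch.
        "dead_fraction": float((~active.any(axis=0)).mean()),
    }

# Toy data standing in for real activations and SAE outputs.
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
x_hat = 0.5 * x  # deliberately poor reconstruction
z = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])
metrics = sae_eval_metrics(x, x_hat, z)
print(metrics)
```

Computing the same metrics for SAEs trained on different layers would give one rough way to compare layers, though low reconstruction error alone does not say the features are interpretable.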

Next Steps

GOAL: Reproduce the ViT-Prisma SAE training and then run a Meta-SAE on it.
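
A rough sketch of the two stages in the goal above, assuming a Meta-SAE means a second, smaller SAE trained on the decoder directions of the base SAE. This does not use the vit-prisma API: random arrays stand in for ViT activations, and every name here (`init_sae`, `train_sae`, the sizes, the learning rate) is a hypothetical placeholder. The SAE is a plain ReLU autoencoder with an L1 penalty, trained with hand-written full-batch gradient descent to keep the example dependency-light.

```python
import numpy as np

def init_sae(d_in, n_latents, rng):
    # Small random encoder; decoder starts as its transpose.
    W_enc = 0.1 * rng.standard_normal((d_in, n_latents))
    return {"W_enc": W_enc, "b_enc": np.zeros(n_latents),
            "W_dec": W_enc.T.copy(), "b_dec": np.zeros(d_in)}

def encode(p, x):
    return np.maximum(x @ p["W_enc"] + p["b_enc"], 0.0)

def decode(p, z):
    return z @ p["W_dec"] + p["b_dec"]

def loss(p, x, l1):
    # Squared reconstruction error plus L1 penalty on the (non-negative) codes.
    z = encode(p, x)
    err = decode(p, z) - x
    return (err ** 2).sum(axis=1).mean() + l1 * z.sum(axis=1).mean()

def train_sae(x, n_latents, l1=1e-3, lr=0.05, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    p = init_sae(x.shape[1], n_latents, rng)
    n = x.shape[0]
    for _ in range(steps):
        pre = x @ p["W_enc"] + p["b_enc"]
        z = np.maximum(pre, 0.0)
        err = z @ p["W_dec"] + p["b_dec"] - x
        # Manual gradients of the loss above.
        d_xhat = 2.0 * err / n
        d_z = d_xhat @ p["W_dec"].T + l1 / n
        d_pre = d_z * (pre > 0)
        grads = {"W_dec": z.T @ d_xhat, "b_dec": d_xhat.sum(axis=0),
                 "W_enc": x.T @ d_pre, "b_enc": d_pre.sum(axis=0)}
        for k in p:
            p[k] -= lr * grads[k]
    return p

# Stage 1: base SAE on activations (random stand-in for one ViT layer).
rng = np.random.default_rng(0)
acts = rng.standard_normal((256, 16))
base = train_sae(acts, n_latents=32)

# Stage 2: Meta-SAE whose "dataset" is the base SAE's unit-normalized
# decoder directions, one row per base latent.
dirs = base["W_dec"] / np.linalg.norm(base["W_dec"], axis=1, keepdims=True)
meta = train_sae(dirs, n_latents=8, seed=1)
meta_codes = encode(meta, dirs)  # shape: (n_base_latents, n_meta_latents)
print(meta_codes.shape)
```

One thing this makes visible: the Meta-SAE only ever sees as many training points as the base SAE has latents, so the data issue from the MNIST attempt could show up again if the base dictionary is small.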
