MNIST-SAE Log 10

Work Done

Confusions

  • What is a meta-SAE latent?
  • What is an SAE latent?

  • Why do the activation trends decrease over time?

Next Steps

  • Write up a document compiling what I have found, and communicate it well. It does not matter if it means nothing; just get it out there.
    • (1) Look over every log I ever wrote and compile the next steps, learnings, questions, etc.
      • Could also write a script to do it, but that probably takes away from the reflection component.
    • (2) Try to answer some of the questions.
    • (3) Create graphs to show the data and pick out the interesting ones.
  • Post that document here? Maybe on LessWrong.
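
If I do go the script route for step (1), a minimal sketch could look like the following. It assumes every log is plain text with the headings "Work Done", "Confusions" and "Next Steps" each on their own line, as in this log; the function names and the log names are made up for illustration.

```python
from collections import defaultdict

# Assumed section headings, taken from this log's layout.
HEADINGS = ("Work Done", "Confusions", "Next Steps")


def extract_sections(text, headings=HEADINGS):
    """Map each heading to the non-empty lines that follow it."""
    sections = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.strip() in headings:
            # A heading line starts a new section.
            current = line.strip()
        elif current is not None and line.strip():
            sections[current].append(line)
    return dict(sections)


def compile_logs(logs, headings=HEADINGS):
    """Merge sections across logs; `logs` maps a log name to its text."""
    merged = {h: [] for h in headings}
    for name in sorted(logs):
        for heading, lines in extract_sections(logs[name], headings).items():
            # Prefix each item with the log it came from.
            merged[heading].extend(f"[{name}] {item.strip()}" for item in lines)
    return merged
```

Log names are sorted as strings, so zero-padded names (log09, log10) keep the logs in order. The output is only a raw dump per heading; the reflection part still has to be done by hand.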

  • Train meta-SAEs on different dictionary sizes (maybe double the size at each deeper level).
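
One reading of the doubling idea, as a sketch: generate the dictionary size for each depth up front so the sweep is easy to change. The base size of 64 and the factor of 2 are placeholder assumptions, not values I have settled on.

```python
def dictionary_sizes(base_size, depth, factor=2):
    """Return one dictionary size per level, scaled by `factor` at each level."""
    return [int(base_size * factor ** level) for level in range(depth)]


# Hypothetical sweep: four levels starting from 64 latents.
sizes = dictionary_sizes(64, 4)
```

Keeping `factor` as a parameter makes it cheap to try other growth rates in the same sweep.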

  • Auto-interp: pretty useful for EMNIST or MNIST since it could just find the ones.
    • Alternatively, we can use our EMNIST classifier to take the images and get the predictions on each one.
    • Could use a different dataset since we have CIFAR100, but those are categories so not as fine-grained? Probably good enough though.
    • If there is a neuron that activates most strongly on a specific class, that could be interesting.
    • Might have to regenerate the images to be easier to split, or have them generated separately into folders?
    • Hugo Fry's GitHub has feature activation analysis stuff which could be useful if I ask GPT/Claude how to use it for my code.
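
A rough sketch of the "neuron that maximizes on a specific class" check, assuming I already have one list of latent activations per image and one class label per image (either the true label or the classifier's prediction). It also assumes activations are non-negative, as they would be after a ReLU. The function names are my own and this is not taken from Hugo Fry's code.

```python
from collections import defaultdict


def mean_activation_per_class(activations, labels):
    """Return {class: [mean activation of each latent over that class]}."""
    sums = {}
    counts = defaultdict(int)
    for acts, label in zip(activations, labels):
        if label not in sums:
            sums[label] = [0.0] * len(acts)
        for i, a in enumerate(acts):
            sums[label][i] += a
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def top_class_per_latent(activations, labels):
    """For each latent, return (best_class, share).

    `share` is the best class's mean activation divided by the sum of
    the per-class means, so 1.0 means the latent fires on one class only.
    """
    means = mean_activation_per_class(activations, labels)
    n_latents = len(next(iter(means.values())))
    result = []
    for i in range(n_latents):
        per_class = {c: m[i] for c, m in means.items()}
        best = max(per_class, key=per_class.get)
        total = sum(per_class.values())
        share = per_class[best] / total if total > 0 else 0.0
        result.append((best, share))
    return result
```

Latents with a share close to 1.0 are the candidates worth looking at first. Using the means per class rather than raw sums keeps unbalanced class counts from dominating the result.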
