Other articles


  1. MNIST-SAE Log 13

    Work Done

    Confusions

    • The loss is pretty stagnant; will this improve performance on MNIST?

    Next Steps

    • Train the model on EMNIST rather than MNIST to get more data to actually run the SAEs

    • Message Bart Bussman for the meta-sae repo https://www.alignmentforum.org/inbox/quDmw96SJdzJk8yS3 --> also ask about …

  2. MNIST-SAE Log 11

    Work Done

    • Established rudimentary interp via EMNIST classification labels
    • Fixed up some of the experiment code to save the indices
    • got preliminary findings that there is an optimal depth for meta-saes

    • Found that the average number of activations does increase the deeper you go, which implies some level of fine-grained-ness …

  3. MNIST-SAE Log 9

    Work Done

    • set up initial loop for infinite saes

    Confusions

    • how should I do the analysis?
    • should I pivot to ... EMNIST? or another mnist dataset

    Next Steps

    • DO 1 epoch for testing purposes

    • look into the normalization values for the transforms of EMNIST and MNIST (see the sketch after this list)

    • finish that loop
    • clear out activations
    • analyze the activations …
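
    For the normalization bullet above, one option is to compute the values from the data rather than hard-code them. A minimal sketch, assuming torchvision is available; the EMNIST split choice is an assumption, and MNIST's commonly quoted constants are roughly mean 0.1307 / std 0.3081:

      import torch
      from torch.utils.data import DataLoader
      from torchvision import datasets, transforms

      def dataset_mean_std(dataset, batch_size=1024):
          """Compute mean and std over all pixels of a 1-channel image dataset."""
          loader = DataLoader(dataset, batch_size=batch_size)
          total, sq_total, n_pixels = 0.0, 0.0, 0
          for images, _ in loader:              # images: (B, 1, 28, 28) in [0, 1]
              total += images.sum()
              sq_total += (images ** 2).sum()
              n_pixels += images.numel()
          mean = total / n_pixels
          std = (sq_total / n_pixels - mean ** 2).sqrt()
          return mean.item(), std.item()

      to_tensor = transforms.ToTensor()         # scales pixels to [0, 1]
      mnist = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
      # the EMNIST split ("balanced") is an assumption, not taken from the log
      emnist = datasets.EMNIST("data", split="balanced", train=True,
                               download=True, transform=to_tensor)

      print("MNIST  mean/std:", dataset_mean_std(mnist))    # ~ (0.1307, 0.3081)
      print("EMNIST mean/std:", dataset_mean_std(emnist))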
  4. MNIST-SAE Log 8

    Work Done

    • Loaded in EMNIST dataset and ran an SAE + meta-sae on it.

    Confusions

    • how do I measure the impact of continuous meta-saes?
      • run the MNIST dataset and see if there are more distinct patterns that emerge over time

    Next Steps

    • Graph the activation distribution for the sae vs the meta-sae …
  5. MNIST-SAE Log 7

    Work Done

    • fixed sae shape bug
    • ran visualization code on meta-sae
    • added EMNIST dataset

    Confusions

    • how could I measure what changes as you get more meta?
      • some basic ideas (LOOK AT THE PAPER!!!), with a rough sketch after this list:
        • (1) track the amount/trend of nonzero activations
        • (2) track the distribution of nonzero activations (ie are …
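
    A minimal sketch of ideas (1) and (2) above, assuming each SAE / meta-SAE exposes an encode callable that maps a batch of inputs to latent activations; the encode name, shapes, and the sae_stack / acts_per_level objects in the usage comment are hypothetical:

      import torch

      @torch.no_grad()
      def activation_stats(encode, inputs, eps=1e-8):
          """Summarize SAE latents: (1) how many fire per example, (2) how nonzero values are spread.

          encode: callable mapping inputs to latent activations, shape (N, n_latents)
          inputs: tensor of cached activations fed to the SAE, shape (N, d)
          """
          latents = encode(inputs)                         # (N, n_latents)
          firing = latents.abs() > eps                     # boolean mask of nonzero latents
          l0_per_example = firing.sum(dim=1).float()       # (1) amount of nonzero activations
          nonzero_vals = latents[firing]                   # (2) distribution of nonzero activations
          return {
              "mean_l0": l0_per_example.mean().item(),
              "l0_hist": torch.histc(l0_per_example, bins=20),
              "value_hist": torch.histc(nonzero_vals, bins=50),
          }

      # Usage (sae_stack and acts_per_level are hypothetical): compare stats per meta-level
      # stats = [activation_stats(sae.encode, acts) for sae, acts in zip(sae_stack, acts_per_level)]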
  6. MNIST-SAE Log 6

    Work Done

    • cleaned up training pipeline for mnist saes
    • fixed a dimension error where I was passing in the wrong channel count (3 channels as opposed to 1) and fixed it so the MNIST input was 3-channel
      • standardization of images
    • added caching code to the EnhancedSAE since it seems to perform …
  7. MNIST-SAE Log 4

    Work Done

    • Loaded in CIFAR-10 data
    • ran it through MNIST + the first sae
    • cached all the activations
    • found max activating images

    Confusions

    • struggling to understand: does the dataset have a constant image that it pulls from, or is it randomized every time?
      • need to validate that the images are not …
  8. ViT-SAE Log 4

    Work Done

    Confusions

    • should I train on mlp or logistic regression?
      • does it matter?

    Next Steps

    • have a pipeline up and running to train a probe on the embeddings …
  9. ViT-SAE Log 3

    Work Done

    • loaded in the dsprites dataset + got it preprocessed properly
    • created the probes
    • refactored a bit of the dataset and model structs

    Confusions

    • What is a linear probe? Like actually.
    • what is a good way to manage storage?

    Next Steps

    • Need to make the full data pipeline. dsprites data …
  10. ViT-SAE Log 2

    Background

    Devinterp (developmental interpretability) is the field that seeks to understand what leads to the emergence of model properties during training. I think it would be cool to use linear probes to examine when and where meaningful features emerge in a model.
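
    A minimal sketch of that idea: fit a logistic-regression probe on frozen embeddings saved from several training checkpoints and see when a label becomes linearly decodable. The checkpoints list, get_embeddings helper, and labels in the usage comment are hypothetical:

      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import train_test_split

      def probe_accuracy(embeddings, labels):
          """Fit a linear probe on frozen embeddings and report held-out accuracy."""
          X_tr, X_te, y_tr, y_te = train_test_split(embeddings, labels,
                                                    test_size=0.2, random_state=0)
          probe = LogisticRegression(max_iter=2000)
          probe.fit(X_tr, y_tr)
          return probe.score(X_te, y_te)

      # Hypothetical loop over saved checkpoints: at which training step does the
      # probed label (e.g. a dsprites factor) become linearly decodable?
      # for step, ckpt in checkpoints:
      #     emb = get_embeddings(ckpt, dataset)     # (N, d) frozen ViT embeddings at that step
      #     print(step, probe_accuracy(emb, labels))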

    Work Done

    • created some basic training code for a vit …
  11. A Set of Questions about... Anything & Everything

    Mech Interp

    1. Can you unembed from any point in the model?
      • I think not because the dimensions are not constant
      • but would it work for any point in the resid stream
    2. How do we measure if a feature is monosemantic?
    3. What if I use SAE activations for the steering vectors …
  12. MNIST-SAE Log 3

    Goal: Figure out what features (as in curves, numbers, thoughts, ideas) are inside an MNIST model. Want to break it apart akin to breaking apart a transformer with an SAE (visualizations and stuff)

    Work Done

    • analyzed MNIST by generating maximally activating images for each neuron in layer fc2 (sketched below)
    • looked at SAE reconstructions …
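
    A minimal sketch of the maximally-activating-image step, assuming a small MNIST classifier with an fc2 linear layer whose output we do gradient ascent on; the model object, pixel range, and step count are assumptions, not taken from the actual code:

      import torch

      def max_activating_image(model, layer, neuron_idx, steps=200, lr=0.1):
          """Gradient ascent on a 1x28x28 input to maximize one neuron's activation in `layer`."""
          captured = {}
          handle = layer.register_forward_hook(lambda mod, inp, out: captured.update(out=out))
          for p in model.parameters():
              p.requires_grad_(False)                  # only the input image gets optimized

          img = torch.zeros(1, 1, 28, 28, requires_grad=True)
          opt = torch.optim.Adam([img], lr=lr)
          model.eval()
          for _ in range(steps):
              opt.zero_grad()
              model(img)
              loss = -captured["out"][0, neuron_idx]   # maximize activation = minimize its negative
              loss.backward()
              opt.step()
              img.data.clamp_(0, 1)                    # keep pixels in a valid image range
          handle.remove()
          return img.detach()

      # Usage with a hypothetical classifier: img = max_activating_image(mnist_model, mnist_model.fc2, 7)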
  13. MNIST-SAE Log 2

    Work Done

    • Fixed SAE training + testing code
    • Analyzed max activations of mnist, sae, and meta-sae

    CODE

    Confusions

    • What else can I do besides max activations to track features + latents?
    • What happens when I do the decoder instead of the encoder? Can I take something out of the decoder or is …
  14. MNIST-SAE Log 1

    Background

    Sparse Autoencoders are a way to understand the internal structure of model computations. The principle is that you take the activations of a model, train an SAE on them, and then do max activation tracking to see which neurons in the SAE fire the most on some set of …
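
    A minimal sketch of that principle: an overcomplete SAE with a ReLU latent and an L1 sparsity penalty trained on cached activations, then max activation tracking to find the top firing dataset examples per latent. The latent width, hyperparameters, and the cached acts tensor are assumptions, not from the logs:

      import torch
      import torch.nn as nn

      class SparseAutoencoder(nn.Module):
          """Overcomplete autoencoder with a ReLU latent and an L1 sparsity penalty."""
          def __init__(self, d_in, d_latent):
              super().__init__()
              self.encoder = nn.Linear(d_in, d_latent)
              self.decoder = nn.Linear(d_latent, d_in)

          def forward(self, x):
              z = torch.relu(self.encoder(x))
              return self.decoder(z), z

      def train_sae(acts, d_latent=512, l1_coeff=1e-3, epochs=5, lr=1e-3):
          """acts: (N, d_in) cached model activations."""
          sae = SparseAutoencoder(acts.shape[1], d_latent)
          opt = torch.optim.Adam(sae.parameters(), lr=lr)
          loader = torch.utils.data.DataLoader(acts, batch_size=256, shuffle=True)
          for _ in range(epochs):
              for batch in loader:
                  recon, z = sae(batch)
                  loss = ((recon - batch) ** 2).mean() + l1_coeff * z.abs().mean()
                  opt.zero_grad()
                  loss.backward()
                  opt.step()
          return sae

      @torch.no_grad()
      def top_activating_examples(sae, acts, latent_idx, k=9):
          """Max activation tracking: which inputs make a given SAE latent fire hardest?"""
          _, z = sae(acts)
          return z[:, latent_idx].topk(k).indices      # indices into the dataset

      # Usage sketch: sae = train_sae(acts); top = top_activating_examples(sae, acts, latent_idx=3)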

  15. Who am I

    Email: jain18ayush@gmail.com --> send book recs my way!

    Github: github.com/jain18ayush/

    About Me

    • Computer Scientist trying to be a philosopher
    • Avid reader (far too much): recs linked here
    • Enjoys talking about useless things like what is the nature of a chair
    • Founding Engineer at Onespace
