Other articles


  1. MNIST-SAE Log 13

    Work Done

    Confusions

    • The loss is pretty stagnant; will this improve performance on MNIST?

    Next Steps

    • Train the model on EMNIST rather than MNIST to get more data to actually run the SAEs

    • Message Bart Bussman for the meta-sae repo https://www.alignmentforum.org/inbox/quDmw96SJdzJk8yS3 --> also ask about …

  2. MNIST-SAE Log 11

    Work Done

    • Established rudimentary interp via EMNIST classification labels
    • Fixed up some of the experiment code to save the indices
    • got preliminary findings that there is an optimal depth for meta-saes

    • Found that the average number of activations does increase the deeper you go, which implies some level of fine-grained-ness …

  3. MNIST-SAE Log 9

    Work Done

    • set up initial loop for infinite saes

    Confusions

    • how should I do the analysis?
    • should I pivot to ... EMNIST? or another mnist dataset

    Next Steps

    • DO 1 epoch for testing purposes

    • look into the normalization values for the transforms of EMNIST and MNIST (see the sketch after this list)

    • finish that loop
    • clear out activations
    • analyze the activations …
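
    For the normalization bullet above, one option is to compute the values from the data rather than hard-code them. A minimal sketch, assuming torchvision is available; the EMNIST split choice is an assumption, and MNIST's commonly quoted constants are roughly mean 0.1307 / std 0.3081:

      import torch
      from torch.utils.data import DataLoader
      from torchvision import datasets, transforms

      def dataset_mean_std(dataset, batch_size=1024):
          """Compute mean and std over all pixels of a 1-channel image dataset."""
          loader = DataLoader(dataset, batch_size=batch_size)
          total, sq_total, n_pixels = 0.0, 0.0, 0
          for images, _ in loader:              # images: (B, 1, 28, 28) in [0, 1]
              total += images.sum()
              sq_total += (images ** 2).sum()
              n_pixels += images.numel()
          mean = total / n_pixels
          std = (sq_total / n_pixels - mean ** 2).sqrt()
          return mean.item(), std.item()

      to_tensor = transforms.ToTensor()         # scales pixels to [0, 1]
      mnist = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
      # the EMNIST split ("balanced") is an assumption, not taken from the log
      emnist = datasets.EMNIST("data", split="balanced", train=True,
                               download=True, transform=to_tensor)

      print("MNIST  mean/std:", dataset_mean_std(mnist))    # ~ (0.1307, 0.3081)
      print("EMNIST mean/std:", dataset_mean_std(emnist))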
  4. MNIST-SAE Log 8

    Work Done

    • Loaded in EMNIST dataset and ran an SAE + meta-sae on it.

    Confusions

    • how do I measure the impact of continuous meta-saes?
      • run the MNIST dataset and see if there are more distinct patterns that emerge over time

    Next Steps

    • Graph the activation distribution for the sae vs the meta-sae …
  5. MNIST-SAE Log 7

    Work Done

    • fixed sae shape bug
    • ran visualization code on meta-sae
    • added EMNIST dataset

    Confusions

    • how could I measure what changes as you get more meta?
      • some basic ideas (LOOK AT THE PAPER!!!), with a rough sketch after this list:
        • (1) track the amount/trend of nonzero activations
        • (2) track the distribution of nonzero activations (ie are …
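
    A minimal sketch of ideas (1) and (2) above, assuming each SAE / meta-SAE exposes an encode callable that maps a batch of inputs to latent activations; the encode name, shapes, and the sae_stack / acts_per_level objects in the usage comment are hypothetical:

      import torch

      @torch.no_grad()
      def activation_stats(encode, inputs, eps=1e-8):
          """Summarize SAE latents: (1) how many fire per example, (2) how nonzero values are spread.

          encode: callable mapping inputs to latent activations, shape (N, n_latents)
          inputs: tensor of cached activations fed to the SAE, shape (N, d)
          """
          latents = encode(inputs)                         # (N, n_latents)
          firing = latents.abs() > eps                     # boolean mask of nonzero latents
          l0_per_example = firing.sum(dim=1).float()       # (1) amount of nonzero activations
          nonzero_vals = latents[firing]                   # (2) distribution of nonzero activations
          return {
              "mean_l0": l0_per_example.mean().item(),
              "l0_hist": torch.histc(l0_per_example, bins=20),
              "value_hist": torch.histc(nonzero_vals, bins=50),
          }

      # Usage (sae_stack and acts_per_level are hypothetical): compare stats per meta-level
      # stats = [activation_stats(sae.encode, acts) for sae, acts in zip(sae_stack, acts_per_level)]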
  6. MNIST-SAE Log 6

    Work Done

    • cleaned up training pipeline for mnist saes
    • fixed a dimension error where I was passing in the wrong channel count (3 channels as opposed to 1) and fixed it so the MNIST input was 3-channel
      • standardization of images
    • added caching code to the EnhancedSAE since it seems to perform …
  7. MNIST-SAE Log 4

    Work Done

    • Loaded in CIFAR-10 data
    • ran it through MNIST + the first sae
    • cached all the activations
    • found max activating images

    Confusions

    • struggling to understand: does the dataset have a constant image that it pulls from, or is it randomized every time?
      • need to validate that the images are not …
  8. ViT-SAE Log 4

    Work Done

    Confusions

    • should I train on mlp or logistic regression?
      • does it matter?

    Next Steps

    • have a pipeline up and running to train a probe on the embeddings …
  9. ViT-SAE Log 3

    Work Done

    • loaded in the dsprites dataset + got it preprocessed properly
    • created the probes
    • refactored a bit of the dataset and model structs

    Confusions

    • What is a linear probe? Like actually.
    • what is a good way to manage storage?

    Next Steps

    • Need to make the full data pipeline. dsprites data …
  10. ViT-SAE Log 2

    Background

    Devinterp (developmental interpretability) is the field that seeks to understand what leads to the emergence of model properties during training. I think it would be cool to use linear probes to examine when and where meaningful features emerge in a model.
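
    A minimal sketch of that idea: fit a logistic-regression probe on frozen embeddings saved from several training checkpoints and see when a label becomes linearly decodable. The checkpoints list, get_embeddings helper, and labels in the usage comment are hypothetical:

      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import train_test_split

      def probe_accuracy(embeddings, labels):
          """Fit a linear probe on frozen embeddings and report held-out accuracy."""
          X_tr, X_te, y_tr, y_te = train_test_split(embeddings, labels,
                                                    test_size=0.2, random_state=0)
          probe = LogisticRegression(max_iter=2000)
          probe.fit(X_tr, y_tr)
          return probe.score(X_te, y_te)

      # Hypothetical loop over saved checkpoints: at which training step does the
      # probed label (e.g. a dsprites factor) become linearly decodable?
      # for step, ckpt in checkpoints:
      #     emb = get_embeddings(ckpt, dataset)     # (N, d) frozen ViT embeddings at that step
      #     print(step, probe_accuracy(emb, labels))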

    Work Done

    • created some basic training code for a vit …
  11. A Set of Questions about... Anything & Everything

    Mech Interp

    1. Can you unembed from any point in the model?
      • I think not because the dimensions are not constant
      • but would it work for any point in the resid stream
    2. How do we measure if a feature is monosemantic?
    3. What if I use SAE activations for the steering vectors …
  12. MNIST-SAE Log 3

    Goal: Figure out what features (as in curves, numbers, thoughts, ideas) are inside an MNIST model. Want to break it apart akin to breaking apart a transformer with an SAE (visualizations and stuff)

    Work Done

    • analyzed MNIST by generating maximally activating images for each neuron in layer fc2 (sketched below)
    • looked at SAE reconstructions …
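
    A minimal sketch of the maximally-activating-image step, assuming a small MNIST classifier with an fc2 linear layer whose output we do gradient ascent on; the model object, pixel range, and step count are assumptions, not taken from the actual code:

      import torch

      def max_activating_image(model, layer, neuron_idx, steps=200, lr=0.1):
          """Gradient ascent on a 1x28x28 input to maximize one neuron's activation in `layer`."""
          captured = {}
          handle = layer.register_forward_hook(lambda mod, inp, out: captured.update(out=out))
          for p in model.parameters():
              p.requires_grad_(False)                  # only the input image gets optimized

          img = torch.zeros(1, 1, 28, 28, requires_grad=True)
          opt = torch.optim.Adam([img], lr=lr)
          model.eval()
          for _ in range(steps):
              opt.zero_grad()
              model(img)
              loss = -captured["out"][0, neuron_idx]   # maximize activation = minimize its negative
              loss.backward()
              opt.step()
              img.data.clamp_(0, 1)                    # keep pixels in a valid image range
          handle.remove()
          return img.detach()

      # Usage with a hypothetical classifier: img = max_activating_image(mnist_model, mnist_model.fc2, 7)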
  13. MNIST-SAE Log 2

    Work Done

    • Fixed SAE training + testing code
    • Analyzed max activations of mnist, sae, and meta-sae

    CODE

    Confusions

    • What else can I do besides max activations to track features + latents?
    • What happens when I do the decoder instead of the encoder? Can I take something out of the decoder or is …
  14. MNIST-SAE Log 1

    Background

    Sparse Autoencoders are a way to understand the internal structure of model computations. The principle is that you take the activations of a model, train an SAE on them, and then do max activation tracking to see which neurons in the SAE fire the most on some set of …
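
    A minimal sketch of that principle: an overcomplete SAE with a ReLU latent and an L1 sparsity penalty trained on cached activations, then max activation tracking to find the top firing dataset examples per latent. The latent width, hyperparameters, and the cached acts tensor are assumptions, not from the logs:

      import torch
      import torch.nn as nn

      class SparseAutoencoder(nn.Module):
          """Overcomplete autoencoder with a ReLU latent and an L1 sparsity penalty."""
          def __init__(self, d_in, d_latent):
              super().__init__()
              self.encoder = nn.Linear(d_in, d_latent)
              self.decoder = nn.Linear(d_latent, d_in)

          def forward(self, x):
              z = torch.relu(self.encoder(x))
              return self.decoder(z), z

      def train_sae(acts, d_latent=512, l1_coeff=1e-3, epochs=5, lr=1e-3):
          """acts: (N, d_in) cached model activations."""
          sae = SparseAutoencoder(acts.shape[1], d_latent)
          opt = torch.optim.Adam(sae.parameters(), lr=lr)
          loader = torch.utils.data.DataLoader(acts, batch_size=256, shuffle=True)
          for _ in range(epochs):
              for batch in loader:
                  recon, z = sae(batch)
                  loss = ((recon - batch) ** 2).mean() + l1_coeff * z.abs().mean()
                  opt.zero_grad()
                  loss.backward()
                  opt.step()
          return sae

      @torch.no_grad()
      def top_activating_examples(sae, acts, latent_idx, k=9):
          """Max activation tracking: which inputs make a given SAE latent fire hardest?"""
          _, z = sae(acts)
          return z[:, latent_idx].topk(k).indices      # indices into the dataset

      # Usage sketch: sae = train_sae(acts); top = top_activating_examples(sae, acts, latent_idx=3)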

  15. Who am I

    Email: jain18ayush@gmail.com --> send book recs my way!

    Github: github.com/jain18ayush/

    About Me

    • Computer Scientist trying to be a philosopher
    • Avid reader (far too much): recs linked here
    • Enjoys talking about useless things like what is the nature of a chair
    • Founding Engineer at Onespace
