A Set of Questions about... Anything & Everything

Mech Interp

  1. Can you unembed from any point in the model?
    • I think not, because activation dimensions are not constant across the model.
    • But would it work at any point in the residual stream, where the dimension stays the same?
  2. How do we measure whether a feature is monosemantic?
  3. What if I use sparse autoencoder (SAE) activations as the steering vectors I am trying to transfer from one model to another, as in this tutorial?
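
For question 1, here is a minimal sketch of the shape argument. It assumes the usual setup where the unembedding matrix has shape (d_model, vocab) and is applied after a final layer norm; applying it to intermediate residual stream vectors is the idea usually called the logit lens. The sizes, the random weights, and the names W_U, layer_norm, unembed, resid_mid and mlp_hidden are all made up for illustration and are not taken from any real model or library.

```python
import math
import random

# Toy sizes, assumed for illustration only.
D_MODEL, D_MLP, VOCAB = 4, 16, 10

random.seed(0)
# Hypothetical unembedding matrix W_U with shape (D_MODEL, VOCAB).
W_U = [[random.gauss(0, 1) for _ in range(VOCAB)] for _ in range(D_MODEL)]


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def unembed(activation):
    """Project a D_MODEL-sized activation to VOCAB logits; reject other sizes."""
    if len(activation) != D_MODEL:
        raise ValueError(f"expected {D_MODEL} dims, got {len(activation)}")
    x = layer_norm(activation)
    return [sum(x[i] * W_U[i][t] for i in range(D_MODEL)) for t in range(VOCAB)]


# Stand-ins for activations read from two different points in a model.
resid_mid = [random.gauss(0, 1) for _ in range(D_MODEL)]  # residual stream vector
mlp_hidden = [random.gauss(0, 1) for _ in range(D_MLP)]   # MLP hidden activation

logits = unembed(resid_mid)
print(len(logits))  # one logit per vocabulary entry

try:
    unembed(mlp_hidden)
except ValueError as err:
    print("cannot unembed:", err)
```

The sketch only shows that the shapes line up anywhere in the residual stream and fail elsewhere. Whether the resulting logits mean anything at early layers is a separate, empirical question.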
