steering-vectors-from-finetuning
Exploration of an alternative approach to extracting steering vectors. Instead of using the classical contrastive method, we investigate whether comparing activations between a base model and its deceptive fine-tuned version reveals a more meaningful latent direction.
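
One way to read "comparing activations" is as a difference of means: run the same prompts through both models, average the activations at a chosen layer, and subtract. The sketch below illustrates that reading on toy data only. The function names (`mean_vector`, `steering_vector`), the normalization step, and the hard-coded activations are assumptions for illustration, not the repository's actual code; in practice the activations would be captured from the two models rather than written by hand.

```python
from math import sqrt


def mean_vector(rows):
    """Average a list of equal-length activation vectors, dimension by dimension."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(base_acts, tuned_acts, normalize=True):
    """Candidate steering direction: mean(fine-tuned) minus mean(base).

    Both arguments are lists of activation vectors taken at the same layer
    for the same prompts (an assumption of this sketch).
    """
    base_mean = mean_vector(base_acts)
    tuned_mean = mean_vector(tuned_acts)
    diff = [t - b for t, b in zip(tuned_mean, base_mean)]
    if normalize:
        norm = sqrt(sum(x * x for x in diff))
        if norm > 0:
            diff = [x / norm for x in diff]
    return diff


# Toy activations: two prompts, hidden size 3 (made-up numbers).
base_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]
tuned_acts = [[3.0, 2.0, 6.0], [3.0, 2.0, 6.0]]

raw_direction = steering_vector(base_acts, tuned_acts, normalize=False)
unit_direction = steering_vector(base_acts, tuned_acts)
```

Unlike the contrastive setup, which pairs different prompts on one model, this sketch holds the prompts fixed and varies the model, so the resulting direction reflects whatever the fine-tuning changed at that layer.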