steering-vectors-from-finetuning

Public

An exploration of an alternative approach to extracting steering vectors. Instead of using the classical contrastive method, we investigate whether comparing activations between a base model and its deceptive fine-tuned version reveals a more meaningful latent direction.
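
The comparison described above can be illustrated with a minimal sketch. It assumes the steering vector is taken as the difference between the mean activations of the fine-tuned model and the base model at a single layer, over the same prompts. This is one common way to define such a direction, not necessarily the repository's own method. The function names (`mean_activation`, `steering_vector`, `apply_steering`) and the toy activation values are hypothetical; in practice the activations would be captured from the two models rather than written by hand.

```python
from typing import List, Sequence

Vector = List[float]


def mean_activation(activations: Sequence[Sequence[float]]) -> Vector:
    """Average a batch of activation vectors (one per prompt), dimension by dimension."""
    if not activations:
        raise ValueError("need at least one activation vector")
    dim = len(activations[0])
    count = len(activations)
    return [sum(row[i] for row in activations) / count for i in range(dim)]


def steering_vector(
    base_acts: Sequence[Sequence[float]],
    tuned_acts: Sequence[Sequence[float]],
) -> Vector:
    """Assumed definition: mean fine-tuned activation minus mean base activation."""
    base_mean = mean_activation(base_acts)
    tuned_mean = mean_activation(tuned_acts)
    if len(base_mean) != len(tuned_mean):
        raise ValueError("base and fine-tuned activations must share a dimension")
    return [t - b for t, b in zip(tuned_mean, base_mean)]


def apply_steering(hidden: Sequence[float], direction: Sequence[float], scale: float = 1.0) -> Vector:
    """Shift a hidden state along the steering direction by the given scale."""
    return [h + scale * d for h, d in zip(hidden, direction)]


# Toy activations (hypothetical values): two prompts, three hidden dimensions.
base_acts = [[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]]
tuned_acts = [[1.0, 1.0, 4.0], [3.0, 1.0, 2.0]]

direction = steering_vector(base_acts, tuned_acts)
steered = apply_steering([0.0, 0.0, 0.0], direction, scale=0.5)
print(direction)
print(steered)
```

Unlike the contrastive method, which contrasts activations from paired prompts within one model, this sketch holds the prompts fixed and contrasts the two models.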

Created: 2025-02-02T03:25:21
Updated: 2025-02-18T01:51:07
Stars: 1
Stars increase: 0