Probes and Cons: A Deep Dive into the Latent Geometry of Functional Triggers in Language Models
An investigation into how language models internally represent abstract functions that can be invoked by diverse but semantically equivalent prompts. The work reports surprising findings about the geometry of learned concepts in LLMs and about unexpected effects of supervised fine-tuning.