Sakana AI, an artificial intelligence research lab focused on nature-inspired algorithms, has launched an adaptive language model called Transformer² (Transformer-squared). The model can learn and adapt to new tasks dynamically at inference time without expensive fine-tuning, marking an important step in the development of large language model (LLM) technology.

The core innovation of Transformer² lies in its two-pass dynamic weight-adjustment mechanism. First, it analyzes the incoming user request to understand what the task requires; then it aligns the model's weights with those requirements by adjusting components obtained through Singular Value Decomposition (SVD). By selectively rescaling these key components of the weight matrices, Transformer² can optimize performance in real time without a time-consuming retraining process. This stands in stark contrast to traditional fine-tuning, which leaves parameters static after training, and to methods such as Low-Rank Adaptation (LoRA), which modify only a small fraction of parameters.
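To make the idea concrete, here is a minimal sketch of SVD-based weight rescaling, not Sakana AI's actual code: the weight matrix `W`, the helper `adapt_weight`, and the vector `z_code` are illustrative assumptions used only to show how scaling singular values changes a weight matrix without retraining it.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))          # stand-in for one weight matrix of an LLM

# Decompose the weight matrix once, offline: W = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(W, full_matrices=False)

def adapt_weight(U, S, Vt, z):
    """Rescale the singular values with a task vector z (one scale per singular value)."""
    return U @ np.diag(S * z) @ Vt

# Hypothetical z-vector for, say, a coding task: values near 1 keep a component
# unchanged, values above/below 1 amplify or dampen it.
z_code = 1.0 + 0.1 * rng.standard_normal(S.shape)

W_adapted = adapt_weight(U, S, Vt, z_code)
print(W.shape, W_adapted.shape)          # the shape is unchanged; only singular values are scaled
```

The point of the sketch is that adaptation touches only the vector of singular-value scales, which is far smaller than the weight matrix itself.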


Transformer Squared Training and Inference (Source: arXiv)

To achieve this dynamic adjustment, the researchers employ a method called Singular Value Fine-tuning (SVF). During training, SVF learns a set of skill representations, known as z-vectors, from the SVD components of the model's weights. At inference time, Transformer² determines which skills a prompt requires and then applies the corresponding z-vectors, producing a response tailored to each prompt.
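The two-pass inference step can be sketched as follows. This is an assumed, toy interface rather than the released implementation: the `z_vectors` dictionary, `first_pass_identify`, and `second_pass_compose` are hypothetical names, and a real system would score skills with the model itself rather than with keyword matching.

```python
import numpy as np

# One learned z-vector per skill (toy values; real vectors come from SVF training).
z_vectors = {
    "math":   np.array([1.2, 0.9, 1.0, 1.1]),
    "coding": np.array([0.8, 1.3, 1.0, 0.9]),
}

def first_pass_identify(prompt: str) -> dict:
    """Toy stand-in for the first pass: estimate how relevant each skill is to the prompt."""
    scores = {skill: float(skill in prompt.lower()) for skill in z_vectors}
    total = sum(scores.values()) or 1.0
    return {skill: s / total for skill, s in scores.items()}

def second_pass_compose(skill_weights: dict) -> np.ndarray:
    """Combine per-skill z-vectors into one vector used to rescale singular values."""
    return sum(w * z_vectors[skill] for skill, w in skill_weights.items())

weights = first_pass_identify("Write a coding solution for this math puzzle")
z = second_pass_compose(weights)
print(weights, z)   # the composed z then rescales the SVD components, as in the earlier sketch
```

The design choice illustrated here is that the expensive decomposition happens once, while per-prompt adaptation reduces to picking or blending small z-vectors.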

Test results show that Transformer² outperforms LoRA across a range of tasks, including mathematics, coding, reasoning, and visual question answering, while using fewer parameters. Even more notably, the approach supports knowledge transfer: z-vectors learned on one model can be applied to another, suggesting potential for broad application.


Comparison of Transformer-squared (SVF in the table) with base models and LoRA (Source: arXiv)

Sakana AI has released the training code for Transformer²'s components on its GitHub page, opening the door for other researchers and developers to build on the work.

As businesses continue to explore applications of LLMs, inference-time customization techniques are gradually becoming mainstream. Together with other technologies such as Google's Titans, Transformer² is changing the way LLMs are applied, allowing users to adjust models dynamically to their specific needs without retraining. This advance will make LLMs more useful and practical across a broader range of fields.

Researchers at Sakana AI state that Transformer² represents a bridge between static artificial intelligence and living intelligence, laying the foundation for efficient, personalized, and fully integrated AI tools.