Sakana AI, an artificial intelligence research lab focused on nature-inspired algorithms, has launched an adaptive language model called Transformer² (Transformer-squared). The model can learn and adapt to new tasks dynamically at inference time without expensive fine-tuning, marking an important step in the development of large language model (LLM) technology.

The core innovation of Transformer² lies in its two-pass dynamic weight-adjustment mechanism. First, it analyzes the incoming user request to understand what the task requires; then it aligns the model's weights with those requirements by adjusting components obtained through Singular Value Decomposition (SVD). By selectively rescaling these key components of the weight matrices, Transformer² can optimize performance in real time without a time-consuming retraining process. This stands in stark contrast to traditional fine-tuning, which leaves parameters static after training, and to methods such as Low-Rank Adaptation (LoRA), which modify only a small fraction of parameters.
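To make the idea concrete, here is a minimal sketch of SVD-based weight rescaling, not Sakana AI's actual code: the weight matrix `W`, the helper `adapt_weight`, and the vector `z_code` are illustrative assumptions used only to show how scaling singular values changes a weight matrix without retraining it.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))          # stand-in for one weight matrix of an LLM

# Decompose the weight matrix once, offline: W = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(W, full_matrices=False)

def adapt_weight(U, S, Vt, z):
    """Rescale the singular values with a task vector z (one scale per singular value)."""
    return U @ np.diag(S * z) @ Vt

# Hypothetical z-vector for, say, a coding task: values near 1 keep a component
# unchanged, values above/below 1 amplify or dampen it.
z_code = 1.0 + 0.1 * rng.standard_normal(S.shape)

W_adapted = adapt_weight(U, S, Vt, z_code)
print(W.shape, W_adapted.shape)          # the shape is unchanged; only singular values are scaled
```

The point of the sketch is that adaptation touches only the vector of singular-value scales, which is far smaller than the weight matrix itself.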


Transformer Squared Training and Inference (Source: arXiv)

To achieve this dynamic adjustment, the researchers employ a method called Singular Value Fine-tuning (SVF). During training, SVF learns a set of skill representations, known as z-vectors, from the SVD components of the model's weights. At inference time, Transformer² determines which skills a prompt requires and then applies the corresponding z-vectors, producing a response tailored to each prompt.
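The two-pass inference step can be sketched as follows. This is an assumed, toy interface rather than the released implementation: the `z_vectors` dictionary, `first_pass_identify`, and `second_pass_compose` are hypothetical names, and a real system would score skills with the model itself rather than with keyword matching.

```python
import numpy as np

# One learned z-vector per skill (toy values; real vectors come from SVF training).
z_vectors = {
    "math":   np.array([1.2, 0.9, 1.0, 1.1]),
    "coding": np.array([0.8, 1.3, 1.0, 0.9]),
}

def first_pass_identify(prompt: str) -> dict:
    """Toy stand-in for the first pass: estimate how relevant each skill is to the prompt."""
    scores = {skill: float(skill in prompt.lower()) for skill in z_vectors}
    total = sum(scores.values()) or 1.0
    return {skill: s / total for skill, s in scores.items()}

def second_pass_compose(skill_weights: dict) -> np.ndarray:
    """Combine per-skill z-vectors into one vector used to rescale singular values."""
    return sum(w * z_vectors[skill] for skill, w in skill_weights.items())

weights = first_pass_identify("Write a coding solution for this math puzzle")
z = second_pass_compose(weights)
print(weights, z)   # the composed z then rescales the SVD components, as in the earlier sketch
```

The design choice illustrated here is that the expensive decomposition happens once, while per-prompt adaptation reduces to picking or blending small z-vectors.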

Test results show that Transformer² outperforms LoRA across a range of tasks, including mathematics, coding, reasoning, and visual question answering, while using fewer parameters. Even more notably, the approach supports knowledge transfer: z-vectors learned on one model can be applied to another, suggesting potential for broad application.


Comparison of Transformer-squared (SVF in the table) with base models and LoRA (Source: arXiv)

Sakana AI has released the training code for Transformer²'s components on its GitHub page, opening the door for other researchers and developers to build on the work.

As businesses continue to explore applications of LLMs, inference-time customization techniques are gradually becoming mainstream. Together with other technologies such as Google's Titans, Transformer² is changing the way LLMs are applied, allowing users to adjust models dynamically to their specific needs without retraining. This advance will make LLMs more useful and practical across a broader range of fields.

Researchers at Sakana AI state that Transformer² represents a bridge between static artificial intelligence and living intelligence, laying the foundation for efficient, personalized, and fully integrated AI tools.