Transformer-based large language models can learn new tasks from just a few examples supplied as context (in-context learning). However, researchers at DeepMind have found that Transformers fail to generalize beyond the scope of their pre-training data. Through empirical studies, the researchers examined the generalization behavior of Transformer models and found that in-context learning amounts to a model selection capability, which imposes limits on how well the model generalizes.
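To make the notion of "learning from a few examples by providing contextual samples" concrete, here is a minimal sketch of how a few-shot prompt is typically assembled. This is an illustrative assumption, not code from the DeepMind study: the `build_few_shot_prompt` helper and the word-length toy task are hypothetical.

```python
def build_few_shot_prompt(examples, query):
    """Concatenate (input, output) demonstrations followed by the final query.

    The model sees these demonstrations in its context window and must infer
    the task from them alone -- no weight updates occur (in-context learning).
    """
    lines = [f"Input: {x} -> Output: {y}" for x, y in examples]
    lines.append(f"Input: {query} -> Output:")
    return "\n".join(lines)

# Toy task (hypothetical): map a word to its length.
demos = [("cat", 3), ("house", 5), ("sky", 3)]
prompt = build_few_shot_prompt(demos, "train")
print(prompt)
```

The point of the DeepMind finding is that a Transformer completing such a prompt succeeds mainly when the demonstrated task resembles function families present in its pre-training data; on tasks outside that distribution, performance degrades.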