In the world of artificial intelligence there is a distinctive group of "painters": the stacked layers of Transformer models. Like magical brushes, they paint a rich and diverse world on the canvas of language. A recent paper titled "Transformer Layers as Painters" offers a new perspective on how the intermediate layers of a Transformer actually work.


As the architecture behind today's most popular large language models, the Transformer packs billions of parameters into a deep stack of layers. Each layer, like a painter, works with the others to complete a grand linguistic painting. But how do these "painters" divide the work? What distinguishes their "brushes" and "paints"? The paper sets out to answer these questions.

To explore the working principles of Transformer layers, the authors designed a series of experiments, including skipping certain layers, changing the order of layers, or running layers in parallel. These experiments are akin to setting different painting rules for the "painters" to see if they can adapt.
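As a concrete illustration of these interventions, here is a minimal, self-contained PyTorch sketch. It is not the paper's code: ToyBlock and run_layers are hypothetical stand-ins for real Transformer layers, and the "parallel" variant simply averages layer outputs as one plausible way to combine them.

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """A stand-in for one Transformer layer, i.e. one "painter"."""
    def __init__(self, d_model=64):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                                nn.Linear(d_model, d_model))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        # A residual "brush stroke" applied to the canvas x.
        return self.norm(x + self.ff(x))

def run_layers(x, layers, mode="normal", skip=(), order=None):
    """Pass the canvas x through the layer stack under one intervention."""
    layers = list(layers)
    if mode == "skip":                    # leave some painters out entirely
        layers = [l for i, l in enumerate(layers) if i not in skip]
    elif mode == "reorder" and order:     # change who paints when
        layers = [layers[i] for i in order]
    elif mode == "parallel":              # all painters see the same canvas;
        # one simple way to combine their work is to average the outputs
        return torch.stack([l(x) for l in layers]).mean(dim=0)
    for layer in layers:                  # default: the sequential assembly line
        x = layer(x)
    return x

layers = nn.ModuleList(ToyBlock() for _ in range(8))
x = torch.randn(2, 16, 64)                # (batch, tokens, hidden size)
y_skip = run_layers(x, layers, mode="skip", skip={3, 4})
y_perm = run_layers(x, layers, mode="reorder", order=[0, 2, 1, 4, 3, 6, 5, 7])
y_par  = run_layers(x, layers, mode="parallel")
```

Each variant keeps the overall pipeline intact while changing how the middle "painters" are scheduled, which is exactly what the paper's ablations measure.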

In the metaphor of a "painter's assembly line," the input is viewed as a canvas, and the process through the intermediate layers is like the canvas moving along the assembly line. Each "painter," or each layer of the Transformer, modifies the painting according to its expertise. This analogy helps us understand the parallelism and adjustability of Transformer layers.

The experiments used two pre-trained Transformer models: the decoder-only Llama2-7B and the encoder-only BERT. The study found that the "painters" in the intermediate layers appear to share a common "paintbox", a single representation space, which differs from that of the first and last layers. Skipping some of the intermediate "painters" had little impact on the finished painting, indicating that not every "painter" is essential.
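One rough way to see whether layers share a "paintbox" is to compare the hidden states they produce. The sketch below is an illustration under stated assumptions, not the paper's measurement protocol: it computes the average token-wise cosine similarity between every pair of layers, and with a HuggingFace-style model it could be fed from the hidden states returned when output_hidden_states=True is set.

```python
import torch
import torch.nn.functional as F

def layerwise_similarity(hidden_states):
    """hidden_states: a sequence of (batch, seq, dim) tensors, one per layer."""
    flat = [h.reshape(-1, h.size(-1)).float() for h in hidden_states]
    n = len(flat)
    sims = torch.zeros(n, n)
    for i in range(n):
        for j in range(n):
            a = F.normalize(flat[i], dim=-1)
            b = F.normalize(flat[j], dim=-1)
            sims[i, j] = (a * b).sum(-1).mean()  # mean cosine similarity per token
    return sims

# Hypothetical usage with a HuggingFace-style model:
#   out = model(**inputs, output_hidden_states=True)
#   sims = layerwise_similarity(out.hidden_states)
# A block of high values among the middle rows/columns suggests a shared space.
```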

Although the intermediate "painters" draw from the same "paintbox," each applies its own technique to the canvas. Simply repeating one "painter's" strokes in place of the others noticeably diminishes the final painting, so the middle layers are not interchangeable copies of one another.
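Continuing the toy setup from the earlier sketch, this "one painter repeats" intervention can be written as follows; again a hypothetical illustration, in which a stretch of middle layers is replaced by repeated applications of a single layer.

```python
def run_repeat_middle(x, layers, start, end):
    """Replace layers[start:end] with repeated applications of layers[start]."""
    for layer in layers[:start]:          # early painters work as usual
        x = layer(x)
    for _ in range(end - start):          # one painter repaints the whole middle stretch
        x = layers[start](x)
    for layer in layers[end:]:            # late painters finish the canvas
        x = layer(x)
    return x

y_repeat = run_repeat_middle(x, layers, start=2, end=6)
```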


For tasks requiring rigorous logic, such as mathematics and reasoning, the order of "painting" is particularly important. For tasks relying on semantic understanding, the impact of order is relatively minor.

Taken together, the results indicate that the intermediate layers of a Transformer exhibit a degree of uniformity, yet they are not redundant: each still contributes something distinct to the final output.


The study also found that not all layers are necessary: intermediate layers can be skipped without catastrophically affecting model performance. And although the intermediate layers share a representation space, they perform different functions, so changing their execution order leads to a drop in quality, confirming that order matters.

In the broader effort to understand and optimize Transformer models, many researchers are exploring approaches such as pruning and parameter reduction. Together with analyses like this one, these efforts offer valuable insight into how Transformers work.

Paper link: https://arxiv.org/pdf/2407.09298v1