Meta recently published a noteworthy piece of research: a new memory layer technique that significantly improves the factual accuracy of large language models (LLMs) while scaling to an unprecedented number of parameters. The work not only challenges conventional approaches to scaling neural networks but also points to new directions for future AI architecture design.

The core of the research is a trainable key-value lookup mechanism that adds extra parameters to the model without increasing its computational cost (FLOPs). The idea is to supplement the compute-heavy feedforward layers with sparsely activated memory layers that provide dedicated capacity for storing and retrieving information.
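To make the mechanism concrete, here is a minimal PyTorch sketch of a sparsely activated key-value memory layer. The class name, sizes, and scoring details are illustrative assumptions for exposition, not Meta's implementation.

```python
# Minimal sketch of a sparsely activated key-value memory layer
# (illustrative assumptions only, not Meta's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMemoryLayer(nn.Module):
    def __init__(self, dim, num_keys=1024, topk=4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_keys, dim) * 0.02)  # trainable keys
        self.values = nn.Embedding(num_keys, dim)                    # trainable values
        self.topk = topk

    def forward(self, x):                        # x: (batch, seq, dim)
        scores = x @ self.keys.t()               # similarity to every key
        w, idx = scores.topk(self.topk, dim=-1)  # keep only the top-k keys per token
        w = F.softmax(w, dim=-1)                 # normalize the selected scores
        v = self.values(idx)                     # (batch, seq, topk, dim)
        return (w.unsqueeze(-1) * v).sum(dim=-2) # weighted sum of selected values
```

Because only the top-k rows of the value table are read per token, the parameter count grows with num_keys while per-token FLOPs stay roughly flat, which is exactly the trade-off the memory layer exploits.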


Compared to traditional dense networks, memory layers are more efficient in handling information storage. For example, language models need to learn simple associative information such as birthdays of people and capitals of countries; memory layers can achieve this through a simple key-value lookup mechanism, which is more efficient than using feedforward networks.

The main contribution of this work is scaling the memory layer to an unprecedented size of 128 billion parameters. Experimental results show that language models equipped with the improved memory layer outperform dense models given twice the compute budget, and also surpass mixture-of-experts models when matched for both compute and parameter count. The gains are especially pronounced on factual tasks.


Meta's researchers achieved this by replacing one or more feedforward networks (FFNs) in the Transformer with memory layers. The replacement shows consistent advantages across base model sizes (from 134 million to 8 billion parameters) and memory capacities (up to 128 billion parameters). In the experiments, memory layers improve the factual accuracy of language models by more than 100%, with significant gains in coding and general knowledge as well. In many cases, models equipped with memory layers even match the performance of dense models that require four times the compute.
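As a rough illustration of that replacement, the block below drops the SimpleMemoryLayer sketch from above into the slot normally occupied by a Transformer block's feedforward sublayer; the normalization and placement choices here are assumptions, not the paper's exact configuration.

```python
# Sketch: a Transformer block whose feedforward sublayer is replaced by a
# memory layer (reuses SimpleMemoryLayer from the earlier sketch).
import torch.nn as nn

class MemoryTransformerBlock(nn.Module):
    def __init__(self, dim, num_heads=8, num_keys=1024, topk=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.memory = SimpleMemoryLayer(dim, num_keys, topk)  # replaces the FFN

    def forward(self, x):                      # x: (batch, seq, dim)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                       # residual around attention
        x = x + self.memory(self.norm2(x))     # residual around the memory layer
        return x
```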

Researchers also made several enhancements to the memory layer to overcome challenges in scaling:

Product-key lookup: to remove the bottleneck of scoring every key in a very large memory, the study adopts trainable product-quantized keys, so a query never needs to be compared against every individual key (see the sketch after this list).

Parallelized memory layers: to run memory layers in a multi-GPU environment, the researchers distribute the embedding lookup and aggregation operations across multiple GPUs (a rough sharding sketch also follows the list).

Shared memory mechanism: to make the most of the added parameters, all memory layers draw from a single shared pool of memory parameters.

Optimized performance and stability: the EmbeddingBag operation is optimized with custom CUDA kernels, significantly improving memory bandwidth utilization, and an input-dependent gating mechanism with a SiLU non-linearity is introduced to improve training performance and stability (illustrated in the sketches below).
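The first and last enhancements can be sketched together. The fragment below shows a product-key lookup, where two small sub-key tables form the full key space as their Cartesian product, combined with an input-dependent SiLU gate; the exact scoring, gate placement, and value sharing are assumptions based on the paper's description rather than its released code.

```python
# Sketch of a product-key lookup with input-dependent SiLU gating
# (details are assumptions, not Meta's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProductKeyMemory(nn.Module):
    def __init__(self, dim, n_sub_keys=512, topk=8, shared_values=None):
        super().__init__()
        half = dim // 2
        # Two small sub-key tables; the full key space is their Cartesian
        # product (n_sub_keys ** 2 virtual keys), so a query is never scored
        # against every individual key.
        self.sub_keys1 = nn.Parameter(torch.randn(n_sub_keys, half) * 0.02)
        self.sub_keys2 = nn.Parameter(torch.randn(n_sub_keys, half) * 0.02)
        # Values can come from a pool shared across all memory layers.
        self.values = (shared_values if shared_values is not None
                       else nn.Embedding(n_sub_keys ** 2, dim))
        self.gate_proj = nn.Linear(dim, dim)  # input-dependent gate
        self.n_sub_keys = n_sub_keys
        self.topk = topk

    def forward(self, x):                                  # x: (tokens, dim)
        q1, q2 = x.chunk(2, dim=-1)
        s1, i1 = (q1 @ self.sub_keys1.t()).topk(self.topk, dim=-1)
        s2, i2 = (q2 @ self.sub_keys2.t()).topk(self.topk, dim=-1)
        # Combine the two candidate sets: only k*k scores, not n_sub_keys**2.
        scores = (s1.unsqueeze(-1) + s2.unsqueeze(-2)).flatten(-2)
        idx = (i1.unsqueeze(-1) * self.n_sub_keys + i2.unsqueeze(-2)).flatten(-2)
        w, best = scores.topk(self.topk, dim=-1)
        w = F.softmax(w, dim=-1)
        v = self.values(idx.gather(-1, best))              # (tokens, k, dim)
        y = (w.unsqueeze(-1) * v).sum(dim=-2)              # aggregated memory output
        return y * F.silu(self.gate_proj(x))               # SiLU gating on the input
```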
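For the parallelization point, one simple way to shard a memory layer's value table across GPUs is shown below: each rank stores a slice of the rows, serves only the indices it owns, and an all-reduce sums the partial results. This only illustrates the general idea and is not Meta's actual scheme; it assumes torch.distributed has already been initialized (for example via torchrun).

```python
# Illustrative row-wise sharding of the memory value table across GPUs
# (not Meta's parallelization scheme).
import torch
import torch.distributed as dist

def sharded_memory_lookup(local_values, indices, weights):
    """local_values: this rank's slice of the value table, holding rows
    [rank * shard, (rank + 1) * shard). indices, weights: top-k ids and
    softmax scores per token, both shaped (tokens, k)."""
    rank, shard = dist.get_rank(), local_values.shape[0]
    lo = rank * shard
    mask = (indices >= lo) & (indices < lo + shard)  # rows owned by this rank
    local_idx = (indices - lo).clamp(0, shard - 1)   # safe index into the slice
    picked = local_values[local_idx]                 # (tokens, k, dim)
    partial = (picked * (weights * mask).unsqueeze(-1)).sum(dim=1)
    dist.all_reduce(partial)                         # sum partial results from all ranks
    return partial                                   # (tokens, dim)
```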


Experimental results also revealed the following key findings:

The size of the memory layer significantly affects performance: As the size of the memory layer increases, the performance of factual question answering continuously improves.

Multiple memory layers outperform a single memory layer: Using multiple memory layers with shared parameters can enhance performance, but too many memory layers can degrade it. The optimal number of memory layers is three.

Memory layers learn facts more quickly: In the early stages of training, models equipped with memory layers show faster performance improvements, indicating that memory layers help the model learn facts more quickly.

Memory layers complement dense layers: Experiments show that both sparse memory layers and dense feedforward layers are essential.

To validate the effectiveness of the memory layer technology, researchers evaluated it on multiple benchmarks, including:

Fact-based question answering (NaturalQuestions, TriviaQA)

Multi-hop question answering (HotpotQA)

Scientific and commonsense knowledge (MMLU, HellaSwag, OBQA, PIQA)

Code writing (HumanEval, MBPP)

Results indicate that models equipped with memory layers outperform baseline models in all these tests, with the most notable performance improvements in factual question answering.

Meta's research not only offers new insights into scaling AI models but also opens new avenues for improving factuality and overall model performance. The researchers believe memory layer technology is highly scalable and expect it to be widely applied across AI systems. They also note that memory layers still pose challenges for hardware acceleration, but they are confident that continued research and optimization can bring their speed to parity with, or beyond, traditional feedforward networks.

Furthermore, Meta's research team hopes to push memory layer performance further with new learning methods, reducing forgetting and hallucination and enabling continual learning.

This research release undoubtedly injects new vitality into the AI field and fills us with anticipation for the future development of AI.

Paper: https://arxiv.org/pdf/2412.09764