As businesses increasingly adopt large language models (LLMs), improving the accuracy of the models' knowledge and reducing hallucinations have become significant challenges. Researchers at Meta AI have proposed scalable "memory layers" in a new paper that may offer a solution to this problem.


The core idea of the scalable memory layer is to add more parameters to an LLM without increasing the compute required at inference time, thereby increasing its capacity to learn. The architecture suits applications that need to store large amounts of factual knowledge while keeping inference fast.

Traditional language models rely on "dense layers" to encode large amounts of information. In a dense layer, nearly all parameters are activated at once during inference, which lets the model learn complex functions but costs extra compute and energy. For simple factual knowledge, simpler layers built as associative memories are more efficient and easier to interpret, and that is the role of the memory layer: it encodes and retrieves knowledge through sparse activations and a key-value lookup mechanism. Although memory layers occupy more memory than dense layers, they use only a small fraction of their parameters at any one time, which improves computational efficiency.
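To make the mechanism concrete, here is a minimal PyTorch sketch of a key-value memory layer with sparse top-k lookup. The class and parameter names (SimpleMemoryLayer, num_keys, top_k) are illustrative assumptions, not taken from the paper; the actual implementation stores far more key-value pairs and uses optimized lookup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMemoryLayer(nn.Module):
    """Minimal key-value memory layer: each token's query attends to only the
    top-k keys, so just a small fraction of the stored values is touched."""

    def __init__(self, dim: int, num_keys: int, top_k: int = 4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_keys, dim) * 0.02)  # trainable keys
        self.values = nn.Embedding(num_keys, dim)                    # trainable values
        self.query_proj = nn.Linear(dim, dim)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        q = self.query_proj(x)                                 # project hidden state to a query
        scores = q @ self.keys.t()                             # similarity to every key
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)  # sparse selection: k keys per token
        weights = F.softmax(top_scores, dim=-1)                # normalize over selected keys only
        selected = self.values(top_idx)                        # gather the k value vectors
        return (weights.unsqueeze(-1) * selected).sum(dim=-2)  # weighted sum -> (batch, seq, dim)
```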

While memory layers have existed for many years, they are rarely used in modern deep learning architectures, mainly because they have not been optimized for current hardware accelerators. Cutting-edge LLMs instead typically employ some form of "mixture-of-experts" architecture, which is similar in spirit to memory layers: the model consists of many small, specialized expert components, and a routing mechanism activates only specific experts during inference.
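For comparison, a toy mixture-of-experts layer might look like the sketch below: a router scores the experts and only the top-k run for each token, so most expert parameters stay idle on any given forward pass. The name TinyMoE, the expert shapes, and the routing details are assumptions for illustration, not the design of any particular model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Illustrative mixture-of-experts layer with token-level top-k routing."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim); flatten batch and sequence dimensions before calling
        gate_scores, expert_idx = self.router(x).topk(self.top_k, dim=-1)
        gate = F.softmax(gate_scores, dim=-1)      # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e    # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += gate[mask, slot, None] * expert(x[mask])
        return out
```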

To overcome the challenge that memory layers are computationally lightweight but memory-intensive, Meta's researchers proposed several improvements to make them feasible at scale. They parallelized the memory layers so that millions of key-value pairs can be stored across multiple GPUs without slowing the model down. They also developed dedicated CUDA kernels for the high-memory-bandwidth operations and introduced a parameter-sharing mechanism that lets multiple memory layers share a single set of memory parameters.
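The parameter-sharing idea can be sketched as follows: one pool of keys and values is reused by lookups at several layer depths, so adding more memory layers does not multiply the memory parameters. The class names (SharedMemoryValues, MemoryLookup) and the sizes are hypothetical; this only illustrates the sharing concept, not Meta's kernels or GPU sharding.

```python
import torch
import torch.nn as nn

class SharedMemoryValues(nn.Module):
    """One pool of key/value memory parameters, referenced by several memory layers."""

    def __init__(self, dim: int, num_keys: int):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_keys, dim) * 0.02)
        self.values = nn.Embedding(num_keys, dim)

class MemoryLookup(nn.Module):
    """Per-layer lookup: each layer keeps only its own small query projection
    and reads from the shared pool of memory parameters."""

    def __init__(self, dim: int, shared: SharedMemoryValues, top_k: int = 4):
        super().__init__()
        self.shared = shared
        self.query_proj = nn.Linear(dim, dim)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.query_proj(x)
        scores = q @ self.shared.keys.t()
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = torch.softmax(top_scores, dim=-1)
        return (weights.unsqueeze(-1) * self.shared.values(top_idx)).sum(dim=-2)

# Example: memory layers at three different depths read from one shared pool.
# num_keys is kept small here; the paper scales this into the millions across GPUs.
shared_pool = SharedMemoryValues(dim=512, num_keys=65_536)
memory_layers = nn.ModuleList(MemoryLookup(512, shared_pool) for _ in range(3))
```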

The researchers tested the memory-augmented models by modifying Llama models, replacing one or more dense layers with a shared memory layer. They found that the memory models performed notably well across multiple tasks, especially those requiring factual knowledge, substantially outperforming dense baselines and even matching models trained with 2 to 4 times the compute.

Paper link: https://arxiv.org/abs/2412.09764

Key Points:

🧠 The Scalable Memory Layer can enhance the learning capabilities of language models without increasing computational resources.

💡 The research found that memory layers excelled across multiple tasks, especially in scenarios requiring factual knowledge.

🚀 Meta's researchers urge the integration of memory layers into next-generation AI architectures to reduce forgetting and hallucination.