Training large AI models (such as Transformers and language models) has become an indispensable part of the AI field, but it comes with high computational costs, heavy memory consumption, and steep energy demands. For example, OpenAI's GPT-3, with 175 billion parameters, requires weeks of GPU training. These enormous resource requirements limit the technology to organizations with ample computational resources, while also intensifying concerns about energy efficiency and environmental impact. Addressing these challenges is crucial for ensuring broader accessibility and sustainability in AI development.
Traditional training methods are inefficient and require innovative solutions.
The primary cause of inefficiency in training large models is their reliance on dense matrices, which demand substantial memory and compute. Moreover, modern GPUs offer limited support for optimized low-precision or low-rank operations, which exacerbates these demands. Methods such as matrix decomposition and heuristic rank reduction have been proposed, but they remain limited in practice. For instance, GaLore supports training in a single-batch setup but incurs impractical runtime overhead, and the low-rank adapters used in LTE face convergence issues on large tasks. No existing method simultaneously reduces memory usage, computational cost, and training time without compromising performance, which makes the need for innovative solutions urgent.
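To make the dense-versus-low-rank trade-off concrete, here is a minimal sketch, not taken from any of the papers discussed, of how factorizing a dense weight matrix into two thin factors cuts both parameter count and per-sample compute. The layer sizes and rank are illustrative assumptions.

```python
import torch

d_in, d_out, rank = 4096, 4096, 64  # illustrative sizes, not from the paper

# Dense layer: d_out * d_in parameters, O(d_in * d_out) FLOPs per sample.
W = torch.randn(d_out, d_in)

# Low-rank factorization W ~= A @ B: only rank * (d_out + d_in) parameters.
A = torch.randn(d_out, rank)
B = torch.randn(rank, d_in)

dense_params = W.numel()                # 16,777,216
lowrank_params = A.numel() + B.numel()  # 524,288
print(f"compression: {dense_params / lowrank_params:.1f}x")  # ~32x

x = torch.randn(32, d_in)
y = (x @ B.t()) @ A.t()  # two skinny matmuls instead of one dense one
```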
CoMERA Framework: Achieving Efficient Training Through Adaptive Tensor Optimization
Researchers from the University at Albany (State University of New York), the University of California, Santa Barbara, Amazon Alexa AI, and Meta have jointly introduced a new framework called CoMERA (Computing- and Memory-Efficient training via Rank-Adaptive tensor optimization). The framework combines memory efficiency with computational speed through rank-adaptive tensor compression. Unlike traditional methods that focus solely on compression, CoMERA uses a multi-objective optimization formulation to balance compression ratio against model accuracy. It improves GPU utilization through tensorized embeddings and advanced tensor-network contractions, reducing runtime overhead while maintaining robust performance. The framework also adopts CUDA Graphs to minimize kernel-launch latency during GPU operations, a significant bottleneck in traditional tensor-compression approaches.
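As a rough illustration of the kernel-launch problem CUDA Graphs address, the sketch below uses PyTorch's public torch.cuda.CUDAGraph API to capture one forward pass and replay it without per-kernel launch overhead. The Linear model and shapes are placeholder assumptions, not CoMERA's actual code.

```python
import torch

# Placeholder model standing in for a tensorized layer.
model = torch.nn.Linear(1024, 1024).cuda()
static_input = torch.randn(32, 1024, device="cuda")

# Warm-up on a side stream (PyTorch requires this before graph capture).
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture one forward pass. Replays skip per-kernel launch overhead, which
# matters when tensor-network contractions emit many small kernels.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_output = model(static_input)

# Run on new data: copy into the captured input buffer, then replay.
static_input.copy_(torch.randn(32, 1024, device="cuda"))
g.replay()  # static_output now holds the result for the new input
```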
The foundation of CoMERA is an adaptive tensor representation that lets model layers adjust their ranks dynamically under resource constraints. By modifying tensor ranks, the framework achieves compression without compromising the integrity of neural network operations. This dynamic optimization is realized through a two-phase training process (a code sketch follows the list below):
Early Phase: Focused on stable convergence.
Later Phase: Fine-tuning ranks to meet specific compression targets.
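Here is a minimal sketch of this two-phase idea under simplifying assumptions. CoMERA itself optimizes tensor-train ranks with a multi-objective formulation; the toy LowRankLinear layer and truncate_rank method below are hypothetical names, and a plain two-factor matrix decomposition with truncated SVD stands in for the actual tensor-network machinery.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Toy low-rank layer whose rank can be tightened between phases."""

    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.U = nn.Parameter(torch.randn(d_out, rank) * 0.02)
        self.V = nn.Parameter(torch.randn(rank, d_in) * 0.02)

    def forward(self, x):
        # Costs O(rank * (d_in + d_out)) per sample instead of O(d_in * d_out).
        return x @ self.V.t() @ self.U.t()

    @torch.no_grad()
    def truncate_rank(self, new_rank: int):
        # Later phase: compress toward a target rank via truncated SVD of U @ V.
        W = self.U @ self.V
        P, S, Qt = torch.linalg.svd(W, full_matrices=False)
        r = min(new_rank, S.numel())
        self.U = nn.Parameter(P[:, :r] * S[:r])
        self.V = nn.Parameter(Qt[:r, :])

layer = LowRankLinear(1024, 1024, rank=256)
# ... early phase: train at a generous rank until convergence stabilizes ...
layer.truncate_rank(32)  # later phase: shrink rank to hit a compression target
# ... then fine-tune at the reduced rank ...
```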
In a six-encoder Transformer model, CoMERA achieved compression ratios of up to 43 times in its early phase and as high as 361 times in its later optimization phase. Compared with GaLore, it also reduced memory consumption by 9 times and trained 2-3 times faster per epoch.
Multiple test results demonstrate CoMERA's outstanding performance.
When training Transformer models on the MNLI dataset, CoMERA reduced the model size from 256 MB to as little as 3.2 MB (an 80-fold reduction) while maintaining accuracy. In large-scale recommendation systems such as DLRM, it compressed the model by 99 times and cut peak memory usage by 7 times. The framework also excelled at pre-training CodeBERT, a domain-specific large language model, achieving an overall compression ratio of 4.23 times and doubling the speed in certain training phases. These results highlight its ability to handle diverse tasks and architectures, extending its applicability across fields.
Key Advantages of the CoMERA Framework Summarized
The main conclusions of this research are as follows:
CoMERA achieved a compression ratio of up to 361 times for specific layers and 99 times for the entire model, significantly reducing storage and memory requirements.
The framework reduced the training time per epoch for Transformers and recommendation systems by 2-3 times, saving computational resources and time.
By using tensorized representations and CUDA graphs, CoMERA reduced peak memory consumption by 7 times, making it feasible to train on smaller GPUs.
CoMERA's approach supports various architectures, including Transformers and large language models, while maintaining or improving accuracy.
By lowering the energy and resources required for training, CoMERA contributes to more sustainable AI practices and enables a broader audience to access cutting-edge models.