A groundbreaking technology is quietly emerging in the field of artificial intelligence. Recently, Inception Labs announced the launch of the Mercury series of diffusion large language models (dLLMs), a new generation of language models designed for fast, efficient, and high-quality text generation. Compared to traditional autoregressive large language models, Mercury boasts up to a 10x speed improvement, achieving over 1000 tokens per second on an NVIDIA H100 GPU – a speed previously only achievable with custom chips.
The first product in the Mercury series, Mercury Coder, has already debuted in public testing. Focused on code generation, the model delivers exceptional performance, surpassing speed-optimized models such as GPT-4o Mini and Claude 3.5 Haiku on multiple programming benchmarks while running nearly 10 times faster. Developer feedback has been strongly positive: in Copilot Arena testing, Mercury Coder Mini ranked among the top performers and was one of the fastest models.
Most current language models are autoregressive, generating tokens one at a time from left to right. This inherently sequential process drives up both latency and computational cost. Mercury instead uses a "coarse-to-fine" generation method: it starts from pure noise and iteratively refines the output over several "denoising" steps. Because each denoising step can update many token positions at once, Mercury processes multiple tokens in parallel during generation, which underpins its speed advantage and, according to Inception Labs, also improves reasoning and structured responses.
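The coarse-to-fine idea can be illustrated with a toy sketch. This is not Mercury's actual algorithm (which is proprietary); it only shows the structural difference from autoregressive decoding: all positions start masked, and each denoising step fills in a batch of positions in parallel, so the number of steps can be far smaller than the sequence length. The `predict` callable stands in for the learned denoising model and is a placeholder assumption.

```python
import random

MASK = "_"  # placeholder for a masked (noisy) position

def toy_denoise(length, steps, predict, seed=0):
    """Toy coarse-to-fine generation: start fully masked, then
    unmask several positions per step in parallel, unlike an
    autoregressive model's one-token-at-a-time loop."""
    rng = random.Random(seed)
    seq = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        # Fill a batch of positions this step; batch size shrinks the
        # remaining work so everything is resolved within `steps` passes.
        k = max(1, len(masked) // (steps - step))
        for i in rng.sample(masked, k):
            seq[i] = predict(seq, i)
    return seq

# Dummy "denoiser" that just looks up the target token per position.
tokens = "the quick brown fox jumps".split()
result = toy_denoise(len(tokens), steps=3, predict=lambda seq, i: tokens[i])
```

Here a 5-token sequence is produced in 3 denoising passes rather than 5 sequential steps; a real dLLM amortizes this parallelism across thousands of tokens, which is where the throughput gains come from.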
With the launch of the Mercury series, Inception Labs is showcasing the potential of diffusion models for text and code generation. The company plans to release chat-oriented language models next, broadening the range of applications for diffusion language models. These upcoming models are slated to offer stronger agent capabilities, supporting complex planning and long-form generation, and their efficiency should let them run smoothly on resource-constrained devices such as smartphones and laptops.
Overall, the introduction of Mercury marks a significant advancement in AI technology, offering substantial improvements in speed and efficiency, while also providing higher-quality solutions for the industry.
Official introduction: https://www.inceptionlabs.ai/news
Online experience: https://chat.inceptionlabs.ai/
Key Highlights:
🌟 Launch of the Mercury series of diffusion large language models (dLLMs), achieving generation speeds of over 1000 tokens per second.
🚀 Mercury Coder excels in code generation, outperforming numerous existing models in benchmark tests.
💡 The innovative approach of diffusion models makes text generation more efficient and accurate, opening new possibilities for intelligent agent applications.