A groundbreaking technology is quietly emerging in the field of artificial intelligence. Recently, Inception Labs announced the launch of the Mercury series of diffusion large language models (dLLMs), a new generation of language models designed for fast, efficient, and high-quality text generation. Compared to traditional autoregressive large language models, Mercury boasts up to a 10x speed improvement, achieving over 1000 tokens per second on an NVIDIA H100 GPU – a speed previously only achievable with custom chips.
The first product in the Mercury series, Mercury Coder, has already debuted in public testing. Focused on code generation, the model delivers exceptional performance, surpassing speed-optimized models such as GPT-4o Mini and Claude 3.5 Haiku on multiple programming benchmarks while running nearly 10 times faster. Developer feedback has been strongly positive: in Copilot Arena testing, Mercury Coder Mini ranked among the top performers and was one of the fastest models.
Most current language models are autoregressive, generating tokens one at a time from left to right. This inherently sequential process drives up both latency and computational cost. Mercury instead uses a "coarse-to-fine" generation method: it starts from pure noise and iteratively refines the output over several "denoising" steps. Because each denoising step can update many token positions at once, Mercury processes multiple tokens in parallel during generation, which underpins its speed advantage and, according to Inception Labs, also improves reasoning and structured responses.
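The coarse-to-fine idea can be illustrated with a toy sketch. This is not Mercury's actual algorithm (which is proprietary); it only shows the structural difference from autoregressive decoding: all positions start masked, and each denoising step fills in a batch of positions in parallel, so the number of steps can be far smaller than the sequence length. The `predict` callable stands in for the learned denoising model and is a placeholder assumption.

```python
import random

MASK = "_"  # placeholder for a masked (noisy) position

def toy_denoise(length, steps, predict, seed=0):
    """Toy coarse-to-fine generation: start fully masked, then
    unmask several positions per step in parallel, unlike an
    autoregressive model's one-token-at-a-time loop."""
    rng = random.Random(seed)
    seq = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        # Fill a batch of positions this step; batch size shrinks the
        # remaining work so everything is resolved within `steps` passes.
        k = max(1, len(masked) // (steps - step))
        for i in rng.sample(masked, k):
            seq[i] = predict(seq, i)
    return seq

# Dummy "denoiser" that just looks up the target token per position.
tokens = "the quick brown fox jumps".split()
result = toy_denoise(len(tokens), steps=3, predict=lambda seq, i: tokens[i])
```

Here a 5-token sequence is produced in 3 denoising passes rather than 5 sequential steps; a real dLLM amortizes this parallelism across thousands of tokens, which is where the throughput gains come from.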
With the launch of the Mercury series, Inception Labs is showcasing the potential of diffusion models for text and code generation. The company plans to release chat-oriented language models next, broadening the range of applications for diffusion language models. These upcoming models are slated to offer stronger agent capabilities, supporting complex planning and long-form generation, and their efficiency should let them run smoothly on resource-constrained devices such as smartphones and laptops.
Overall, the introduction of Mercury marks a significant advancement in AI technology, offering substantial improvements in speed and efficiency, while also providing higher-quality solutions for the industry.
Official introduction: https://www.inceptionlabs.ai/news
Online experience: https://chat.inceptionlabs.ai/
Key Highlights:
🌟 Launch of the Mercury series of diffusion large language models (dLLMs), achieving generation speeds of over 1000 tokens per second.
🚀 Mercury Coder excels in code generation, outperforming numerous existing models in benchmark tests.
💡 The innovative approach of diffusion models makes text generation more efficient and accurate, opening new possibilities for intelligent agent applications.