Have you ever felt that image models, painstakingly trained on massive datasets, generate high-quality images at a snail's pace? Don't worry, Luma AI recently open-sourced a pre-training technique for image models called Inductive Moment Matching (IMM), which is said to enable models to generate high-quality images at an unprecedented speed – a real turbocharger for your AI alchemy!

Algorithm Stagnation? Luma AI Shatters the "Ceiling"

In recent years, the AI community has generally felt that generative pre-training has hit a bottleneck. Despite the continuous increase in data volume, algorithmic innovation has been relatively stagnant. Luma AI believes this isn't due to insufficient data, but rather to algorithms failing to fully exploit the data's potential – like having a gold mine but only using a shovel to dig, incredibly inefficient.

To break through this "algorithmic ceiling," Luma AI focused on efficient inference-time compute scaling. Rather than joining the arms race over ever-larger model capacity, they asked how to make the inference stage itself faster. Thus IMM, this "speedster," was born!


IMM: Making Inference "Leap and Bound"

So, what makes IMM so unique that it can achieve such dramatic speed improvements?

The key lies in designing the pre-training algorithm backwards from inference efficiency. Traditional diffusion models are like meticulous artists, refining an image step by step; even with a powerful model, many steps are needed for good results. IMM is different: it's like an artist with a "teleportation" skill. During inference, the network is conditioned not only on the current timestep but also on the target timestep.

Imagine traditional diffusion models generating images like navigating a maze step by step. IMM, however, sees the maze's exit directly, allowing it to "leap" more flexibly, significantly reducing the required steps. This clever design makes each iteration more expressive, no longer constrained by linear interpolation.
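The "leap" described above can be sketched in a few lines. Note that `sample_few_step` and the network interface `f(x, t, s)` are hypothetical names for illustration, not Luma AI's actual API: the point is simply that conditioning on both the current time `t` and the target time `s` lets each network call cover a large jump instead of one small denoising step.

```python
import numpy as np

def sample_few_step(f, x_T, timesteps):
    """Few-step sampling sketch (hypothetical interface).

    f:         a network f(x, t, s) -> estimate of the sample at time s,
               given the current sample x at time t.
    x_T:       the initial noise sample.
    timesteps: a short decreasing schedule, e.g. [1.0, 0.5, 0.0].
    """
    x = x_T
    # Each iteration is one "leap" from time t directly to target time s.
    for t, s in zip(timesteps[:-1], timesteps[1:]):
        x = f(x, t, s)
    return x
```

With a schedule of length k+1, the network is called only k times, which is where the claimed 1-2 step generation comes from.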

Even more commendable, IMM incorporates maximum mean discrepancy (MMD), a mature moment-matching technique. This is like adding a precise navigation system to the "leap," ensuring the model accurately progresses towards high-quality targets.
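MMD itself is a standard statistic for comparing two sample sets via kernel-weighted moments. Below is a minimal sketch of the biased squared-MMD estimator with an RBF kernel; it illustrates the general technique, not IMM's specific training objective, and the bandwidth choice here is arbitrary.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    # Gaussian (RBF) kernel matrix from pairwise squared distances.
    d2 = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-d2 / (2.0 * bandwidth**2))

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared maximum mean discrepancy.

    Near zero when x and y are drawn from the same distribution;
    grows as the two distributions diverge.
    """
    kxx = rbf_kernel(x, x, bandwidth).mean()
    kyy = rbf_kernel(y, y, bandwidth).mean()
    kxy = rbf_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy
```

A matching criterion like this gives the model a distribution-level target for each leap, rather than a per-sample regression target.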

Tenfold Speed Increase, Even Higher Quality!

The proof is in the pudding. Luma AI has demonstrated IMM's power through a series of experiments:

  • On the ImageNet 256x256 dataset, IMM achieved an FID of 1.99 using 30 times fewer sampling steps, surpassing diffusion models (2.27 FID) and Flow Matching (2.15 FID). It's like completing the task in a flash, with even better quality!
  • On the standard CIFAR-10 dataset, IMM achieved an FID of 1.98 using only 2 sampling steps, reaching the best level for this dataset. Two steps! You heard that right, in the blink of an eye!

Besides speed, IMM also excels in training stability. In contrast, Consistency Models are prone to instability during pre-training and require special hyperparameter designs. IMM is more "worry-free," training stably under various hyperparameters and model architectures.

It's worth noting that IMM does not rely on denoising score matching or score-based stochastic differential equations, which are crucial for diffusion models. Luma AI believes that the true breakthrough lies not just in moment matching itself, but in their inference-first perspective. This approach allowed them to identify limitations in existing pre-training paradigms and design innovative algorithms that overcome these limitations.

Luma AI is confident about IMM's future, believing this is just the beginning, signifying a new paradigm towards multi-modal foundational models that transcend existing boundaries. They hope to fully unleash the potential of creative intelligence.

GitHub repository: https://github.com/lumalabs/imm