Large Language Models (LLMs) have made significant strides in artificial intelligence, and multimodal fusion is one of their most active frontiers. A collaborative team from Huazhong University of Science and Technology, ByteDance, and the University of Hong Kong recently introduced Liquid, a novel multimodal generation framework designed to address the limitations of current mainstream multimodal models in visual processing.
Mainstream multimodal LLMs typically rely on external visual modules such as pretrained vision encoders or separate image generators, which adds system complexity and limits scalability. Liquid's key innovation is to use a VQGAN as its image tokenizer, eliminating the need for such external components: images are encoded into discrete visual tokens that share a single vocabulary with text tokens, giving the LLM "native" visual understanding and generation capabilities.
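To make the shared-vocabulary idea concrete, here is a minimal, self-contained sketch. All sizes, token IDs, and the begin/end-of-image markers are illustrative assumptions for demonstration, not values or code from the paper.

```python
# Illustrative sketch (not the authors' code): VQGAN codebook indices are mapped
# into the same ID space as text tokens, so one decoder-only LLM can model both.

TEXT_VOCAB_SIZE = 32000       # assumed size of the base LLM's text vocabulary
IMAGE_CODEBOOK_SIZE = 8192    # assumed size of the VQGAN codebook

# Hypothetical begin/end-of-image marker tokens placed around the image codes.
BOI_ID = TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE
EOI_ID = BOI_ID + 1

def image_code_to_token_id(code: int) -> int:
    """Offset a VQGAN codebook index past the text vocabulary."""
    assert 0 <= code < IMAGE_CODEBOOK_SIZE
    return TEXT_VOCAB_SIZE + code

def build_sequence(text_ids, image_codes):
    """Interleave text token IDs and offset image codes into one
    autoregressive sequence, the way a unified LLM would consume it."""
    return (
        list(text_ids)
        + [BOI_ID]
        + [image_code_to_token_id(c) for c in image_codes]
        + [EOI_ID]
    )

# Toy example: a short caption followed by a tiny grid of VQGAN codes.
caption_ids = [101, 2057, 318, 257]   # placeholder text token IDs
vqgan_codes = [5, 4090, 17, 777]      # placeholder codebook indices
print(build_sequence(caption_ids, vqgan_codes))
```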
The research shows that Liquid not only lowers training costs but also reveals that multimodal capabilities follow the same kind of scaling laws as language modeling. Experiments on LLMs ranging from 0.5B to 32B parameters demonstrate that, as model size grows, performance and generation quality on visual generation tasks improve along scaling curves consistent with those of language tasks. More strikingly, visual understanding and generation reinforce each other: the two tasks can be jointly optimized in a shared representation space.
Liquid's design is deliberately minimalist, treating images and text on equal footing within a unified processing framework. For training, the team used 30 million text samples and 30 million image-text pairs as the foundation of the model's multimodal training mix. The final experiments show that Liquid performs strongly on multimodal understanding, image generation, and pure-text tasks, and that the semantic consistency between its generated images and the input text is significantly higher than that of other autoregressive models.
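As a rough illustration of what "treating images and text equally" means in practice, the sketch below applies a single next-token-prediction loss to a mixed sequence of text and image tokens. The vocabulary split, sequence length, and random stand-in logits are assumptions made for demonstration; a real setup would use the model's actual outputs rather than random tensors.

```python
import torch
import torch.nn.functional as F

# Assumed vocabulary layout: text tokens + VQGAN codes + two image markers.
VOCAB_SIZE = 32000 + 8192 + 2

# Stand-in mixed sequence of text and image token IDs (batch of 1, length 16).
sequence = torch.randint(0, VOCAB_SIZE, (1, 16))

# Stand-in logits; a real decoder-only transformer would produce these.
logits = torch.randn(1, 16, VOCAB_SIZE)

# One shifted cross-entropy loss covers both modalities: every position,
# whether it holds a text token or an image code, predicts the next token.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, VOCAB_SIZE),
    sequence[:, 1:].reshape(-1),
)
print(f"unified next-token loss: {loss.item():.3f}")
```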
Liquid offers a new architectural blueprint for general-purpose multimodal intelligence, pointing toward a more efficient and flexible path for multimodal fusion in AI.
Paper link: https://arxiv.org/pdf/2412.04332