In this era of information explosion, we document and share our lives through photos and videos every day. But have you ever wondered what it would mean if machines could understand these images and videos the way humans do, and even hold meaningful conversations with us about them?

mPLUG-Owl3, the latest general-purpose multimodal large model released by the Alibaba team, combines striking efficiency with strong comprehension: it can "watch" a 2-hour movie and begin answering questions about it in just 4 seconds. This isn't just a model; it's more like an AI assistant that can see, hear, speak, and think.


mPLUG-Owl3, whose name evokes a wise, alert owl in glasses, excels at understanding long sequences of images. Whether the input is a series of photos or a video, it can grasp the content and even follow the storyline.

To handle such vast amounts of information, the researchers equipped mPLUG-Owl3 with a hyper-attention module that serves as its "super brain". It processes visual and linguistic information in parallel, letting the model understand images and the text that accompanies them at the same time.


mPLUG-Owl3 delivers significant breakthroughs in multimodal understanding together with outstanding inference efficiency. It not only reaches state-of-the-art (SOTA) results on benchmarks covering single-image, multi-image, and video scenarios, but also cuts first-token latency sixfold and raises the number of images a single A100 GPU can process eightfold, to 400.

mPLUG-Owl3 accurately understands incoming multimodal knowledge and uses it to answer questions. It can even tell you which piece of knowledge it used for its judgment and the detailed reasoning behind it.

mPLUG-Owl3 can correctly understand the relationships among the contents of different input materials and reason about them in depth. Whether the task involves distinguishing styles or recognizing the same character across images, it handles it with ease.

mPLUG-Owl3 can watch and understand a 2-hour video, starting to answer user questions in just 4 seconds, regardless of which part of the video the questions pertain to.

mPLUG-Owl3 employs a lightweight Hyper Attention module that extends the Transformer block into a new unit capable of both image-text feature interaction and text modeling. This design keeps the number of newly introduced parameters small, making the model easier to train and improving both training and inference efficiency; a rough sketch of the idea follows.
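To make the idea concrete, here is a minimal, self-contained PyTorch sketch of a transformer block extended with a parallel cross-attention path over image features, fused into the text stream through a learned gate. The class name, dimensions, and gating scheme are illustrative assumptions rather than the released mPLUG-Owl3 implementation; see the paper and repository linked below for the actual design.

```python
import torch
import torch.nn as nn


class HyperAttentionBlockSketch(nn.Module):
    """Illustrative sketch only: a transformer block whose text self-attention
    runs alongside a cross-attention over image features, with a learned gate
    deciding how much visual signal to mix in. Not the official implementation."""

    def __init__(self, d_model: int = 1024, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)  # shared pre-norm feeding both attention paths
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(d_model, 1)  # per-token gate for the visual path
        self.ffn = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # text:  (batch, text_len, d_model) language hidden states
        # image: (batch, img_len,  d_model) visual features from the vision encoder
        h = self.norm(text)
        # Ordinary text self-attention (causal masking omitted for brevity).
        t_out, _ = self.self_attn(h, h, h)
        # Text queries attend to image tokens.
        v_out, _ = self.cross_attn(h, image, image)
        # Sigmoid gate in [0, 1] controls how much visual information is admitted.
        gate = torch.sigmoid(self.gate(h))
        fused = text + t_out + gate * v_out  # residual fusion of both paths
        return fused + self.ffn(fused)


if __name__ == "__main__":
    block = HyperAttentionBlockSketch()
    text = torch.randn(2, 16, 1024)    # a short text sequence
    image = torch.randn(2, 256, 1024)  # e.g. 256 visual tokens from one image
    print(block(text, image).shape)    # torch.Size([2, 16, 1024])
```

Because the visual path reuses the text queries and adds only the cross-attention projections and a small gate, the extra parameter count stays low, which is the efficiency argument the authors make for the design.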

Experiments on extensive datasets show that mPLUG-Owl3 achieves SOTA results on most single-image multimodal benchmarks. In multi-image evaluations, it surpasses models specifically optimized for multi-image scenarios. On LongVideoBench, it outperforms existing models, demonstrating its exceptional capability in long video understanding.

The release of Alibaba's mPLUG-Owl3 is not only a technological leap forward; it also opens up new possibilities for applying multimodal large models. As the technology continues to mature, we look forward to mPLUG-Owl3 bringing more surprises.

Paper: https://arxiv.org/pdf/2408.04840

Code: https://github.com/X-PLUG/mPLUG-Owl/tree/main/mPLUG-Owl3

Live Demo: https://huggingface.co/spaces/mPLUG/mPLUG-Owl3
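If you want to try the model yourself, the GitHub repository above describes how to load it with Hugging Face Transformers. The snippet below is a minimal loading sketch in that spirit; the checkpoint identifier and pre-processing details are assumptions for illustration, so follow the repository's README for the exact steps.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# The checkpoint name below is an assumption for illustration; check the
# repository README for the exact identifier and the image/video pre-processing.
MODEL_ID = "mPLUG/mPLUG-Owl3-7B-240728"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.half,   # half precision helps fit long image sequences in memory
    trust_remote_code=True,   # the modeling code ships with the checkpoint
).eval()
if torch.cuda.is_available():
    model = model.cuda()
```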