Alibaba DAMO Academy and Renmin University of China have recently jointly open-sourced mPLUG-DocOwl1.5, a model for OCR-free document understanding: it parses document content without a separate OCR step and achieves leading results on multiple visual document understanding benchmarks.

Structural information is crucial for understanding the semantics of rich text images, such as documents, tables, and charts. While existing multimodal large language models (MLLMs) possess text recognition capabilities, they lack a general structural understanding of rich text document images. To address this issue, mPLUG-DocOwl1.5 emphasizes the importance of structural information in visual document understanding and proposes "unified structural learning" to enhance the performance of MLLMs.


The model's "unified structural learning" covers five domains: documents, web pages, tables, charts, and natural images, including structural-aware parsing tasks and multi-granularity text localization tasks. To better encode structural information, researchers have designed a simple yet effective visual-to-text module called H-Reducer, which not only preserves layout information but also reduces the length of visual features by merging horizontally adjacent image patches through convolution, enabling large language models to more effectively understand high-resolution images.


Additionally, to support structural learning, the research team built DocStruct4M, a comprehensive training set of 4 million samples drawn from publicly available datasets, containing structure-aware text sequences and multi-granularity pairs of text and bounding boxes. To further strengthen the reasoning capabilities of MLLMs in the document domain, they also constructed DocReason25K, a reasoning fine-tuning dataset with 25,000 high-quality samples.

mPLUG-DocOwl1.5 follows a two-stage training framework: unified structural learning first, followed by multi-task fine-tuning across downstream tasks. With this recipe, mPLUG-DocOwl1.5 achieves state-of-the-art performance on 10 visual document understanding benchmarks, improving the previous best results among models with a 7B LLM by more than 10 points on 5 of them.

Currently, the code, models, and datasets for mPLUG-DocOwl1.5 have been publicly released on GitHub.

Project Address: https://github.com/X-PLUG/mPLUG-DocOwl/tree/main/DocOwl1.5

Paper Address: https://arxiv.org/pdf/2403.12895