Welcome to the 【AI Daily】 column, your daily guide to the world of artificial intelligence. Each day we cover the hottest topics in AI with a focus on developers, helping you follow technology trends and innovative AI product applications.

Learn more about new AI products: https://top.aibase.com/

1. Alibaba's Tongyi Lab Open-Sources R1-Omni Model, Enhancing Multimodal Emotion Recognition

On March 11th, the Tongyi Lab team open-sourced the R1-Omni model. The model applies Reinforcement Learning with Verifiable Rewards (RLVR) to multimodal emotion recognition, improving both reasoning ability and generalization. Training proceeds in two phases, a cold start followed by RLVR, which keeps the model stable and efficient on emotion recognition tasks.

【AiBase Summary:】

🎥 The R1-Omni model combines reinforcement learning and verifiable rewards, focusing on improving the reasoning ability of multimodal emotion recognition.

📊 In the cold start phase, the model is fine-tuned on 580 video samples, laying the foundation for subsequent training.

🌟 Experimental results show that R1-Omni surpasses baseline models by over 35% on multiple test sets, demonstrating excellent generalization capabilities.

Details: https://arxiv.org/abs/2503.05379
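
To make "verifiable rewards" concrete, here is a minimal sketch of a rule-based reward for emotion recognition. It is an illustration, not the Tongyi Lab team's code: the `<think>`/`<answer>` output template, the function names, and the equal weighting of the accuracy and format terms are all assumptions made for this example. The point is that the reward is computed by a deterministic check against a label rather than by a learned reward model.

```python
import re

# Assumed output template (hypothetical): <think>reasoning</think><answer>label</answer>
FORMAT_PATTERN = re.compile(
    r"^\s*<think>.*?</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL
)
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(output: str) -> float:
    """Return 1.0 if the output follows the assumed template, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(output) else 0.0


def accuracy_reward(output: str, label: str) -> float:
    """Return 1.0 if the predicted emotion matches the ground-truth label."""
    match = ANSWER_PATTERN.search(output)
    if match is None:
        return 0.0
    predicted = match.group(1).strip().lower()
    return 1.0 if predicted == label.strip().lower() else 0.0


def verifiable_reward(output: str, label: str) -> float:
    """Sum of the two rule-based terms (equal weighting is an assumption)."""
    return accuracy_reward(output, label) + format_reward(output)


sample = "<think>The speaker frowns and raises their voice.</think><answer>angry</answer>"
score = verifiable_reward(sample, "angry")
```

A well-formed, correct output earns both terms, a well-formed but wrong output earns only the format term, and a bare label without the template earns nothing.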

2. OpenAI Launches New Tools to Help AI Agents Transition from 'Answering Questions' to 'Executing Tasks'

OpenAI recently released a suite of new tools designed to simplify development and enhance the capabilities of AI agents. The suite includes the Responses API, the Agents SDK, and a computer use tool, marking a shift from simply answering questions to actually executing tasks. These tools are expected to improve AI's real-world applicability and give developers stronger support for building agents.

【AiBase Summary:】

🔄 The newly launched Responses API combines chat functionality with various integrated tools, providing real-time information and source citations to enhance development flexibility.

🔧 The Agents SDK, an open-source framework, coordinates complex workflows between multiple agents, improving information retrieval efficiency.

💻 The computer use tool enables AI to execute tasks directly on a computer, marking a significant upgrade in AI capabilities.

3. Baidu AI Open-Sources Table Recognition Model PP-TableMagic

On March 11th, Baidu AI launched PP-TableMagic, an open-source table recognition solution for extracting structured information from tables. Its multi-model architecture overcomes the limitations of traditional table recognition in complex scenarios, achieving high-precision end-to-end recognition while supporting targeted model fine-tuning. The design lets PP-TableMagic handle a wide variety of tables, improving document understanding and downstream data analysis.

【AiBase Summary:】

🛠️ PP-TableMagic uses a multi-model cascade architecture to improve the accuracy and adaptability of table recognition.

📈 The model supports customized fine-tuning to meet the needs of different scenarios and reduce data annotation workload.

💻 The project provides detailed installation guides and usage tutorials and supports high-performance inference and service deployment.

Details: https://github.com/PaddlePaddle/PaddleX/blob/release/3.0-rc/docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.md
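
The idea of a multi-model cascade can be sketched in a few lines. This is not the PaddleX API. The stage split (classify the table type, route to a type-specific structure model, then detect cells), the "wired"/"wireless" type names, and every identifier below are assumptions chosen for illustration, with stand-in lambdas in place of real models. The sketch shows why a cascade helps with customization: each stage is a separate callable, so one stage can be swapped or fine-tuned without touching the others.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) cell bounding box


@dataclass
class TableResult:
    table_type: str
    structure_html: str
    cells: List[Box]


def recognize_table(
    image: Any,
    classify: Callable[[Any], str],
    structure_models: Dict[str, Callable[[Any], str]],
    detect_cells: Callable[[Any], List[Box]],
) -> TableResult:
    """Run a hypothetical three-stage cascade over one table image."""
    # Stage 1: decide which kind of table this is.
    table_type = classify(image)
    if table_type not in structure_models:
        raise ValueError(f"no structure model registered for {table_type!r}")
    # Stage 2: route to the structure model specialized for that type.
    structure_html = structure_models[table_type](image)
    # Stage 3: locate individual cells independently of the structure model.
    cells = detect_cells(image)
    return TableResult(table_type, structure_html, cells)


# Stand-in models (hypothetical); real stages would be trained networks.
result = recognize_table(
    image="invoice.png",
    classify=lambda img: "wired",
    structure_models={
        "wired": lambda img: "<table><tr><td></td></tr></table>",
        "wireless": lambda img: "<table></table>",
    },
    detect_cells=lambda img: [(0, 0, 100, 20)],
)
```

Replacing only the `"wireless"` entry, for example, would change how borderless tables are parsed while leaving classification and cell detection untouched.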

4. Manus Partners with Alibaba Cloud's Tongyi Qianwen to Promote Domestic AI Agent Products

Manus, a rising star among AI agent products, has formed a strategic partnership with the Tongyi Qianwen team at Alibaba Cloud. The two parties will use the Tongyi Qianwen family of open-source models to implement all of Manus's functions on domestic models and computing platforms, with the aim of building more creative general-purpose agent products for Chinese users. Although Manus ran into some challenges after its release, its early preview demonstrated the ability to execute complex tasks automatically, marking progress in domestic AI technology.

【AiBase Summary:】

🤖 Manus and Alibaba Cloud's Tongyi Qianwen have formed a strategic partnership to advance the development of domestic AI agent products.

🌐 Both parties will use the Tongyi Qianwen open-source model to implement all of Manus's functions, improving user experience.

📈 Manus, billed as the world's first general-purpose AI agent product, has demonstrated the ability to execute complex tasks automatically.

5. Beyond Flat Images! MIDI: Extract Image Elements to Generate 360-degree 3D Scenes

The advent of MIDI technology has made it possible to generate 360-degree 3D scenes from a single 2D image. Through intelligent segmentation and multi-instance synchronous diffusion, MIDI can efficiently construct detailed 3D environments, greatly improving content creation efficiency in fields such as virtual reality, game development, and interior design. In the future, users will only need to take a photo to quickly generate an interactive 3D scene, truly realizing the dream of "one-click travel."

【AiBase Summary:】

🖥️ MIDI uses intelligent segmentation technology to identify and extract individual elements from 2D images, providing the basis for 3D scene construction.

🎶 Using multi-instance synchronous diffusion, MIDI can model multiple objects simultaneously, improving the efficiency and coordination of 3D generation.

🌍 MIDI demonstrates strong generalization capabilities with limited data, generating 3D scenes with fine textures and realistic effects.

Details: https://huanngzh.github.io/MIDI-Page/

6. VideoPainter: Local Video Editing Technology - Automatic Recognition and Modification with Prompts, Supporting Long Videos

VideoPainter is a deep learning-based video editing tool that automatically identifies and modifies video content from simple prompts and is especially suited to long-form video. Users enter short instructions and the system completes the edit, greatly improving production efficiency. Its underlying Diffusion Transformer (DiT) model makes edits more precise, helping users achieve creative transformations with little effort.

【AiBase Summary:】

✨ Using simple prompts, VideoPainter can automatically identify and modify video content, improving editing efficiency.

🎬 VideoPainter is well suited to long videos: users can quickly locate and modify specific segments, avoiding cumbersome traditional editing workflows.

🚀 Based on the advanced DiT model, VideoPainter provides high accuracy and flexibility, making it easy to turn creativity into reality.

Details: https://yxbian23.github.io/project/video-painter/

7. An Open-Source Alternative to OpenAI Operator Has Arrived! Nanobrowser, the Free AI Automation Superhero

Nanobrowser is a completely free, open-source tool designed to provide users with efficient web automation capabilities while ensuring data security and privacy. Users only need to install the extension and configure their own LLM API key to enjoy a top-tier automated experience. Compared to traditional RPA tools, Nanobrowser, with its intuitive interface and multi-agent system, makes it easy for even novice users to get started.

【AiBase Summary:】

💰 Nanobrowser is a completely free, open-source tool with no subscription fees; users can independently configure their LLM API keys.

🔒 All operations are performed in the local browser, ensuring user privacy and data security and preventing sensitive information leakage.

🤖 Nanobrowser supports models from mainstream providers such as OpenAI, Anthropic, and Google, and offers an intuitive interface suitable for users of all levels.

Details: https://github.com/nanobrowser/nanobrowser

8. Luma AI's Open-Source Image Pre-training Technology IMM Achieves Tenfold Speed Increase in Image Generation

Luma AI's recently open-sourced Inductive Moment Matching (IMM) technology significantly improves the speed and quality of image generation. Through innovative pre-training algorithms, IMM can achieve flexible jumps during inference, reducing the number of generation steps and thus breaking through the bottleneck of generative pre-training. Experimental results show that IMM demonstrates excellent performance on multiple datasets, marking a new future for multimodal foundation models.

【AiBase Summary:】

⚡ IMM significantly improves inference efficiency by designing its pre-training algorithm backward from the needs of efficient inference.

🏆 On ImageNet and CIFAR-10 datasets, IMM achieves unprecedented high-quality generation.

🔧 IMM training is highly stable and adaptable, breaking the limitations of traditional models.

Details: https://github.com/lumalabs/imm

9. Former ByteDance AI Executive Luo Yihang Joins Shengshu Technology as CEO to Advance Commercialization of AI Video Generation

Luo Yihang's appointment marks a new phase for Shengshu Technology in AI video generation. His extensive experience and technical background are expected to advance the company's multimodal technology, especially the commercialization of video generation. The collaboration between founder Zhu Jun and Luo Yihang suggests that more innovative products are on the way, which could drive the development of the entire industry.

【AiBase Summary:】

👤 Luo Yihang, as the new CEO, will be fully responsible for Shengshu Technology's R&D and commercialization processes.

📈 His successful experience at ByteDance, especially in the management of AI product lines, brings strong technical support to Shengshu Technology.

🎥 Shengshu Technology's upcoming Vidu 2.0 will significantly improve video generation efficiency, reduce costs, and drive industry development.

10. Second Nationwide Ruling on AI Copyright Case: Court Confirms Author's Copyright

On March 7th, the Changshu People's Court of Suzhou City, Jiangsu Province, ruled on a highly publicized copyright dispute involving AI-generated content, the first such case in Jiangsu Province and the second nationwide. The court confirmed that the plaintiff, identified only by the surname Lin, holds the copyright to images generated with the Midjourney software, finding that the creation process was original and met the requirements for protection under copyright law.

【AiBase Summary:】

🌟 The first AI copyright dispute case in Jiangsu Province was ruled on, and the court confirmed the author's copyright.

🖼️ The Changshu People's Court held that Lin's work was original and qualified for copyright protection.

💰 The court ordered the infringing party to publicly apologize and pay 10,000 yuan in compensation. No appeal was filed, and the judgment took effect.

11. Rebirth: I'm the Boss of AI on Xiaohongshu; Yuanbao Keeps Crashing, DeepSeek Is Always Slacking Off

On Xiaohongshu, netizens have cast themselves as "tycoon bosses" of AI companies, jokingly ordering various AI models around in group chats and producing a humorous AI workplace farce. The trend was started by netizen Komorebi and has since drawn widespread participation, with people sharing funny interactions with their AI employees. Although the BotGroup platform's functionality is still rudimentary, its novel gameplay and the anthropomorphic behavior of the AI models have made it a new social media hotspot.

【AiBase Summary:】

🤖 Netizens on Xiaohongshu have become "tycoon bosses" of AI companies, jokingly commanding AI models, creating a humorous workplace farce.

💼 The BotGroup web application allows different AI models to enter the same group chat, where users can interact with AI and participate in various games, experiencing the fun of being a boss.

😂 Despite its rudimentary functionality, the funny performances and interactions of the AI employees have attracted many netizens, becoming a new social media hotspot.

12. Report: Meta Starts Testing Self-Developed Chips for AI Training, Reducing Reliance on Nvidia

Meta is testing a self-developed AI training chip aimed at reducing its reliance on hardware manufacturers like Nvidia. This chip, produced in collaboration with Taiwan Semiconductor Manufacturing Company (TSMC), is specifically designed for AI workloads. Meta hopes this technology will improve its autonomy, lower costs, and enhance its competitiveness in the AI field.

【AiBase Summary:】

✨ Meta is testing its self-developed AI training chip to reduce its reliance on Nvidia.

💡 The chip is manufactured in collaboration with Taiwan's TSMC and is specifically designed to handle AI workloads.

💰 Meta expects to spend up to $65 billion this year, largely on AI infrastructure; switching to its own chips could save a significant amount of money.