AI Daily: ByteDance Releases Doubao 1.5 Deep Thinking Model; WeChat Launches Yuanbao, Its First AI Assistant; OpenAI Releases o4-mini and the Full o3

Welcome to the 【AI Daily】 column, your daily guide to the world of artificial intelligence. Each day we round up the most notable AI news with a focus on developers, covering technology trends and new AI applications.

Explore new AI products: https://top.aibase.com/

1. OpenAI Releases Two Multimodal Reasoning Models: o4-mini and the Full o3

OpenAI unveiled its latest multimodal models, o4-mini and the full version of o3, during a technical livestream. The models can process text, images, and audio together and can call external tools to handle complex tasks. o4-mini scored higher than o3 on the AIME math benchmarks and ranked among the top performers in programming competitions.

【AiBase Summary:】
🛠️ o4-mini and o3 possess multimodal processing capabilities, handling text, images, and audio simultaneously, and automatically invoking external tools.
📊 o4-mini achieved accuracy rates of 93.4% on AIME 2024 and 92.7% on AIME 2025, surpassing the full o3.
💻 In programming competitions, o4-mini scored 2700 points, placing it among the top 200 programmers globally, demonstrating its powerful programming capabilities.

2. WeChat's First AI Assistant, "Yuanbao," Officially Launches; Addable as a WeChat Friend

Tencent's "Yuanbao" is the first AI assistant to run on the WeChat platform. Users can search for it in WeChat and add it as a friend for a more natural chat experience. Yuanbao can parse WeChat official account articles, images, and documents, and it can answer follow-up questions in conversation. The assistant prioritizes user privacy and automatically redacts ID photos. Voice and video calls are not currently supported.

【AiBase Summary:】
🌟 WeChat's first AI assistant, "Yuanbao," is launched, allowing users to add it directly within WeChat.
📊 Yuanbao supports parsing official account articles, images, and documents, providing intelligent interaction.
🔒 It features privacy protection, including automatic redaction of ID photos.

3. ByteDance Releases Doubao 1.5 Deep Thinking Model: Multimodal Deep Thinking, Low Latency

At the Volcano Engine AI Innovation Tour in Hangzhou on April 17, ByteDance released the Doubao 1.5 deep thinking model and showcased its capabilities in mathematics, programming, scientific reasoning, and creative writing. The model uses a Mixture-of-Experts (MoE) architecture with a parameter-efficient configuration that keeps inference costs low. Combined with visual understanding technology, the model can analyze photos, assist with travel planning and project management, and significantly improve video search, making information easier for users to find.

【AiBase Summary:】
📈 The Doubao 1.5 model excels in mathematics, programming, and other fields, using an MoE architecture with a parameter-efficient configuration.
🌍 The new model incorporates visual understanding technology, capable of analyzing photos and assisting with travel and project management, offering powerful functionalities.
🎥 Video search capabilities are significantly enhanced, allowing users to quickly access relevant information within videos, leading to increased usage.

4. Moonshot AI's Kimi Team Open-Sources Mathematical Theorem Proving Model Kimina-Prover

The Kimi technical team released a preview version of Kimina-Prover and open-sourced multiple models and datasets for formal theorem proving. By combining large-scale reinforcement learning with formal reasoning, Kimina-Prover significantly improves reasoning ability and sample efficiency, achieving an 80.7% pass rate on the miniF2F benchmark and surpassing the previous best results.

【AiBase Summary:】
🔍 Kimina-Prover achieved an 80.7% pass rate on the miniF2F benchmark, exceeding previous best results.
🚀 This model combines large-scale reinforcement learning with formal reasoning, significantly improving reasoning ability and sample efficiency.
📚 Kimina-Prover offers strong explainability; users can view the derivation process, facilitating understanding of model behavior.
Details: https://arxiv.org/abs/2504.11354
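
For readers new to formal theorem proving: the prover receives a statement written in a proof assistant's language and must produce a proof that the checker accepts. The toy example below is our own illustration written in Lean 4 (assumed here as the proof language); it is not taken from miniF2F or from Kimina-Prover's output, and the theorem name is hypothetical.

```lean
-- Hypothetical toy statement: addition of natural numbers is commutative.
-- A prover's job is to fill in everything after `:=` so that Lean accepts it.
theorem toy_add_comm (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

-- A concrete arithmetic fact that Lean verifies by computation.
example : 2 + 2 = 4 := rfl
```

Benchmark problems are far harder than this, but the pass/fail criterion is the same: a proof counts only if the checker accepts it.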

5. OpenAI Open-Sources Super Agent: Codex CLI, Surpasses 5000 Stars in Five Hours

OpenAI recently released Codex CLI, a lightweight coding agent that runs in the terminal. It drew attention quickly, passing 5000 GitHub stars in just five hours, and is projected to reach 10,000 stars within the day. Codex CLI can generate, execute, refactor, and test code automatically, which can significantly improve developer productivity.

【AiBase Summary:】
🌟 Codex CLI received 5000 stars in just 5 hours after its release, projected to surpass 10,000 stars today.
💻 This tool can automatically generate, execute, refactor, and test code, offering powerful and practical functionalities.
📈 OpenAI plans to continuously launch more intelligent agent products and explore acquiring AI programming platforms to enhance its competitiveness.
Details: https://github.com/openai/codex?tab=readme-ov-file

6. Google Gemini Live Feature Fully Opens, Bringing New Experiences to Android Users

Google recently announced that it is making the Gemini Live feature in its Gemini app freely available to all Android users. Previously, the feature was limited to Pixel 9 and Samsung Galaxy S25 users. Gemini Live can recognize content from the camera and the screen in real time and give users instant feedback and information. Citing positive user feedback, Google decided to expand the feature and expects the rollout to complete within the next few weeks.

【AiBase Summary:】
🌟 Gemini Live is now freely available to all Android users, previously limited to Pixel 9 and Galaxy S25 users.
📸 This feature can identify camera and screen content in real-time, providing instant information and feedback, improving user interaction.
🚀 Microsoft launched a similar AI tool, Copilot Vision, on the same day, demonstrating the rapid advancement of real-time information recognition technology.

7. OpenAI in Talks to Acquire AI Programming Tool Windsurf for About $3 Billion

OpenAI is in talks to acquire the AI programming tool Windsurf in a deal valued at approximately $3 billion. If completed, it would be OpenAI's largest acquisition to date and a significant move in the AI developer tool market. Windsurf is a popular AI programming assistant that can generate and interpret code, and it has already secured over $200 million in funding.

【AiBase Summary:】
💰 OpenAI is in $3 billion acquisition talks with Windsurf, which would be its largest M&A deal if completed.
🚀 Windsurf is a popular AI programming assistant supporting code generation and interpretation, having secured over $200 million in funding.
📈 This acquisition would enhance OpenAI's programming capabilities, helping it maintain a leading position in the fiercely competitive AI tool market.

8. JetBrains Launches Coding Intelligence Agent Junie AI, Powering a New Programming and Debugging Experience

JetBrains recently announced that its new coding agent, Junie AI, has reached production readiness and is intended to help developers write and debug code more efficiently. The launch marks a significant step for JetBrains in the AI tool space. JetBrains has also updated its existing AI assistant to support the latest AI models and improve the user experience. In response to market competition, JetBrains plans to introduce a free plan to attract more developers to its tools.

【AiBase Summary:】
🤖 Junie AI is production-ready, focusing on handling complex tasks and debugging.
📈 The updated AI assistant supports various latest AI models and adds multi-file editing functionality.
🌐 JetBrains will launch a free plan, offering unlimited code completion to meet the needs of different developers.
Details: https://blog.jetbrains.com/blog/2025/04/16/jetbrains-ides-go-ai/

9. Reachy2 Open-Source Humanoid Robot Officially Goes on Sale

Pollen Robotics' Reachy2, an open-source humanoid robot priced at $70,000, has already been adopted by several top universities and research institutions. Its modular design and powerful AI-driven capabilities make it a pioneer in the humanoid robot field, suitable for various research and educational scenarios. Reachy2's open-source nature and flexible programming support provide developers with ample room for innovation, driving advancements in robotics technology.

【AiBase Summary:】
🤖 Highly anthropomorphic design, featuring 7-DOF arms, capable of performing natural and precise movements, suitable for various applications.
🔄 Modular and open-source architecture, supporting Python SDK programming, allowing developers to expand functionality based on their needs and drive technological innovation.
🌍 Deployed in over 20 countries globally, with clients including renowned institutions, showcasing its broad application potential in healthcare, retail, and education.

10. Shanghai Artificial Intelligence Laboratory Launches Upgraded Multimodal Large Model "Shusheng・Wanxiang 3.0"

The Shanghai Artificial Intelligence Laboratory's "Shusheng・Wanxiang 3.0" is a new multimodal large model with stronger handling of text and multimodal input. Compared with the previous version, it offers faster responses and stronger comprehension, and it is aimed at a wider range of user needs.

【AiBase Summary:】
🚀 The upgraded "Shusheng・Wanxiang 3.0" shows significant improvements in multimodal processing capabilities, suitable for various applications.
💡 This model shows noticeable advancements in performance and user experience, with enhanced response speed and comprehension.
🌐 Open-source initiatives provide developers with a new platform, encouraging innovation and application, and driving industry development.

11. Doubao Deep Thinking and Text-to-Image 3.0 Models Officially Open APIs to Enterprise Customers

Doubao recently released the Doubao 1.5 Deep Thinking model and the Doubao Text-to-Image model 3.0, and has officially opened their APIs to developers and enterprise customers through Volcano Engine. The deep thinking model excels in professional reasoning tasks, while the text-to-image model shows significant improvements in image generation quality.

【AiBase Summary:】
🧠 Doubao 1.5 Deep Thinking model performs exceptionally well in professional reasoning tasks, approaching the global top tier.
🎨 Doubao Text-to-Image model 3.0 achieves high-resolution image generation, improving creative efficiency and possessing commercial-grade design capabilities.
🚀 The open APIs for these two models give enterprise customers more efficient and versatile reasoning and image generation capabilities.
Details: https://github.com/ByteDance-Seed/Seed-Thinking-v1.5