Welcome to the 【AI Daily】column! This is your daily guide to exploring the world of artificial intelligence. Every day, we present you with the hottest AI news, focusing on developers and helping you understand technological trends and innovative AI product applications.

Discover New AI Products | Learn More: https://top.aibase.com/

1. Kimi Open-Source Vision-Language Models Kimi-VL and Kimi-VL-Thinking Surpass GPT-4o on Multiple Benchmarks

Moonshot AI recently open-sourced Kimi-VL and Kimi-VL-Thinking, two vision-language models with strong multi-modal understanding and reasoning capabilities. Both use a lightweight Mixture-of-Experts (MoE) architecture that activates only about 3 billion parameters per inference, yet they surpass GPT-4o on several benchmarks. The Kimi-VL series excels at mathematical reasoning, agent manipulation, and high-resolution image processing, and supports ultra-long context understanding, pointing to broad application potential.


【AiBase Summary:】

🛠️ Kimi-VL and Kimi-VL-Thinking employ a lightweight MoE architecture that activates only about 3 billion parameters, keeping inference highly efficient.

📊 In MathVision and ScreenSpot-Pro tests, Kimi-VL achieved outstanding scores of 36.8% and 34.5% respectively, demonstrating powerful reasoning capabilities.

📈 Supports context input up to 128K tokens, suitable for long documents and video analysis, showcasing broad application potential.

Details: https://github.com/MoonshotAI/Kimi-VL | https://huggingface.co/moonshotai/Kimi-VL-A3B-Instruct

2. iFlytek Star Agent Development Platform Now Fully Supports MCP

iFlytek recently announced that its iFlytek Star Agent development platform now fully supports MCP (Model Context Protocol), aiming to help developers build Agent applications efficiently. The platform supports easy configuration and invocation of industry-leading MCP Servers and one-click deployment of custom MCP Servers, achieving true "plug-and-play." The first batch of supported MCP Servers spans multiple industries, promoting the standardization of AI application middleware.


【AiBase Summary:】

🌟 Developers can easily configure and call industry-leading MCP Servers, supporting one-click deployment of custom MCP Servers.

🔧 The initial support includes 20+ industry-leading MCP Servers, covering AI capabilities and life services.

🌐 The iFlytek Star Agent development platform supports no-code and low-code creation modes, empowering individuals and businesses to quickly develop large-model applications.

Details: https://mcp.xfyun.cn/
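The "plug-and-play" idea behind MCP support can be illustrated with the generic JSON configuration format used by many MCP-compatible clients. This is a minimal sketch, not iFlytek's actual configuration schema; the server names, URL, and command below are placeholders invented for illustration:

```python
import json

# Hypothetical MCP client configuration: one hosted server (e.g. a "life
# services" capability) and one custom server launched as a local process.
# Names, URL, and script are illustrative placeholders only.
mcp_config = {
    "mcpServers": {
        "weather": {
            # a hosted MCP Server reached over HTTP
            "url": "https://example.com/mcp/weather",
        },
        "my-custom-server": {
            # a custom MCP Server run locally via stdio
            "command": "python",
            "args": ["my_mcp_server.py"],
        },
    }
}

print(json.dumps(mcp_config, indent=2))
```

Once a client reads such a config, each listed server's tools become callable by the agent without further integration work, which is what "plug-and-play" refers to.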

3. Kunlun Wanwei Open-Sources Skywork-OR1 Series Models with Strong Mathematics and Code Capabilities

Kunlun Wanwei's Tiangong team released the upgraded Skywork-OR1 series models on April 13th, marking a significant step forward in logical reasoning and complex task solving. The series comprises three high-performance models targeting the mathematics and code domains, demonstrating strong reasoning capability and cost-effectiveness. Skywork-OR1-32B-Preview is particularly outstanding in competitive programming tasks, reflecting the maturity of its training strategy.


【AiBase Summary:】

🔍 The Skywork-OR1 series models have achieved industry-leading reasoning performance in logical understanding and complex task solving.

💻 Includes three high-performance models: Skywork-OR1-Math-7B, Skywork-OR1-7B-Preview, and Skywork-OR1-32B-Preview, catering to different needs.

🏆 Skywork-OR1-32B-Preview stands out in competitive programming tasks, approaching the capabilities of DeepSeek-R1, demonstrating superior cost-effectiveness.

Details: https://github.com/SkyworkAI/Skywork-OR1

4. ByteDance Launches Seed-Thinking-v1.5: A New Contender in the Reasoning-Model Race

ByteDance's new large language model, Seed-Thinking-v1.5, demonstrates strong reasoning capabilities. The model uses a Mixture-of-Experts architecture and surpasses offerings from industry giants on several benchmarks, particularly in science, technology, engineering, and mathematics. Through technical innovation and efficient training methods, Seed-Thinking-v1.5 not only improves reasoning but also performs well on non-reasoning tasks.


【AiBase Summary:】

🚀 ByteDance launches Seed-Thinking-v1.5, focusing on STEM fields and employing a Mixture of Experts architecture.

🏆 Excels in multiple benchmark tests, surpassing products from Google and OpenAI.

🔍 Utilizes advanced training techniques and reinforcement learning frameworks to enhance model performance and efficiency.

5. SenseTime's SenseCore 2.0 Receives a Major Upgrade, Launches a 100 Million Yuan Voucher Program

At the 2025 SenseTime Technology Exchange Day, SenseTime announced a comprehensive upgrade to its SenseCore 2.0 large-scale AI infrastructure, aiming to provide businesses with efficient, flexible, full-stack AI infrastructure services. The upgrade addresses three major challenges in the large model industry and significantly improves compute utilization and inference performance through technical innovation. SenseTime has also set aside 100 million yuan in special vouchers to help various industries accelerate AI adoption.


【AiBase Summary:】

⚙️ SenseCore 2.0 is comprehensively upgraded, enhancing the cost-effectiveness and flexibility of AI infrastructure services.

🤝 SenseTime and Songying Technology have formed a strategic partnership to advance embodied intelligence and tackle the challenges of real-world deployment.

💰 Investing 100 million yuan in vouchers to support businesses with full-process AI services, from consulting to model training.

6. Google AI Studio Opens Limited Free Trial of Veo 2 Video Model

Google AI Studio recently opened a limited free trial of its Veo 2 video model to select users, generating considerable interest. Veo 2, the latest generation of AI video generation tools, supports up to 4K resolution and realistic physical simulations, showcasing its powerful technical capabilities. However, trial access is strictly limited, leaving users uncertain about cooldown periods and future usage.


【AiBase Summary:】

🌟 The Veo 2 video model, developed by Google DeepMind, supports up to 4K resolution, showcasing exceptional generative capabilities.

🕒 Trial access is limited, with users reporting unclear cooldown times, potentially impacting the experience.

🔒 Google strictly controls generated content to ensure user privacy and security.

7. Shanghai AI Lab Open-Sources InternVL3 Series Multi-modal Large Language Models

OpenGVLab released the InternVL3 series models on April 11th, marking a new milestone in the field of multi-modal large language models. This series includes models of various sizes, from 1B to 78B parameters, capable of processing text, images, and videos, with significantly improved performance. Compared to its predecessors, InternVL3 has made significant advancements in multi-modal perception and reasoning, expanding capabilities in tool usage, industrial image analysis, and more.


【AiBase Summary:】

🧠 The InternVL3 series models support various sizes from 1B to 78B parameters, demonstrating exceptional multi-modal processing capabilities.

🔍 Compared to InternVL2.5, InternVL3 shows significant improvements in multi-modal perception and reasoning, supporting multiple images and video data.

⚙️ The model can be deployed as an OpenAI-compatible API via LMDeploy's api_server, allowing users to easily call the model.

Details: https://modelscope.cn/collections/InternVL3-5d0bdc54b7d84e
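Because LMDeploy's api_server exposes an OpenAI-compatible interface, a served InternVL3 model can be queried with a standard chat-completions request. The sketch below only builds the request payload rather than sending it; the port (23333 is LMDeploy's default), the model id, and the image URL are assumptions for illustration — check the server's /v1/models route for the actual model id:

```python
import json

# Assumed local endpoint for an LMDeploy api_server instance (default port).
base_url = "http://localhost:23333/v1"

# An OpenAI-style multi-modal chat request: one text part plus one image
# part. Model id and image URL are illustrative placeholders.
payload = {
    "model": "OpenGVLab/InternVL3-8B",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.jpg"},
                },
            ],
        }
    ],
    "max_tokens": 256,
}

# In practice this payload would be POSTed to f"{base_url}/chat/completions",
# or sent via any OpenAI-compatible client pointed at base_url.
print(json.dumps(payload, indent=2))
```

The same payload works unchanged against other OpenAI-compatible servers, which is the practical benefit of this deployment route.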

8. Revolutionizing AI "IQ" Testing! The New GAIA Benchmark Surpasses ARC-AGI

With the rapid development of AI technology, accurately evaluating the intelligence level of AI has become a key industry concern. Traditional evaluation benchmarks like MMLU, while widely used, are increasingly showing limitations, failing to fully reflect AI capabilities in real-world applications. The newly introduced GAIA benchmark simulates complex real-world problems, emphasizing AI's flexibility and specialization in multi-step tasks, marking a significant shift in AI evaluation methods.


【AiBase Summary:】

🔍 The new GAIA benchmark aims to evaluate AI capabilities in real-world applications, covering key skills such as multi-modal understanding and complex reasoning.

📊 High scores on traditional benchmarks like MMLU do not necessarily reflect AI's true capabilities; performance differences in real-world applications are significant.

🚀 Initial results from the GAIA benchmark show that flexible models outperform other well-known models in complex tasks.

Details: https://huggingface.co/gaia-benchmark

9. Pusa, a Hundred-Dollar Open-Source Video Model: Low-Cost, High-Quality Reproduction via Mochi Fine-Tuning

Pusa is an open-source video generation model fine-tuned from Mochi, notable for its low cost and fully open release. Trained for roughly $100, Pusa demonstrates solid video generation quality across a variety of generation tasks. Its open fine-tuning recipe encourages community collaboration and invites more researchers into video model research.


【AiBase Summary:】

💰 The training cost of the Pusa model is only $100, significantly lower than the tens of thousands or even hundreds of thousands of dollars for traditional large video models.

🔧 Pusa is fully open-source, providing a complete codebase and training methods, allowing researchers to reproduce experiments and conduct innovation.

🎬 Based on Mochi fine-tuning, Pusa supports various video generation tasks. Although the current resolution is 480p, it shows potential in motion fidelity and prompt adherence.

Details: https://top.aibase.com/tool/pusa

10. ByteDance's Open-Source Project UNO: Image Generation Maintaining Character and Object Consistency

ByteDance's open-source project UNO has made significant breakthroughs in AI image generation, solving the problem of inconsistent characters or objects in generated images. Through innovative high-consistency data synthesis processes and model design, UNO ensures that generated images maintain consistent characteristics, whether in single-subject or multi-subject scenarios.


【AiBase Summary:】

🧠 The UNO project targets subject consistency in AI image generation, preventing the "face blindness" effect where a character's appearance drifts between generated images.

🔍 Using a high-consistency data synthesis process and innovative model design, UNO improves the controllability of image generation.

🎨 Supports both single-subject and multi-subject scenarios, ensuring high consistency in the generated results.

Details: https://huggingface.co/bytedance-research/UNO

11. XPeng Motors Unveils Plans for a Large Physical World Model, Positioning Itself as an AI Automotive Company

XPeng Motors founder He Xiaopeng emphasized the company's positioning as an AI automotive company on social media, believing that the greatest value of artificial intelligence lies in transforming the physical world. He revealed XPeng's innovative technologies in autonomous driving, particularly reinforcement learning and model distillation, giving it a unique competitive advantage in the industry. Furthermore, XPeng is training an ultra-large-scale physical world model, signifying its leading position in AI technology application.


【AiBase Summary:】

🤖 XPeng Motors positions itself as an AI automotive company, emphasizing the application value of AI technology in the physical world.

🚀 Introducing reinforcement learning and model distillation technologies to enhance competitiveness in the autonomous driving field.

📅 The 2025 press conference will clarify XPeng's future development direction and launch the new X9 model.

12. ByteDance's Foray into AI Smart Glasses, Challenging the Next Generation of Wearable Device Market

ByteDance is actively developing an AI smart glasses product, aiming to combine advanced AI functions with high-quality image capture to deliver an innovative user experience. The device will integrate ByteDance's self-developed "Doubao" AI model, enhancing intelligent interaction capabilities; users can interact with the glasses through voice commands and other methods. The project has entered the substantive research and development phase, and ByteDance is communicating with supply chain partners to advance product design and launch plans.

【AiBase Summary:】

🧠 ByteDance is developing AI smart glasses, integrating advanced AI functions and image capture.

🔍 Integrating the "Doubao" AI model, supporting voice commands, real-time translation, and other intelligent interactions.

📈 Planning to communicate with supply chain partners to promote product design and launch, challenging competitors such as Meta.