Welcome to the 【AI Daily】column! Your daily guide to exploring the world of artificial intelligence. We present the hottest AI news every day, focusing on developers and helping you understand technology trends and innovative AI product applications.

Explore the latest AI products: https://top.aibase.com/

1. Alibaba's Tongyi Wanxiang's First and Last Frame Video Generation Model Wan2.1-FLF2V-14B Open-Sourced

Alibaba's Tongyi Lab has open-sourced the Wan2.1-FLF2V-14B model on Hugging Face and GitHub, marking a significant advancement in AI video generation technology. The model supports high-definition video generation and creates smooth animation transitions using only the first and last frames provided by the user. It features various functions such as text-to-video and video editing. The open-source nature lowers the technical barrier, attracting developer attention and promoting widespread application of AI video creation.

【AiBase Summary:】

📸 Supports first and last frame control; users only need to provide two images to generate a smooth 5-second, 720p HD video.

🚀 The model has multi-modal support. Besides video generation, it can also generate images and audio guided by text, expanding creative scenarios.

🌐 The open-source ecosystem promotes developer participation, and Alibaba's free trial further stimulates community feedback and optimization.

Details: https://github.com/Wan-Video/Wan2.1

2. ByteDance Open-Sources Seed Intelligent Agent Model UI-TARS-1.5

ByteDance's UI-TARS-1.5 model has made significant progress in the field of multi-modal intelligent agents, particularly in GUI operation and game reasoning. The model enhances high-level reasoning capabilities through reinforcement learning, demonstrating superior performance in complex tasks. The open-sourced UI-TARS-1.5 provides developers with powerful tools, driving the development of multi-modal intelligent agent technology. Future optimizations aim to approach human-level performance.

【AiBase Summary:】

🖥️ UI-TARS-1.5 achieved SOTA performance on 7 GUI evaluation benchmarks, showcasing its long-horizon reasoning and interaction capabilities.

🎮 In game tasks, UI-TARS-1.5 showed stable inference-time scaling, and its "think-then-act" mechanism was validated in Minecraft.

📈 The model achieves precise GUI operation through enhanced visual perception and a System-2 reasoning mechanism, lowering the development threshold.

Details: https://github.com/bytedance/UI-TARS - Website: https://seed-tars.com/ - arXiv: https://arxiv.org/abs/2501.12326

3. OpenAI Releases Practical Guide to Building Agents (Document Included)

OpenAI's recently released "A Practical Guide to Building Agents" gives product and engineering teams the knowledge and best practices needed to build agent systems. The guide covers the definition, design, and safe deployment of agents, and highlights the fundamental differences between agents and traditional software. It notes that agents are particularly well suited to complex decision-making and to handling unstructured data.

【AiBase Summary:】

🧠 Agents possess high autonomy, capable of completing complex workflows on behalf of users, unlike the automated functions of traditional software.

🔧 Building agents requires considering core components such as models, tools, and instructions to ensure effectiveness and reliability.

🔒 Safety guardrails are crucial for managing data privacy and reputational risks. Developers need to implement multi-layered protection measures to address potential risks.

Details: https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
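The components named in the summary (a model, tools, instructions, and guardrails) can be sketched as a toy loop. This is only an illustration and not code from the guide: `Agent`, `run_agent`, `fake_model`, `add`, and `block_secrets` are hypothetical names, and the "model" is a stub function standing in for a real LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A "model" here is any function mapping (instructions, history) to an action.
# An action is either ("tool", tool_name, argument) or ("final", answer, "").
Action = Tuple[str, str, str]
Model = Callable[[str, List[str]], Action]


@dataclass
class Agent:
    instructions: str                       # what the agent should do
    model: Model                            # decides the next step
    tools: Dict[str, Callable[[str], str]]  # actions the agent may take
    guardrails: List[Callable[[str], bool]] = field(default_factory=list)


def run_agent(agent: Agent, user_input: str, max_turns: int = 5) -> str:
    # Input guardrails run before the model sees anything.
    if not all(check(user_input) for check in agent.guardrails):
        return "Request blocked by guardrail."
    history = [f"user: {user_input}"]
    for _ in range(max_turns):
        kind, name, arg = agent.model(agent.instructions, history)
        if kind == "final":
            return name
        if name not in agent.tools:
            history.append(f"error: unknown tool {name}")
            continue
        history.append(f"tool {name}: {agent.tools[name](arg)}")
    return "Stopped: turn limit reached."


def fake_model(instructions: str, history: List[str]) -> Action:
    # Stub policy: call the "add" tool once, then report its result.
    tool_lines = [h for h in history if h.startswith("tool add:")]
    if not tool_lines:
        return ("tool", "add", "2,3")
    return ("final", "The sum is " + tool_lines[-1].split(": ")[1], "")


def add(arg: str) -> str:
    return str(sum(int(x) for x in arg.split(",")))


def block_secrets(text: str) -> bool:
    return "password" not in text.lower()


agent = Agent("Answer arithmetic questions.", fake_model, {"add": add}, [block_secrets])
print(run_agent(agent, "What is 2 + 3?"))
print(run_agent(agent, "Tell me the admin password"))
```

The turn limit and the input check are the simplest possible stand-ins for the layered protections the summary mentions; a real system would likely add output checks and tool-level permissions as well.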

4. Tencent's HunYuan InstantCharacter Open-Sourced: High Character Consistency, Customizable Poses, Styles, and Scenes

Tencent's HunYuan team has officially open-sourced the InstantCharacter framework, a character personalization tool based on diffusion transformers. It offers high consistency and flexibility: from a single image it can generate diverse customized versions of a character, and it works across a range of artistic styles. Open-sourcing the framework lowers the technical barrier for character customization and should inspire innovation among developers worldwide. However, copyright and ethical considerations still need to be addressed.

【AiBase Summary:】

🖼️ Single-image driven: Only one character image and text prompt are needed to generate diverse poses, styles, and scenes.

🔄 High consistency: Through the advanced DiT architecture, high consistency in character features is ensured.

🌈 Diverse styles: Supports realistic, anime, cartoon, and other styles to meet various creative needs.

Details: https://huggingface.co/spaces/InstantX/InstantCharacter

5. Revolutionary Video Diffusion Technology FramePack: Only 6GB VRAM, 1.5 Seconds/Frame

FramePack is a revolutionary video diffusion technology. Its low VRAM requirements and efficient generation capabilities make it a game-changer in the video generation field. Requiring only 6GB of VRAM, FramePack can generate thousands of frames of video at full frame rate, greatly reducing the barrier to entry. Furthermore, its generation speed can reach 1.5 seconds/frame after optimization, providing new possibilities for content creation and real-time applications.

【AiBase Summary:】

💻 FramePack only requires 6GB of VRAM and can generate thousands of frames of video at 30fps, lowering the technical barrier.

⚡ Amazing generation speed: 2.5 seconds/frame unoptimized, and 1.5 seconds/frame optimized, suitable for various application scenarios.

🌍 This technology offers broad application prospects in content creation, game development, and edge computing, promoting the "democratization" of video generation technology.

Details: https://lllyasviel.github.io/frame_pack_gitpage/
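To put the per-frame figures in perspective, the short calculation below converts them into wall-clock time for a whole clip. It is a back-of-the-envelope sketch that assumes a constant cost per frame; the function name `generation_minutes` and the 60-second clip length are illustrative choices, not part of FramePack.

```python
def generation_minutes(video_seconds: float, fps: int, seconds_per_frame: float) -> float:
    """Wall-clock minutes to render a clip, assuming a constant per-frame cost."""
    frames = video_seconds * fps
    return frames * seconds_per_frame / 60


# Figures from the summary above: 30 fps, 2.5 s/frame unoptimized, 1.5 s/frame optimized.
for label, spf in [("unoptimized", 2.5), ("optimized", 1.5)]:
    print(f"60 s clip, {label}: {generation_minutes(60, 30, spf):.0f} min")
```

Under these assumptions a one-minute clip (1,800 frames) takes about 75 minutes unoptimized and about 45 minutes optimized, so the headline speed describes rendering throughput on modest hardware rather than playback-speed generation.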

6. Google Launches Gemini 2.5 Flash: An AI Assistant Combining Wisdom and Speed

Google's latest release, Gemini 2.5 Flash, significantly upgrades reasoning capabilities and is described as a fully hybrid reasoning model: developers can turn thinking on or off and control the cost and latency of the thinking process to suit their needs. By setting a thinking budget, developers can find a balance between quality and efficiency. The model is aimed at complex tasks, especially multi-step reasoning scenarios.

【AiBase Summary:】

💡 Gemini 2.5 Flash is a fully hybrid reasoning model: developers can choose whether to enable thinking and can flexibly control the reasoning process.

⚙️ Developers can set a thinking budget to balance quality, cost, and latency to meet the needs of different tasks.

📊 In the LMArena "Hard Prompts" category, Gemini 2.5 Flash performed strongly, second only to 2.5 Pro, demonstrating its reasoning capabilities.
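The thinking budget described above is set per request. The sketch below only builds a request body and does not call any service. The field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) follow the Gemini API's JSON format as best understood here, while the helper `build_request` and the `MAX_BUDGET` ceiling are assumptions that should be checked against the current API documentation.

```python
import json

MAX_BUDGET = 24576  # assumed ceiling for the budget; verify against the API docs


def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a generateContent-style JSON body with a clamped thinking budget.

    A budget of 0 is intended to switch thinking off; larger values allow
    the model to spend more tokens reasoning before it answers.
    """
    budget = max(0, min(thinking_budget, MAX_BUDGET))
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"thinkingConfig": {"thinkingBudget": budget}},
    }


# Cheap, low-latency request with thinking disabled.
print(json.dumps(build_request("Summarize this changelog.", 0), indent=2))
# An oversized budget is clamped to the assumed ceiling.
print(build_request("Prove the lemma step by step.", 50_000)["generationConfig"])
```

Clamping on the client side is a design choice for the sketch; a caller might instead prefer to surface an error when the requested budget is out of range.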

7. OpenAI Launches Flex Processing API for Low-Cost AI Applications

OpenAI recently launched the Flex Processing API in response to intense competition in the AI market. It lets users run AI models at a lower cost in exchange for slower responses and occasional resource unavailability. Flex processing is best suited to low-priority and non-production tasks, and it offers a cost-effective option at a time when AI service prices are rising.

【AiBase Summary:】

💰 The Flex Processing API enables users to use AI models at a lower cost, suitable for developers with limited budgets.

⚡ With Flex processing, the input token price for the o3 model is reduced to $5 per million, and the output token price is reduced to $20 per million.

🔒 To ensure proper use, developers need to complete an identity verification process to access the o3 model, which helps maintain platform security.
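The quoted Flex prices make it easy to estimate what a background job would cost. The sketch below uses only the two o3 figures from the summary; the function name `flex_cost_usd` and the example workload are hypothetical. In the API, Flex is reportedly selected per request through a `service_tier` value of `flex`; that detail is not stated in the summary, so treat it as an assumption.

```python
FLEX_O3_INPUT_PER_M = 5.00    # USD per million input tokens (from the summary above)
FLEX_O3_OUTPUT_PER_M = 20.00  # USD per million output tokens (from the summary above)


def flex_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated o3 Flex cost in USD for a given token volume."""
    total = input_tokens * FLEX_O3_INPUT_PER_M + output_tokens * FLEX_O3_OUTPUT_PER_M
    return total / 1_000_000


# Hypothetical nightly batch: 2,000 documents, ~3,000 input and ~500 output tokens each.
docs = 2_000
print(f"${flex_cost_usd(docs * 3_000, docs * 500):.2f}")
```

For this made-up workload (6 million input tokens and 1 million output tokens) the estimate comes to $50.00, which illustrates why the tier targets batch-style jobs that can tolerate slower responses.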

8. Midjourney Image Editor Receives Major Update: New UI, Layers, and Smart Tools

Midjourney released a significant update to its image editor on April 17, 2025, improving user experience and introducing several innovative features, including a new user interface, layer functionality, smart selection tools, and an upgraded content moderation mechanism. These improvements not only enhance editing efficiency and flexibility but also strengthen platform security, further solidifying Midjourney's leading position in the AI creative tools field.

【AiBase Summary:】

🖌️ The new user interface is optimized, improving operational efficiency and creative experience, suitable for both professional designers and novice users.

📂 The introduction of layer functionality allows users to manage images in layers, enhancing creative flexibility and precision.

🔍 The addition of smart selection tools uses AI algorithms to simplify complex editing operations, improving editing efficiency.

9. Microsoft Unveils New Language Model BitNet b1.58 2B4T, Occupying Only 0.4GB of Memory

The open-source language model BitNet b1.58 2B4T, released by the Microsoft research team, has attracted attention for its 2 billion parameters and memory footprint of only 0.4GB. This model uses an innovative 1.58-bit low-precision architecture, significantly reducing computational resource requirements and outperforming similar products. After pre-training and fine-tuning, BitNet excels in several benchmark tests and demonstrates significant advantages in energy consumption and decoding latency.

【AiBase Summary:】

🌟 The model has 2 billion parameters and occupies only 0.4GB of memory, significantly less than similar products.

🔧 It uses an innovative architecture, abandoning traditional 16-bit numbers and using 1.58-bit low-precision weight storage.

🚀 It has been released on Hugging Face, and Microsoft plans to further optimize model functionality and performance.

Details: https://arxiv.org/html/2504.12285v1
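The 0.4GB figure follows from simple arithmetic. The "1.58-bit" label reflects weights restricted to three values (assumed here to be -1, 0, and +1), which need log2(3) ≈ 1.58 bits each. The sketch below counts weight storage only and ignores activations and runtime overhead; `weight_gb` is an illustrative helper name.

```python
import math

PARAMS = 2_000_000_000       # 2 billion parameters, per the summary above
BITS_TERNARY = math.log2(3)  # bits needed to encode one of three values (assumed ternary weights)


def weight_gb(params: int, bits_per_weight: float) -> float:
    """Weight storage in gigabytes (10^9 bytes), ignoring activations and overhead."""
    return params * bits_per_weight / 8 / 1e9


print(f"bits per ternary weight: {BITS_TERNARY:.2f}")
print(f"1.58-bit weights: {weight_gb(PARAMS, BITS_TERNARY):.2f} GB")
print(f"16-bit weights:   {weight_gb(PARAMS, 16):.2f} GB")
```

Under these assumptions the ternary weights take about 0.40 GB versus 4.00 GB at 16 bits, a roughly tenfold reduction that matches the reported footprint.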

10. Genspark Super Agent Adds File Conversion Tool, Supporting Over 400 File Formats

Genspark Super Agent has launched a new file conversion tool that supports the conversion of over 400 file formats, greatly improving user work efficiency. The tool is easy to use; users simply upload the file and select the target format to quickly complete the conversion. Its intelligent optimization and seamless integration make it an indispensable assistant for personal and business users in daily office work.

【AiBase Summary:】

📁 Supports the conversion of over 400 file formats, meeting diverse office needs.

⚡ The conversion process is intelligently optimized, reducing information loss and improving file editing flexibility.

💡 Provides 200 free credits per day, lowering the barrier to entry for users who want to try AI tools.

Details: https://page.genspark.site/page/toolu_015jDXJp3H2Whpw4V2vS71sH/genspark_file_converter_orange_n_icon.html

11. Zhipu Z Fund Invests 300 Million Yuan to Support the Global Open-Source Community; Beijing Adds 200 Million Yuan

Beijing's Artificial Intelligence Industry Investment Fund has again invested in Zhipu to support its open-source model research and development and community ecosystem building. Zhipu, a leading domestic AI large model company, has accumulated rich model capabilities in various fields and has a large developer community. This investment will further promote Zhipu's development in the open-source ecosystem, helping it achieve its goal of full open-sourcing in 2025 and promoting the popularization of artificial intelligence.

【AiBase Summary:】

💡 Beijing's Artificial Intelligence Industry Investment Fund has added 200 million yuan in investment to Zhipu to support open-source model research and development.

🌍 Zhipu plans to invest 300 million yuan to support the global AI open-source community and encourage startups based on open-source models.

📈 Since its establishment, Zhipu has open-sourced 55 models with nearly 40 million downloads, committed to promoting AI accessibility.

12. Ideal Student MindGPT 3.0 Launched: Deep Thinking Capabilities Comparable to DeepSeek

Li Auto recently announced a major upgrade to its intelligent assistant, "Ideal Student," with the MindGPT 3.0 model now fully launched. The upgrade improves overall AI performance, and deep thinking capabilities in particular, bringing the assistant to a level the company describes as comparable to industry-leading models. Users can try the new model for free via the mobile app and the web version, with more intelligent interaction, better comprehension and error tolerance for speech input, and stronger handling of complex instructions.

【AiBase Summary:】

🚀 The upgrade of the MindGPT 3.0 model significantly improves deep thinking capabilities, providing users with a more intelligent and efficient experience.