Welcome to the AI Daily column! Your daily guide to exploring the world of artificial intelligence. We bring you the hottest AI news every day, focusing on developers and helping you understand technology trends and innovative AI applications.
Discover the latest AI products. Learn more: https://top.aibase.com/
1. Alibaba's Qwen2.5-Omni Tops the Global Open-Source Model Leaderboard
Hugging Face released its latest large model leaderboard. Alibaba's Qwen2.5-Omni, with its exceptional performance and multimodal capabilities, reached the top spot, becoming the leader among global open-source models. This achievement not only showcases Alibaba's strength in technology R&D but also helps broaden the adoption and application of AI technology.
【AiBase Summary:】
🏆 Qwen2.5-Omni tops the global open-source model leaderboard, demonstrating powerful performance and multimodal capabilities.
🔍 DeepSeek-V3-0324 and SpatialLM-Llama-1B follow closely behind, offering developers more choices.
🌐 Alibaba has open-sourced 200 models, promoting the popularization and application of AI technology.
2. MiniMax Audio Launches Speech-02 Voice Model with 200,000 Character Input Capacity
MiniMax Audio recently launched its new Speech-02 series of voice models, supporting over 30 languages and allowing for a massive 200,000-character input at once. The new model boasts 99% human-like voice similarity in speech synthesis and solves rhythm glitches in audio playback, ensuring a smooth listening experience. Furthermore, the new "Read Anything" function and "Long-Text Mode" make it easier for users to access and process long-form text content, significantly improving user experience.
【AiBase Summary:】
🎤 Speech-02 series supports over 30 languages, with 99% voice similarity, providing a natural and smooth audio experience.
📄 The new "Read Anything" function allows users to upload files or paste URLs to listen to various content at any time.
📝 "Long-Text Mode" supports a single input of 200,000 characters, making it easy to handle long texts, ideal for audiobooks and podcast production.
Details: https://www.minimax.io/audio
3. ChatGPT's Paid Users Surge to 20 Million, Annual Revenue Up 30%
OpenAI's ChatGPT has seen its paid user base explode to over 20 million in just three months, with annual recurring revenue increasing by nearly 30%, demonstrating strong user demand for this AI tool. Although the percentage of paid users has slightly decreased, weekly active users have reached 500 million. To support the growing user base, OpenAI plans to raise $40 billion in funding, even though the company is still losing money and is projected to be five years away from profitability.
【AiBase Summary:】
🌟 ChatGPT's paid users have surpassed 20 million, with annual recurring revenue up nearly 30%.
💰 OpenAI plans to raise $40 billion in funding while the company remains unprofitable.
🚀 Competitors Gemini, Claude, and Grok are growing rapidly, intensifying market competition.
4. ElevenLabs Releases "Text To Bark," the World's First Canine AI Text-to-Speech Model
ElevenLabs has launched "Text To Bark," the world's first AI text-to-speech model designed specifically for dogs. The technology converts human-input text into highly realistic dog barks, and ElevenLabs claims that 95% of dogs can't distinguish the source of the sound. This innovation offers new possibilities for communication between humans and pets, although dogs may still not understand the specific intent.
【AiBase Summary:】
🐕 "Text To Bark" converts text into dog barks; ElevenLabs claims 95% of dogs can't tell them from real barks.
🎤 Users can select breeds and adjust the tone and rhythm of the barks to suit different scenarios.
🌐 ElevenLabs plans to expand this technology to other animals, exploring multimodal interaction systems.
Details: https://top.aibase.com/tool/text-to-bark
5. Tired of Dealing with Multiple Images? Tencent Yuanbao Update: Upload Multiple Images and Process Them Intelligently with One Click
Tencent Yuanbao recently underwent a significant functional upgrade, particularly enhancing its image recognition capabilities. Users can now upload up to 10 images at once, achieving seamless image recognition and understanding using either the HunYuan or DeepSeek models. This functionality proves highly practical in real-world applications, helping users quickly extract information, generate copy, and even transform sketches into web demos.
【AiBase Summary:】
📸 Supports uploading up to 10 images at once, improving image recognition efficiency.
📝 Combined with HunYuan's multimodal understanding capabilities, it provides seamless content analysis and copy generation.
💻 Comprehensive support across multiple platforms, including mobile, desktop, and web versions, for convenient operation.
6. EasyControl_Ghibli Model Launched: Free Access to Ghibli-Style Image Generation
The launch of the EasyControl_Ghibli model provides users with a free tool to easily generate images in the style of Studio Ghibli. It breaks the limitations of traditional AI image generation, allowing ordinary users to participate in artistic creation and experience the fun and warmth brought by technology. While the model still has room for improvement, its open-source nature and ease of use open up new possibilities for education, entertainment, and personal expression, showcasing the potential and charm of AI technology.
【AiBase Summary:】
🌟 The EasyControl_Ghibli model is available on the Hugging Face platform, allowing users to generate Ghibli-style images for free.
🖼️ This model is trained on 100 photos of real Asian faces, capturing the light and shadow and emotion of Ghibli works.
🚀 The model's open-source nature and ease of use enable ordinary users to easily participate in artistic creation, bridging the gap between people.
Details: https://top.aibase.com/tool/easycontrol-ghibli
7. PaddlePaddle 3.0 Officially Released, Supports Large Models Like Wenxin 4.5, Reducing Cross-Chip Adaptation Costs by 80%
Baidu's deep learning platform, PaddlePaddle, recently launched its next-generation framework 3.0, marking a significant technological innovation in the deep learning field. By introducing five core technological innovations, such as unified dynamic-static automatic parallelism, the framework significantly reduces the development and training costs of large models and improves performance and adaptability. PaddlePaddle 3.0 supports multiple mainstream large models and achieves seamless migration across chips, reducing hardware adaptation costs by 80%.
【AiBase Summary:】
⚙️ PaddlePaddle framework 3.0 introduces five core technological innovations to reduce the development and training costs of large models.
📈 Optimized single-machine deployment of DeepSeek-R1 as much as doubles throughput.
💻 Supports over 60 mainstream chips, enabling seamless cross-chip migration, reducing adaptation costs by 80%.
8. Krea Integrates Gemini's Text-to-Image and Image Editing Capabilities: Chat Interface Experiences a Leap in Usability
Krea's recent deep integration with Google Gemini successfully introduced text-to-image generation and image editing capabilities, greatly enhancing the platform's generative capabilities and user experience. This update transforms the Krea Chat interface from a simple conversation tool into a comprehensive creative platform capable of quickly generating and editing visual content, lowering the barrier to creation.
【AiBase Summary:】
🖼️ Krea integrates with Google Gemini, launching text-to-image and image editing features, enhancing user experience.
💡 Users can quickly generate and edit images through natural language descriptions, lowering the barrier to creation.
🚀 This update is expected to shorten the cycle from concept to finished product in the creative industry, boosting team creativity.
9. Tencent Releases GeometryCrafter: Using AI to Unlock the Geometric Consistency of Open-World Videos
Tencent's recently launched GeometryCrafter model has made a significant breakthrough in geometric estimation of open-world videos. Using diffusion priors, it achieves deep understanding and processing of dynamic video content. The model can extract and generate consistent geometric information without additional inputs such as camera pose or optical flow data, filling a gap in this field.
【AiBase Summary:】
🌐 GeometryCrafter uses diffusion priors to achieve consistent geometric estimation of open-world videos, improving the deep understanding of video content.
🔍 The model can generate fine and coherent depth sequences and geometric structures without camera pose or optical flow data, filling an industry gap.
💡 Tencent has chosen to open-source the model code on Hugging Face, promoting the popularization of AI technology and allowing more creators to participate in technological exploration.
Details: https://huggingface.co/papers/2504.01016
10. Meta Launches AI System MoCha: Text Instantly Transforms into Vivid Animated Characters with Natural Lip Sync and Movement
The MoCha AI system, jointly developed by Meta and a research team at the University of Waterloo, generates full-body animated characters from text descriptions, featuring synchronized speech and natural movements. This technology marks a significant improvement in content creation efficiency and expressiveness, showing great application potential in areas such as digital assistants and virtual avatars.
【AiBase Summary:】
🎭 MoCha generates full-body animated characters from text, with natural movements and synchronized speech.
🗣️ Through an innovative "speech-video window attention" mechanism, MoCha achieves more accurate lip synchronization, solving challenges in audio and video generation.
👥 The multi-character management system is simple and efficient. Users only need to define character information once to use it in different scenes, improving creation convenience.
Details: https://top.aibase.com/tool/mocha
11. GPT-4.5 Passes the Turing Test for the First Time Using "Persona Play": AI Conversational Ability Reaches New Heights
Research from the University of California, San Diego, shows that OpenAI's GPT-4.5 surpassed human performance in the Turing test for the first time using "persona play," becoming the AI system with the most human-like conversational ability. The model demonstrated excellent performance in language fluency and emotional expression, flexibly responding to the judge's emotional changes, showcasing human-like social intelligence. This breakthrough not only drives the development of AI technology but also sparks profound discussions on AI intelligence standards.
【AiBase Summary:】
🤖 GPT-4.5 was judged to be human 73% of the time in the standard Turing test, more often than the real human participants, becoming the first AI model to truly "pass."
💬 The model demonstrates amazing language fluency and emotional richness, adapting its responses flexibly based on the judge's tone.
🧠 GPT-4.5's success stems from its complex persona play mechanism and conversation strategies, driving the application potential of AI technology.
Details: https://arxiv.org/pdf/2503.23674
12. OpenAI Quietly Launches OpenAI Academy, Offering Free AI Educational Resources
OpenAI recently launched a new educational platform, the OpenAI Academy, aiming to provide free and high-quality AI learning resources to global users. The platform covers various courses from basic knowledge to advanced skills, suitable for self-learners, educators, and developers. Although not widely publicized, this initiative is considered a significant step by OpenAI in promoting the popularization of AI education and has been widely welcomed by industry professionals.
【AiBase Summary:】
📚 OpenAI Academy provides tens of hours of free learning materials covering the basics and advanced skills of artificial intelligence.
💻 The platform is open to self-learners, educators, and developers, with flexible and diverse course formats, including online and offline activities.
🌍 The launch of OpenAI Academy marks the company's active role in education and knowledge dissemination, aiming to lower the barrier to AI learning.
Details: https://academy.openai.com/?continueFlag=bc9fbeae4c35e24ba47bde4cf390e735