AI Daily: Kimi's New Audio Foundation Model Kimi-Audio; Step1X-Edit, an Open-Source Image Editing Model; Quark AI Super Box Launches

Welcome to the 【AI Daily】 column, your daily guide to the world of artificial intelligence. Each day we round up the most notable AI news with a focus on developers, to help you follow technology trends and new AI product applications.

Learn more about new AI products: https://top.aibase.com/

1. Moonshot AI Releases Kimi-Audio: A New Benchmark for Open-Source Audio Foundation Models

Moonshot AI recently launched Kimi-Audio, an open-source audio foundation model for audio understanding, generation, and conversation. Built on Qwen2.5-7B and incorporating Whisper, the model supports tasks such as speech recognition and audio question answering. According to Moonshot AI, it was pre-trained on more than 13 million hours of diverse audio data and outperforms existing models on multiple benchmarks.

【AiBase Summary:】
🎤 Kimi-Audio handles a wide range of audio tasks in a single model, including speech recognition and audio question answering.
📊 Pre-trained on more than 13 million hours of diverse audio data, the model performs strongly across multiple benchmarks.
🌍 Kimi-Audio's open-source strategy lowers the barrier to entry for audio AI technology, promoting the democratization of AI globally.
Details: https://github.com/MoonshotAI/Kimi-Audio

2. Step1X-Edit: A New Benchmark for Open-Source Image Editing

Step1X-Edit, an open-source image editing model from the StepFun team, combines a multimodal large language model with a diffusion transformer to produce high-quality edited images. Its open-source release and strong results on the GEdit-Bench benchmark have drawn significant industry attention. The model gives content creators and developers a capable tool and pushes open image editing technology forward.

【AiBase Summary:】
🚀 Step1X-Edit combines multimodal large language models and diffusion transformers for efficient high-quality image generation.
📊 GEdit-Bench benchmark tests show its performance surpasses existing open-source models and approaches the level of closed-source models.
💡 Its open-source nature provides a foundation for research and development, promoting innovation and widespread adoption of image editing technology.
Details: https://huggingface.co/spaces/stepfun-ai/Step1X-Edit

3. Quark AI Super Box Upgrade Launches "Photo Ask Quark" Feature: Answers Everything

On April 25th, Alibaba's Quark AI Super Box launched the "Photo Ask Quark" feature. This innovation utilizes visual understanding and reasoning models to quickly identify and understand various problems encountered by users in real life. Users can obtain accurate information and answers by taking photos, covering multiple fields including artifact explanations, product identification, and health analysis.

【AiBase Summary:】
📸 The new "Photo Ask Quark" feature, based on visual understanding, quickly identifies content in images and provides relevant information.
🛒 Users can upload a product image and jump directly to Taobao listings for the same product, enhancing the shopping experience.
🌍 The feature supports multiple languages for questions and translation, suitable for travel, health, work, and other scenarios.

4. Apple Intelligence Coming to China? Official iOS 18.5 Release Expected in May

Apple is expected to release the iOS 18.5 update to Chinese users in May, bringing the much-anticipated Apple Intelligence features. These features have already launched in other regions, and Chinese users have been waiting for nearly a month. Apple Intelligence is an AI system built around personal context that offers services such as removing objects from photos and smart replies. However, only the iPhone 15 Pro series and the iPhone 16 series will support it, and users need to ensure sufficient storage space on their devices.

【AiBase Summary:】
🆕 Apple Intelligence features are expected to launch for Chinese users in May, marking Apple's entry into the generative AI era.
📸 Features include object removal in photos, notification summaries, and smart replies, but are supported only on the iPhone 15 Pro and later.
💾 Users need at least 7GB of available storage space, which may strain storage on some devices.

5. Google AI Releases 601 Real-World Generative AI Application Cases, Covering Various Industries

Google Cloud recently released a report showcasing 601 generative AI application cases from leading global companies, demonstrating the rapid development and widespread adoption of this technology. That is roughly six times the 101 cases published last year, and the cases span multiple industries including automotive, finance, and healthcare. They highlight the importance of generative AI in operations and strategy and show its potential to become an integral part of how organizations are structured.

【AiBase Summary:】
🔍 601 generative AI application cases demonstrate the technology's widespread use across various industries, roughly six times last year's count of 101.
💼 Clear AI agent categorization showcases AI's multiple roles in customer service, internal productivity, and security.
🚀 Real-world application cases across industries highlight the significant trend of generative AI moving from experimentation to production.
Details: https://cloud.google.com/transform/101-real-world-generative-ai-use-cases-from-industry-leaders

6. Microsoft Releases New Agent Operating System UFO², Deeply Integrating Windows with Intelligent Automation

Microsoft's newly released UFO² brings significant advances in desktop automation, most notably through deep integration with Windows. The new version can call native Windows APIs directly, which greatly improves the efficiency of automated tasks. Compared with OpenAI's Operator, UFO² is reported to achieve significantly higher success rates in multiple test scenarios, especially on complex tasks and cross-application operations.

【AiBase Summary:】
🚀 UFO² deeply integrates with the Windows system, directly calling native APIs to improve automation efficiency.
📊 UFO²'s automation task success rate is significantly higher than OpenAI's Operator, demonstrating excellent performance.
🖥️ The new picture-in-picture mode isolates automated tasks from user operations, enhancing user experience.
Details: https://github.com/microsoft/UFO?tab=readme-ov-file

7. OpenAI Releases New ChatGPT Version: Smarter and More Intuitive GPT-4o

OpenAI recently made significant updates to its GPT-4o version of ChatGPT, focusing on improved memory retention and enhanced skills in science, technology, engineering, and mathematics (STEM) fields. The new version aims to guide conversations more effectively towards productive outcomes while improving the model's intelligence and personality traits. While acknowledging some "smoothing" issues, OpenAI promises future improvements. Developers can also opt for the new GPT-4.1 series for a more stable API experience.

【AiBase Summary:】
🌟 The updated GPT-4o version has been optimized for memory retention and STEM skills.
🤖 OpenAI acknowledges "smoothing" issues in certain situations and will make improvements in the future.
🔧 Developers can choose the newly launched GPT-4.1 series for a more stable API experience.

8. Ema Launches New Language Model EmaFusion, Claiming to Beat o3 and Gemini on Cost and Accuracy

Ema has launched a new language model, EmaFusion, which it claims surpasses several well-known AI models in both cost and accuracy. EmaFusion employs a "cascade" judgment system that dynamically balances cost and accuracy, and users can tune that balance to match task requirements. Ema reports an accuracy of 94.3% at significantly lower running costs, positioning the model as a new option for enterprise AI development.

【AiBase Summary:】
🌟 Ema reports that EmaFusion reaches 94.3% accuracy at one-quarter of the market-average cost.
💡 EmaFusion intelligently decomposes complex tasks and assigns them to the most suitable AI model.
🚀 Ema is collaborating with global leaders like KPMG and Hitachi to drive the development of enterprise AI.
Details: https://www.ema.co/emafusion
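
To make the "cascade" idea concrete, here is a minimal Python sketch of a generic cost-aware cascade: cheaper models are tried first, and the request escalates only when the reported confidence falls below a user-chosen threshold. This is an illustration of the general technique, not Ema's implementation. The `Tier` class, the `cascade` function, the stand-in models, and all costs and confidence values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    """One model in the cascade (all values are hypothetical)."""
    name: str
    cost: float  # cost per call, arbitrary units
    answer: Callable[[str], Tuple[str, float]]  # returns (answer, confidence in [0, 1])


def cascade(prompt: str, tiers: List[Tier], min_confidence: float) -> Tuple[str, str, float]:
    """Try tiers from cheapest to most expensive; stop at the first confident answer.

    Returns (answer, name of the tier that produced it, total cost spent).
    If no tier is confident enough, the most expensive tier's answer is returned.
    """
    spent = 0.0
    result, used = "", ""
    for tier in sorted(tiers, key=lambda t: t.cost):
        result, confidence = tier.answer(prompt)
        spent += tier.cost
        used = tier.name
        if confidence >= min_confidence:
            break
    return result, used, spent


# Stand-in models for demonstration only; a real system would call actual LLMs.
def small_model(prompt: str) -> Tuple[str, float]:
    return ("4", 0.95) if prompt == "2 + 2" else ("unsure", 0.30)


def large_model(prompt: str) -> Tuple[str, float]:
    return ("detailed answer", 0.90)


TIERS = [Tier("large", 10.0, large_model), Tier("small", 1.0, small_model)]

easy = cascade("2 + 2", TIERS, min_confidence=0.8)
hard = cascade("hard question", TIERS, min_confidence=0.8)
cheap = cascade("hard question", TIERS, min_confidence=0.2)
```

In this sketch, raising `min_confidence` sends more requests to the expensive tier (higher accuracy, higher cost), while lowering it keeps more requests on the cheap tier. That threshold is one simple way to expose the cost and accuracy trade-off the article describes.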

9. Liquid AI Launches Hyena Edge, Ushering in a New Era for Smartphones and Edge Devices

Liquid AI recently launched Hyena Edge, a new convolution-based model designed to provide more efficient AI for smartphones and other edge devices. The model surpasses traditional Transformer++ models in computational efficiency and memory usage, making it particularly suitable for resource-constrained environments. Hyena Edge also performs well on several standard language model benchmarks, which showcases the potential of automated architecture design. Liquid AI plans to open-source the model in the future to promote adoption.

【AiBase Summary:】
🌟 Hyena Edge is Liquid AI's new convolutional model, specifically designed for edge devices like smartphones.
🚀 The model outperforms traditional Transformer++ models in computational efficiency and memory usage, suitable for resource-constrained environments.
📈 Hyena Edge excels in several standard language model benchmark tests and is planned for future open-sourcing to promote technology adoption.
Details: https://www.liquid.ai/research/convolutional-multi-hybrids-for-edge-devices

10. LemonAI Launches Real-Time Audio-Video AI Digital Human Model Slice Live

LemonAI recently launched Slice Live, which it describes as the world's first real-time audio-video AI model. Users only need to upload a photo to hold real-time video calls with virtual characters. Slice Live uses a Transformer model to render every pixel at 25 frames per second, which keeps the video smooth and realistic. The product shows great potential in entertainment and education, and LemonAI plans to expand it to AR, VR, and metaverse applications while prioritizing user privacy and data security.

【AiBase Summary:】
📸 Users can conduct real-time video calls with virtual characters by simply uploading a photo.
🎭 Slice Live provides immersive interactive experiences in entertainment and education, offering vivid learning content.
🔒 LemonAI is committed to continuous exploration of privacy protection to ensure user data security.
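
For a sense of what real-time rendering at 25 frames per second implies, the short Python sketch below works out the per-frame time budget. Only the 25 fps figure comes from the article. The helper name `frames_needed` and the one-minute call length are illustrative assumptions.

```python
FPS = 25  # frame rate quoted in the article

# Time available to generate each frame if playback must keep up in real time.
frame_budget_ms = 1000 / FPS


def frames_needed(seconds: float, fps: int = FPS) -> int:
    """Number of frames that must be rendered for a call of the given length."""
    return round(seconds * fps)


one_minute_call = frames_needed(60)
```

Under these assumptions, each frame has to be produced in 40 ms, and a one-minute call needs 1,500 frames.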

11. Zhipu and Shengshu Technology Reach Strategic Partnership, Focusing on Joint Innovation in Large Models

On April 27th, Zhipu, a company with roots at Tsinghua University, and Shengshu Technology announced a major strategic partnership. The two companies aim to combine their expertise in large language models and multimodal generative models to advance the technological innovation and industrial application of domestic large models. The cooperation covers joint R&D, product linkage, and solution integration, and it targets multiple industries.

【AiBase Summary:】
🤖 Zhipu and Shengshu Technology will jointly develop large language models and multimodal generative models to drive technological innovation.
📈 The partnership will integrate their respective technological strengths to create more competitive industry solutions.
🌐 The collaboration will focus on government and enterprise services, cultural tourism, and other fields to jointly promote the large-scale application of AI technology.

12. BMW China Announces New Models Integrating DeepSeek, Including the 5 Series and the All-New X3

BMW China will launch new models equipped with DeepSeek technology in the third quarter, marking significant progress in its intelligent in-vehicle systems. This technology will be applied to multiple new vehicles with the ninth-generation operating system, enhancing the interaction experience between drivers and vehicles. Users can communicate using natural language through the BMW Intelligent Personal Assistant, and the system can understand and respond to colloquial commands, providing a convenient driving experience.

【AiBase Summary:】
🚗 BMW will launch new models equipped with DeepSeek technology in the third quarter, enhancing the intelligent in-vehicle interaction experience.
🗣️ Users can communicate using natural language through the BMW Intelligent Personal Assistant, and the system understands colloquial commands.
🌟 DeepSeek technology aims to enhance user-vehicle interaction, providing a more convenient driving experience.

This article is from AIbase Daily
