Welcome to the 【AI Daily】column! This is your daily guide to exploring the world of artificial intelligence. Every day, we present you with the hottest AI news, focusing on developers and helping you understand technology trends and innovative AI product applications.
Check out the latest AI products. Learn more: https://top.aibase.com/
1. Tongyi Wanxiang's Open-Source Video Generation Model Wan2.1: Generate 480P Videos with Only 8.2GB of VRAM
Tongyi Wanxiang's newly released Wan2.1 model focuses on high-quality video generation, and its strong performance and innovative technology have made it a preferred tool for creators and enterprise users. The model scored 86.22% on the VBench evaluation, surpassing other video generation models and demonstrating a clear performance advantage. Wan2.1 improves video generation and inference efficiency through an efficient 3D causal VAE module and a Diffusion Transformer architecture, giving users flexible development and deployment options.
【AiBase Summary:】
🚀 The Wan2.1 model ranked first on the VBench evaluation with a score of 86.22%, outperforming other video generation models.
💡 Uses a 3D causal VAE module, achieving 256x lossless video latent space compression, improving video reconstruction speed.
🔧 Supports various mainstream frameworks. Developers can quickly try it through Gradio, simplifying inference and deployment (a minimal sketch follows after the link).
Details: https://github.com/Wan-Video
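For developers who want a feel for the workflow, here is a minimal Gradio front-end sketch. The `load_wan_pipeline` helper and its call signature are illustrative assumptions, not the project's API; the official repository linked above ships its own inference and demo code, which is the authoritative reference.

```python
# Minimal Gradio front-end for a text-to-video model such as Wan2.1.
# NOTE: `load_wan_pipeline` and its call signature are illustrative placeholders;
# the official repo (https://github.com/Wan-Video) ships its own demo scripts.
import gradio as gr

def load_wan_pipeline():
    # Placeholder: in practice you would load the released Wan2.1 weights here,
    # e.g. via the project's own inference code or a supported framework.
    raise NotImplementedError("Plug in the official Wan2.1 inference code here.")

def text_to_video(prompt: str, resolution: str = "480p"):
    pipe = load_wan_pipeline()
    # The smaller variant is reported to generate 480P clips within ~8.2 GB of VRAM.
    return pipe(prompt=prompt, resolution=resolution)  # expected to return a video file path

demo = gr.Interface(
    fn=text_to_video,
    inputs=[
        gr.Textbox(label="Prompt"),
        gr.Dropdown(["480p", "720p"], value="480p", label="Resolution"),
    ],
    outputs=gr.Video(label="Generated clip"),
    title="Wan2.1 text-to-video (sketch)",
)

if __name__ == "__main__":
    demo.launch()
```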
2. 360 Zhi Nao Releases Tiny-R1-32B: Approaching the Full Performance of DeepSeek-R1 with 5% of the Parameters
The Tiny-R1-32B-Preview model, jointly developed by the 360 Zhi Nao team and Peking University, approaches the performance of DeepSeek-R1 with only 5% of its parameters, demonstrating the potential of smaller models for efficient reasoning. The model excels on mathematics, programming, and science evaluations, notably scoring 78.1 on AIME 2024 and showing balanced optimization across multiple tasks. The development team has pledged to open-source the complete model resources, promoting the inclusive development of the technology.
【AiBase Summary:】
📊 The Tiny-R1-32B-Preview model achieves near-DeepSeek-R1 performance with 5% of the parameters, showcasing the efficient reasoning potential of small models.
💻 Excels in mathematics, programming, and science evaluations, surpassing the current best open-source 70B model.
🔗 The development team has committed to open-sourcing the complete model repository to promote accessibility and has uploaded the model to the Hugging Face platform (a loading sketch follows after the link).
Details: https://huggingface.co/qihoo360/TinyR1-32B-Preview
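As a rough illustration of how the Hugging Face upload could be used, here is a hedged loading sketch with the `transformers` library. The model id comes from the Details link above; whether the checkpoint follows the standard causal-LM interface and chat template is an assumption, so check the model card (and your hardware budget for 32B weights) before relying on it.

```python
# Hedged sketch: loading TinyR1-32B-Preview with Hugging Face transformers.
# Assumes the checkpoint at the linked repo is a standard causal LM with a chat template;
# see the model card for the exact usage and hardware requirements.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qihoo360/TinyR1-32B-Preview"  # taken from the Details link above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Prove that the sum of two even integers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Print only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```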
3. DeepSeek Open Source Week Day 3: Releases DeepGEMM, an FP8 GEMM Library to Power AI Training and Inference
On the third day of its Open Source Week, Chinese AI company DeepSeek launched DeepGEMM, an open-source library supporting FP8 general matrix multiplication, designed to support dense and Mixture-of-Experts models. This library achieves over 1350 TFLOPS of FP8 computing performance on NVIDIA Hopper GPUs, with its core code consisting of only 300 lines, showcasing exceptional efficiency and simplicity. The release of DeepGEMM marks DeepSeek's further efforts in promoting the transparency of AI technology and community collaboration, promising significant improvements to AI training and inference in the future.
【AiBase Summary:】
🚀 DeepGEMM is an open-source library designed for dense and Mixture-of-Experts matrix operations, supporting FP8 general matrix multiplication.
💻 Achieves over 1350 TFLOPS of FP8 compute on NVIDIA Hopper GPUs, demonstrating exceptional efficiency.
🌐 The release of this library not only enhances the performance of DeepSeek models but also provides a highly efficient and user-friendly matrix operation tool for global developers.
Details: https://github.com/deepseek-ai/DeepGEMM
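DeepGEMM itself is a CUDA library tuned for Hopper tensor cores, and the sketch below is not its API. It is a small pure-PyTorch toy (PyTorch 2.1+ for the float8 dtype) that only illustrates the idea behind scaled FP8 GEMM: quantize the operands to an 8-bit float format with a scale factor, multiply, and fold the scales back into the result. DeepGEMM uses much finer-grained scaling and runs the multiply natively in FP8 on the GPU.

```python
# Conceptual illustration only -- NOT DeepGEMM's API.
# Shows the idea behind scaled FP8 GEMM: quantize inputs to an 8-bit float format
# with a per-tensor scale, multiply, then rescale. Requires PyTorch 2.1+.
import torch

def quantize_fp8(x: torch.Tensor):
    """Scale a tensor into the representable range of float8_e4m3 and cast it."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max      # ~448 for e4m3
    scale = x.abs().max().clamp(min=1e-12) / fp8_max    # per-tensor scale (real libraries use finer-grained scaling)
    return (x / scale).to(torch.float8_e4m3fn), scale

def fp8_gemm_reference(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    a_q, a_s = quantize_fp8(a)
    b_q, b_s = quantize_fp8(b)
    # Emulate the low-precision multiply by upcasting the quantized values;
    # a real FP8 GEMM keeps the operands in FP8 on the tensor cores.
    out = a_q.to(torch.float32) @ b_q.to(torch.float32)
    return out * (a_s * b_s)                            # fold the scales back in

a = torch.randn(256, 512)
b = torch.randn(512, 128)
approx = fp8_gemm_reference(a, b)
exact = a @ b
print("max abs error:", (approx - exact).abs().max().item())
```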
4. Baidu's No-Code Development Tool "Miaoda" Begins User Invitation Testing
On February 25th, Baidu officially announced that its no-code development tool "Miaoda" has begun invitation-based user testing. Invited users can open the Miaoda homepage from their invitation email to try its H5 page and website development features. First unveiled at Baidu World 2024 on November 12, 2024, the tool offers no-code programming, multi-agent collaboration, and multi-tool invocation. More than 20,000 enterprise users have already applied for testing. Baidu will open up more features over time, and users can apply to join the testing queue on the Baidu Smart Cloud website.
【AiBase Summary:】
🚀 Users can access Miaoda via invitation email to experience various development features.
📈 The number of enterprise users applying for testing has exceeded 20,000, indicating strong market demand.
🔧 Miaoda features no-code programming and multi-agent collaboration, enhancing development efficiency.
Details: https://digital.cloud.baidu.com/mF/commonLandingPage/CTA/889605a4883041b98b16538350ea33f8?pushId=bBDCrkwdYZ6bP8TE44JbCM1
5. Google Launches Ultra-Low-Cost AI Model Gemini 2.0 Flash-Lite
Google recently launched Gemini 2.0 Flash-Lite, the most cost-effective option in its AI model series, aimed at giving budget-constrained developers a high-value choice. The model is well suited to large-scale text output tasks and carries a highly competitive price: its input and output token costs are significantly lower than those of market competitors. While it omits advanced capabilities such as image and audio output, its efficiency and practicality in text generation make it a good fit for startups and small teams.
【AiBase Summary:】
💰 Gemini 2.0 Flash-Lite's input tokens are priced at $0.075 per million, and output tokens at $0.30 per million, offering exceptional value.
📈 Outperforms Gemini 1.5 Flash and handles a 1-million-token context window, making it well suited to high-frequency tasks.
📝 Although it does not support image or audio output, Gemini 2.0 Flash-Lite focuses on text generation and can produce single-line captions for approximately 40,000 photos for under $1 (a cost estimate follows below).
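As a back-of-the-envelope check of the caption claim, the sketch below plugs the listed prices into a simple cost formula. The per-caption token counts are assumptions chosen for illustration, not Google's figures.

```python
# Back-of-the-envelope check of the Flash-Lite pricing claim.
# The per-caption token counts below are assumptions for illustration, not Google's figures.
INPUT_PRICE_PER_M = 0.075   # USD per million input tokens (from the item above)
OUTPUT_PRICE_PER_M = 0.30   # USD per million output tokens

captions = 40_000
input_tokens_per_caption = 260   # assumed: short instruction plus image tokens
output_tokens_per_caption = 15   # assumed: one single-line caption

total_cost = (
    captions * input_tokens_per_caption * INPUT_PRICE_PER_M
    + captions * output_tokens_per_caption * OUTPUT_PRICE_PER_M
) / 1_000_000
print(f"Estimated cost for {captions:,} captions: ${total_cost:.2f}")
```

With these assumed token counts the estimate comes out to roughly $0.96, consistent with the "under $1" figure above.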
6. High-Flyer Quant Responds to Rumors of an Early DeepSeek-R2 Release: Official Announcements Will Prevail
Recently, quantitative fund High-Flyer Quant responded to rumors of an early release of DeepSeek's next-generation AI model, R2, emphasizing that official announcements will be the definitive source of information. High-Flyer founded DeepSeek in July 2023, and DeepSeek released the DeepSeek-R1 model in January of this year. Reuters reported that DeepSeek is accelerating the release of the R2 model, planning to bring it out as early as May, with the new model expected to improve code generation and multilingual reasoning capabilities.
【AiBase Summary:】
🔍 High-Flyer Quant says official announcements will prevail, responding to rumors of an early DeepSeek-R2 release.
🚀 High-Flyer founded DeepSeek in July 2023, and the DeepSeek-R1 model was released this January.
🌐 The next-generation DeepSeek-R2 model is expected to improve code generation and multilingual reasoning capabilities.
7. Microsoft Open-Sources New Multimodal AI Agent "Magma": Enables Automated Ordering and Behavior Prediction
Microsoft recently open-sourced a multimodal AI agent foundation model called "Magma" on its official website. Magma can move between the digital and physical worlds, processing data types such as images, videos, and text, and it can anticipate intentions, giving it a more accurate understanding of what people or objects are about to do. The agent has a wide range of applications: it can assist users with everyday operations such as automated ordering and weather queries, and it can also control physical robots and provide real-time assistance. Magma's launch marks significant progress in intelligent assistant and robotics technology, making it especially well suited to AI-powered assistants and robots by enhancing their learning ability and practicality.
【AiBase Summary:】
🌐 Cross-modal capabilities: Magma can process various data types such as images, videos, and text, enhancing the functionality of intelligent assistants.
🤖 Intelligent applications: Users can use Magma for automated ordering, weather queries, and controlling physical robots.
📚 Learning adaptability: Magma helps robots learn new tasks and generates operational guides for virtual assistants, enhancing their practicality.
Details: https://microsoft.github.io/Magma/
8. Escalating Competition with DeepSeek and Claude! OpenAI's Deep Research Feature Opens to All Paid ChatGPT Users
OpenAI recently expanded its Deep Research feature, making it available to all ChatGPT Plus, Team, Education, and Enterprise users. The feature is regarded as one of the most transformative AI assistant capabilities since ChatGPT, able to carry out complex research tasks and produce professional reports. Meanwhile, China's DeepSeek is challenging OpenAI's business model by open-sourcing new models, intensifying market competition. The technology excels at improving efficiency but still faces challenges when collaborating with human experts, so businesses need to re-evaluate their information workflows to use it effectively.
【AiBase Summary:】
💻 OpenAI expands the Deep Research feature to multiple user tiers, enhancing the research capabilities of its AI assistant.
🔍 China's DeepSeek challenges OpenAI's subscription business model by open-sourcing new models.
📈 Deep Research creates new business opportunities even as its efficiency gains come with limitations, prompting businesses to reshape their information-processing workflows.
9. PhotoDoodle AI Transforms Your Photos into Whimsical Artworks with Just a Few Prompts
Jointly developed by ByteDance and university research teams in China and Singapore, PhotoDoodle builds on the Flux.1 model to redefine image creation. The system learns an artistic style from a small number of samples and precisely executes editing instructions, significantly expanding the possibilities for creative expression. Its core techniques include position encoding cloning, which ensures that new elements blend naturally into the original image (a conceptual sketch follows after the link below), and the research team is also exploring more efficient single-image training methods.
【AiBase Summary:】
🖌️ PhotoDoodle is based on the Flux.1 model, capable of learning artistic styles from a small number of samples and executing editing instructions.
✨ Position encoding cloning technology allows the AI to remember each pixel's position, ensuring new elements blend naturally into the background.
📊 The research team has released a dataset containing six artistic styles and is exploring more efficient single-image training methods.
Details: https://github.com/showlab/PhotoDoodle
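To make the "position encoding cloning" idea concrete, here is a conceptual sketch: the conditioning (source-image) tokens reuse the same positional ids as the noisy target tokens instead of being offset, tying each edited token to the source token at the same location. The tensor shapes and id layout are illustrative assumptions, not the project's actual implementation, which lives in the linked repository.

```python
# Conceptual sketch of "position encoding cloning" as described for PhotoDoodle:
# conditioning tokens share the SAME positional ids as the target tokens, so every
# edited token stays aligned with the source location it came from.
# Shapes and the id layout here are illustrative assumptions.
import torch

def grid_position_ids(height: int, width: int) -> torch.Tensor:
    """(h*w, 2) tensor of (row, col) ids for a latent grid."""
    rows = torch.arange(height).repeat_interleave(width)
    cols = torch.arange(width).repeat(height)
    return torch.stack([rows, cols], dim=-1)

h, w, dim = 32, 32, 64
target_tokens = torch.randn(h * w, dim)      # noisy latent being denoised
condition_tokens = torch.randn(h * w, dim)   # clean source-image latent

target_ids = grid_position_ids(h, w)
condition_ids = target_ids.clone()           # "cloned" ids: same positions, no offset

# The transformer attends over the concatenated sequence; because both halves share
# positional ids, token i in the target is tied to token i in the source image.
tokens = torch.cat([condition_tokens, target_tokens], dim=0)
position_ids = torch.cat([condition_ids, target_ids], dim=0)
print(tokens.shape, position_ids.shape)  # torch.Size([2048, 64]) torch.Size([2048, 2])
```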
10. OpenAI Makes Advanced Voice Chat Mode for ChatGPT Free
On February 26th, OpenAI announced on the X platform that the advanced voice mode for ChatGPT is now officially free for users. This mode is based on the GPT-4o mini model and, through optimized computational efficiency, achieves performance close to the full version of GPT-4o. Currently, ChatGPT desktop applications for macOS and Windows 10/11 systems support this mode, allowing users to choose from 5 voices and enjoy custom prompts and conversation review features. This move will enhance users' voice interaction experience and promote the widespread application of artificial intelligence technology.
【AiBase Summary:】
🎤 Advanced voice mode is based on the GPT-4o mini model, with performance close to the full GPT-4o.
💻 Currently supports ChatGPT desktop applications for macOS and Windows 10/11 systems.
🚀 Offers 5 voice options, supporting custom prompts and conversation review features.
11. AI "Magic" Turns Campus Landmarks into Adorable Plush Toys, and the Creative Effect Goes Viral!
Recently, an AI effect called "Come and Fluff Me Up" has become a sensation on social media, transforming real-world buildings into cute plush-toy versions of themselves. Its simple operation and striking results have attracted large numbers of users, and it is especially popular with universities and tourism organizations. Although the generated results involve some randomness, users can choose among different plush styles to get a more satisfying outcome. The effect's popularity showcases the immense potential of AI technology in the creative field.
【AiBase Summary:】
🎉 The AI effect "Come and Fluff Me Up" transforms real-world buildings into plush toy styles, with adorable and realistic results.
📈 The effect has quickly gone viral on social media platforms, attracting numerous users to participate and share their generated videos.
🛠️ Users can select different styles of generated images to ensure the final effect better suits their preferences.
12. Supports Online Search! OPPO ColorOS Integrates the Full Version of DeepSeek-R1