Welcome to the 【AI Daily】 column, your daily guide to the world of artificial intelligence. Each day we cover the hot topics in AI with a focus on developers, helping you follow technology trends and discover innovative AI product applications.

Fresh AI products: click to learn more at https://top.aibase.com/

1. OpenAI Releases o3: A Major Breakthrough in AI Reasoning Capabilities, Scoring Up to 87.5%

OpenAI has launched o3, the latest model in its o-series of reasoning models, marking a significant advance in mathematical and scientific reasoning. o3 scored 87.5% on the ARC-AGI benchmark, a marked improvement in solving complex logic and mathematical problems. The model combines neural-symbolic learning with probabilistic logic to handle multi-step reasoning, and it shows broad application potential in fields such as education, healthcare, and software development.

【AiBase Highlights:】

🧠 o3 scored 87.5% on the ARC-AGI benchmark, showcasing a significant improvement in reasoning capabilities.

🔍 In advanced mathematics tests, o3 achieved a success rate of 96.7%, with a 10% increase in scientific reasoning accuracy.

💻 o3 has broad application potential, providing practical support in education, healthcare, and software development.

2. Adobe Launches New AI Audio Tool Sketch2Sound, Create Sound Effects Just by Humming and Imitating Sounds

Sketch2Sound, an AI tool developed by Adobe Research in collaboration with Northwestern University, aims to revolutionize the workflow of sound designers. Users can generate professional sound effects by humming, imitating sounds, and adding simple text descriptions. The system analyzes the volume, timbre, and pitch of the vocal input and combines them with the text to create the desired sound, which makes it particularly well suited to Foley artists and to speeding up film sound production.

【AiBase Highlights:】

🎵 Sketch2Sound is a newly developed AI tool that creates sound effects through humming and text descriptions.

🔊 The system analyzes volume, timbre, and pitch, combining user voice input with text to generate target sound effects.

🎬 Especially suitable for Foley artists, it can quickly generate sound effects for films, improving work efficiency.

Details link: https://hugofloresgarcia.art/sketch2sound/
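
As a rough illustration of the kind of control signals described above, the following Python sketch extracts loudness, a pitch estimate, and a brightness measure from a single audio frame. It is not Adobe's implementation: the use of spectral centroid as a stand-in for timbre, the autocorrelation pitch estimate, and all function names and parameters are assumptions made for this example.

```python
import cmath
import math


def loudness_rms(frame):
    """Root-mean-square amplitude of the frame (a simple loudness proxy)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def pitch_autocorr(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Crude pitch estimate: pick the lag with the highest autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (assumed here as a timbre proxy)."""
    n = len(frame)
    weighted = total = 0.0
    for k in range(n // 2 + 1):
        # Naive DFT bin; fine for a short frame, too slow for real audio.
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total else 0.0


# Hypothetical input: a 50 ms frame of a 220 Hz hum sampled at 8 kHz.
SAMPLE_RATE = 8000
frame = [0.5 * math.sin(2 * math.pi * 220 * t / SAMPLE_RATE) for t in range(400)]

controls = {
    "loudness": loudness_rms(frame),
    "pitch_hz": pitch_autocorr(frame, SAMPLE_RATE),
    "brightness_hz": spectral_centroid(frame, SAMPLE_RATE),
}
print(controls)
```

In a real system these per-frame values would be computed over the whole recording and passed to the generative model alongside the text prompt; that conditioning step is outside the scope of this sketch.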

3. Baichuan Intelligence Releases Financial Large Model Baichuan4-Finance

Baichuan Intelligence has recently launched Baichuan4-Finance, a new large model for finance. Using an innovative domain self-constraint training scheme, it improves both financial and general capabilities, significantly enhancing its applicability in financial scenarios. According to evaluation data, Baichuan4-Finance outperformed GPT-4o in accuracy across multiple financial domains.

【AiBase Highlights:】

🚀 Baichuan4-Finance improves financial and general capabilities through a domain self-constraint training scheme.

🏆 In multiple evaluations, Baichuan4-Finance achieved an overall accuracy of 93.62%, leading GPT-4o by nearly 20%.

📊 The model's accuracy in banking, insurance, funds, and securities exceeds 95%.

Details link: https://platform.baichuan-ai.com/finPage

4. Tsinghua University and Tencent Collaborate to Launch ColorFlow: Automatically Color Black-and-White Comics While Maintaining Character Consistency

ColorFlow is a new image sequence colorization model jointly developed by Tsinghua University and Tencent ARC Lab. It addresses the problem of keeping character identity consistent when coloring black-and-white images. The model employs a dual-branch design and a retrieval-augmented colorization pipeline, significantly improving coloring quality and efficiency. ColorFlow surpasses existing advanced models on multiple metrics and delivers higher aesthetic quality, making it suitable for black-and-white comics, line art, and other artistic scenarios.

【AiBase Highlights:】

🌟 ColorFlow is an innovative model for coloring black-and-white image sequences, capable of maintaining character identity consistency.

🎨 The model uses a dual-branch design for color identity extraction and actual coloring, enhancing both the effect and efficiency of coloring.

🏆 ColorFlow outperforms existing advanced models in multiple metrics, showcasing higher aesthetic quality and practicality.

Details link: https://zhuang2002.github.io/ColorFlow/
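
To give a feel for what the retrieval step in such a pipeline might involve, here is a minimal Python sketch that ranks colored reference patches by cosine similarity to a black-and-white query patch. It is an assumption-laden illustration, not ColorFlow's actual method: the feature vectors, patch names, and the `retrieve_references` function are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_references(query, pool, k=2):
    """Return the names of the k reference patches most similar to the query."""
    ranked = sorted(pool.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical 2-D features; a real system would use learned image embeddings.
reference_pool = {
    "hero_color": [1.0, 0.0],
    "villain_color": [0.0, 1.0],
    "hero_alt": [0.9, 0.1],
}
matches = retrieve_references([1.0, 0.05], reference_pool, k=2)
print(matches)
```

The retrieved patches would then serve as color references for the coloring branch, which is how character colors could stay consistent from panel to panel.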

5. CAP4D: Generate High-Quality 4D Character Avatars by Uploading Reference Images

The CAP4D model is a revolutionary technology that generates high-quality 4D avatars from any number of reference images. The model employs a two-stage workflow, first generating images from different angles and expressions, then reconstructing real-time controllable 4D avatars using reference images. By utilizing advanced facial tracking technology and random sampling methods, CAP4D significantly enhances image reconstruction effects and detail presentation.

【AiBase Highlights:】

🌟 The CAP4D model generates high-quality 4D avatars from any number of reference images, using a two-stage workflow.

🖼️ This technology can generate avatars from multiple different angles, significantly improving image reconstruction effects and detail presentation.

🎤 CAP4D, in combination with voice-driven animation models, achieves audio-driven dynamic avatars, expanding the application scenarios for virtual avatars.

6. OpenAI Introduces New Memory Feature for ChatGPT: Can Recall User Interactions Across Conversations

OpenAI has recently launched a brand new memory feature that allows its AI assistant ChatGPT to recall previous interactions when users start a new conversation. This update aims to enhance user experience, allowing users to manage their memory settings comprehensively, including deleting or archiving specific information. Similarly, Google has accelerated the rollout of memory features for its chatbot Gemini, demonstrating the ongoing efforts in the AI industry towards personalized services.

【AiBase Highlights:】

🔍 OpenAI's new memory feature allows ChatGPT to recall past user interactions across conversations.

🔒 Users can manage memory settings at any time, deleting or archiving specific information.

🤖 Google has also launched similar features to enhance the personalization of AI assistants.

7. Shocking! Your AI Chat Partner Has Secretly Learned "Mind Reading"! INFP Unlocks a New Interaction Style for Two-Person Conversations

The emergence of INFP technology marks a qualitative leap in the interactive capabilities of AI virtual avatars in two-person conversations. By mimicking human expressions and actions, INFP enables virtual characters to exhibit genuine interaction in conversations, as if communicating with real people. The technological innovation behind this not only enhances user experience but also provides new possibilities for future AI dialogue systems.

【AiBase Highlights:】

🤖 INFP technology enhances the interactive capabilities of AI virtual avatars by mimicking human expressions and actions.

🎤 This technology uses audio analysis to dynamically adjust the state of AI avatars, achieving natural and smooth conversations.

📊 The DyConv dataset provides INFP with rich dialogue materials, ensuring superior learning effects and performance.

Details link: https://grisoon.github.io/INFP/

8. Luo Fuli, One of the Developers of the DeepSeek Open Source Model, Joins Xiaomi

Luo Fuli, a key developer of DeepSeek-V2, has recently announced that she is joining Xiaomi to lead its AI laboratory and oversee the build-out of its large model team. The move has drawn widespread attention, especially given Xiaomi's increased focus on large models. Luo holds a master's degree from Peking University and has excelled in natural language processing. She previously worked at Alibaba's DAMO Academy, where she contributed to the development of multilingual pre-training models.

【AiBase Highlights:】

🌟 Luo Fuli will join Xiaomi to lead the large model team in the AI laboratory.

💰 Lei Jun has expressed concern about Xiaomi's progress in large AI models and is recruiting talent with high salaries.

📈 The Xiaomi AI laboratory has established a dedicated team to promote the development of large model technologies.

9. AI Finally Crosses This Threshold! LiveKit Open Source Model Accurately Identifies "Whether You Have Finished Speaking"!

In voice assistants and customer service bots, accurately determining whether a user has finished speaking has long been a challenge. LiveKit's open-source turn detection model combines a Transformer model with traditional voice activity detection (VAD), making human-computer dialogue more natural and fluent. The model reduces how often the AI interrupts users by mistake, which improves the user experience, and it is expected to make human-computer dialogue smarter and more natural in the future.

【AiBase Highlights:】

🔍 Combining a Transformer model with traditional voice activity detection (VAD) improves the accuracy of speech turn detection.

💬 The new model reduces erroneous AI interruptions by 85%, making human-computer dialogue more natural.

🎥 Demonstration videos show AI patiently waiting for users to finish speaking, enhancing interaction experience.

Details link: https://github.com/livekit/agents/tree/main/livekit-plugins/livekit-plugins-turn-detector
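
One plausible way to combine the two signals, sketched here under stated assumptions, is to let the model's end-of-utterance score decide how much silence the VAD must observe before the agent replies. The `TurnPolicy` class, its threshold values, and `should_respond` are hypothetical names chosen for illustration and are not part of the LiveKit plugin's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnPolicy:
    """Hypothetical tuning knobs; the values are illustrative, not LiveKit defaults."""
    eou_threshold: float = 0.5    # score above which the utterance looks complete
    short_silence_s: float = 0.5  # wait this long if the utterance looks complete
    long_silence_s: float = 3.0   # wait this long if the user seems mid-thought


def should_respond(silence_s: float, eou_probability: float,
                   policy: TurnPolicy = TurnPolicy()) -> bool:
    """Decide whether the agent may start speaking.

    silence_s: seconds of silence reported by the VAD since the last speech.
    eou_probability: a Transformer model's estimate (0 to 1) that the
        transcript so far is a finished utterance.
    """
    if eou_probability >= policy.eou_threshold:
        required = policy.short_silence_s
    else:
        required = policy.long_silence_s
    return silence_s >= required


# "I'd like to book a flight to" followed by a pause: low score, keep waiting.
print(should_respond(silence_s=0.6, eou_probability=0.1))  # False
# "I'd like to book a flight to Paris." followed by a pause: reply promptly.
print(should_respond(silence_s=0.6, eou_probability=0.9))  # True
```

The design idea is that VAD alone treats every pause the same way, while a text-aware score lets the agent wait longer when the sentence is clearly unfinished.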

10. Fei-Fei Li's Team Conducts Pioneering Research on Multimodal AI Models, Initial Signs of Spatial Intelligence

Professor Fei-Fei Li of Stanford University and her team have revealed the preliminary spatial intelligence capabilities of multimodal large models, showcasing their potential for remembering and recalling spaces. The researchers developed the VSI-Bench benchmark to evaluate visual-spatial intelligence. Although the models' performance still lags behind humans overall, it approaches human levels on certain tasks.

【AiBase Highlights:】

🛠️ The research team launched the VSI-Bench benchmark to evaluate visual-spatial intelligence, containing over 5,000 high-quality Q&A pairs.

📈 Multimodal models have approached human levels in certain tasks, with Gemini-1.5 Pro performing exceptionally well in room size estimation tasks.

🌍 Fei-Fei Li's World Labs focuses on developing AI models with spatial intelligence and has received investment from several well-known institutions.

11. Trump Officially Appoints Senior AI Policy Advisor at the White House

Recently, US President-elect Donald Trump named Sriram Krishnan as Senior AI Policy Advisor in the White House Office of Science and Technology Policy. Krishnan, a former partner at Andreessen Horowitz, will be responsible for coordinating the government's AI policies and will work with former PayPal COO David Sacks.

【AiBase Highlights:】

🌟 Sriram Krishnan has been appointed as Trump's Senior AI Policy Advisor, responsible for coordinating government AI policies.

🤝 He will collaborate with former PayPal COO David Sacks to promote AI and cryptocurrency-related policies.

💼 Krishnan has held leadership positions at several well-known tech companies and shared insights on AI trends in The New York Times.

12. Shanjijia AI Snap Mirror Announces Pre-Sale Sold Out: 999 Yuan, 50,000 Units Gone in One Day

Shanjijia Technology recently launched its first AI Snap Mirror, marking significant progress in China's AI photography glasses market. The product is priced at 1499 Yuan, and the first batch of 50,000 units sold out quickly at a promotional price of 999 Yuan, demonstrating strong market demand. Shanjijia has also launched a promotional campaign in which users can receive a full refund by checking in on 200 days within a 300-day period.

【AiBase Highlights:】

📸 This AI Snap Mirror is priced at 1499 Yuan, with the first batch of 50,000 units selling out at a promotional price of 999 Yuan, indicating strong market demand.

🎉 Users can receive a full refund by checking in on 200 days within a 300-day period, which adds to the product's appeal.

🔍 The glasses are equipped with a Sony 16-megapixel camera, supporting various smart features, providing a rich user experience.