Search AI Products and News

Explore worldwide AI information, discover new AI opportunities

2025-07-09 09:28:37.AIbase

Alibaba Tongyi Open-Sources Audio Generation Model ThinkSound Supporting Chain-of-Thought Reasoning

Recently, Alibaba's speech AI team announced that it has open-sourced ThinkSound, the world's first audio generation model to support chain-of-thought reasoning. By introducing chain-of-thought techniques, the model overcomes the limitations of traditional video-to-audio technology in capturing dynamic visuals, generating high-fidelity, tightly synchronized spatial audio. The release marks a shift in AI audio technology from simply dubbing images to a structured understanding of visual content.

2025-07-07 09:15:37.AIbase

Gemini CLI Major Update! Audio and Video Processing + New Privacy Features - A Blessing for Developers!

Google's Gemini CLI adds audio/video processing, Markdown support, and privacy features. With 85 improvements from 51 contributors, it now supports VSCodium and Neovim, has been upgraded to Ink 6 and React 19, and offers free million-token access under the Apache 2.0 license.

2025-07-04 11:09:01.AIbase

DeepMind introduces Crome: Enhancing the Alignment of Large Language Models with Human Feedback

In the field of artificial intelligence, reward models are a critical component for aligning large language models (LLMs) with human feedback, but existing models face the issue of "reward hacking." These models often focus on superficial features, such as the length or format of responses, rather than identifying genuine quality metrics, such as factual accuracy and relevance. The root cause lies in standard training objectives failing to distinguish between spurious associations and true causal drivers present in the training data. This failure leads to fragile reward models (RMs), which generate misaligned policies.
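The failure mode described above can be illustrated with a toy sketch. Everything in it is hypothetical: the `length_reward` function and the example pairs are invented for illustration and are not the Crome method or any real reward model. It shows a reward that scores responses only by length agreeing with every preference pair as long as accurate answers happen to be longer, then failing once length and accuracy are decoupled.

```python
# Toy illustration only: hypothetical data and a hypothetical length-based
# reward. This is not the Crome method or any real reward model.

def length_reward(response: dict) -> float:
    """A 'hacked' reward that scores a response purely by its word count."""
    return float(len(response["text"].split()))


def pairwise_accuracy(reward_fn, pairs) -> float:
    """Fraction of (chosen, rejected) pairs where the chosen response scores higher."""
    wins = sum(reward_fn(chosen) > reward_fn(rejected) for chosen, rejected in pairs)
    return wins / len(pairs)


# Training-style pairs: the accurate (chosen) answer also happens to be longer,
# so length is a spurious but perfectly predictive feature.
train_pairs = [
    ({"text": "Water boils at 100 degrees Celsius at sea level", "accurate": True},
     {"text": "Water boils at 50 degrees", "accurate": False}),
    ({"text": "Paris is the capital city of France", "accurate": True},
     {"text": "Lyon is the capital", "accurate": False}),
]

# Probe pairs: the accurate (chosen) answer is now the shorter one.
probe_pairs = [
    ({"text": "Water boils at 100 degrees Celsius", "accurate": True},
     {"text": "Water always boils at exactly 50 degrees Celsius in every place",
      "accurate": False}),
    ({"text": "Paris is the capital of France", "accurate": True},
     {"text": "The capital city of France is without any doubt Lyon",
      "accurate": False}),
]

train_acc = pairwise_accuracy(length_reward, train_pairs)
probe_acc = pairwise_accuracy(length_reward, probe_pairs)
print(f"agreement on training-style pairs: {train_acc:.2f}")
print(f"agreement on decoupled pairs:      {probe_acc:.2f}")
```

A policy optimized against such a reward would be pushed toward longer answers rather than more accurate ones, which is the kind of misalignment the summary attributes to fragile reward models.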

2025-07-03 09:51:04.AIbase

Stability AI Open-Sources Stable Audio Open Small, Turning Your Phone into an Audio Creation Wizard

2025-07-02 17:40:05.AIbase

Baidu Launches the World's First Chinese Audio-Visual Generation Model MuseSteamer, Revolutionizing the Creative Process

2025-07-02 16:19:47.AIbase

Open Source End-to-End Speech Large Model Step-Audio-AQAA: Understand Audio and Generate Natural Speech Directly

2025-06-28 09:38:16.AIbase

Tongyi Qianwen Launches the Multimodal Unified Understanding and Generation Model Qwen VLo

Recently, the Qwen VLo multimodal large model was officially released, bringing significant advances in image understanding and generation and offering users a new visual creation experience. According to the announcement, Qwen VLo is a comprehensive upgrade that builds on the strengths of the original Qwen-VL series. The model can not only accurately understand the "world" but also perform high-quality re-creation based on that understanding, moving from perception to generation. Users can now access Qwen Chat (chat.qwen.ai)

2025-06-27 16:32:08.AIbase

"AI Daily Report - June 27th"; Tencent open-sources lightweight Hunyuan-A13B model; Kling AI launches video audio effects feature

Welcome to AIbase's [AI Daily Report]! Spend three minutes each day catching up on the latest AI news, industry trends, and innovative AI product applications. For more AI updates, visit: https://www.aibase.com/zh
1. Tencent open-sources the lightweight Hunyuan-A13B model, which can be deployed on a single mid-range GPU. Tencent has released a new member of the Hunyuan large model family, Hunyuan-A13B, which uses a mixture-of-experts (MoE) architecture with 80 billion total parameters and 13 billion activated parameters, large

2025-06-17 08:48:44.AIbase

Panasonic's New OmniFlow Multimodal Large Model Enables Free Switching Between Text, Image, and Audio

As artificial intelligence technology advances, multimodal data processing has become a popular topic. Recently, the global electronics brand Panasonic launched its newly developed multimodal large model, OmniFlow. The model converts between modalities such as text, image, and audio, supporting any-to-any generation tasks and giving users a more flexible experience. OmniFlow's design is modular, so the model's components can be pre-trained independently. This approach not only improves training efficiency but also avoids...

2025-06-16 16:08:30.AIbase

RAGFlow is Here! Open-source RAG Engine Unlocks Deep Document Understanding and Ignites a New Enterprise AI Revolution!

Recently, an open-source RAG (Retrieval-Augmented Generation) engine called RAGFlow has attracted significant attention in the industry. This enterprise-level AI tool, based on deep document understanding, provides companies with a new solution for processing complex documents and achieving precise question answering through its multimodal data processing capabilities and efficient workflows.

RAGFlow: The Pioneer of Deep Document Understanding

RAGFlow is a fully open-source RAG engine that focuses on deep document understanding, aiming to help businesses and individuals extract valuable information from massive amounts of unstructured data.

2025-06-12 09:42:45.AIbase

Meta Releases V-JEPA 2: New Breakthroughs in Video Understanding and Zero-Shot Robot Control Lead the Future!

The Meta AI research team officially released its new video understanding model, V-JEPA 2 (Video Joint Embedding Predictive Architecture 2), on June 11, 2025. Led by Meta's chief AI scientist Yann LeCun, the model opens up new possibilities in video understanding and physical world modeling through self-supervised learning and zero-shot robot control capabilities.

2025-06-10 18:01:26.AIbase

Breaking Traditions! FUDOKI Model Makes Multi-Modal Generation and Understanding More Flexible and Efficient

In recent years, the field of artificial intelligence has changed dramatically, with large language models (LLMs) making remarkable progress in multi-modal tasks. These models have shown strong potential in understanding and generating language, but most current multi-modal models still adopt auto-regressive (AR) architectures, which make the inference process rigid and inflexible. To address this limitation, a research team from The University of Hong Kong and Huawei Noah's Ark Lab has proposed a new model, FUDOKI. The core innovation of FUDOKI is

2025-06-09 08:59:39.AIbase

New King of Long Text Understanding? Gemini 2.5 Pro Beats o3 and Leads Fiction.Live Benchmark

In the recent Fiction.Live benchmark, Gemini 2.5 Pro performed strongly at understanding and reproducing complex stories and backgrounds, finishing ahead of OpenAI's o3 model. The test goes well beyond traditional "needle-in-a-haystack" tasks, focusing on a model's ability to handle deep semantics and context-dependent information within large contexts. According to the test data, when the context window length reaches 192,000 tokens (approximately 144,000 words), the performance of the o3 model drops sharply, while

2025-06-05 17:39:30.AIbase

Gemini 2.5 Released with Native Audio Functionality: AI Conversations Become More Natural

2025-06-04 15:08:58.AIbase

Panasonic Launches OmniFlow Multi-Modal Generative AI for Free Conversion Between Text, Images, and Audio

2025-06-04 09:25:31.AIbase

Fish Audio Releases OpenAudio S1: A New Benchmark for AI Voice with Professional Dubbing Actor Quality

2025-06-03 13:49:51.AIbase

BAAI Open-sources Lightweight Ultra-long Video Understanding Model Video-XL-2

Recently, the Beijing Academy of Artificial Intelligence (BAAI), in collaboration with Shanghai Jiao Tong University and other institutions, officially released a new-generation ultra-long video understanding model, Video-XL-2. The model represents a significant advance in open-source ultra-long video understanding and gives new momentum to multimodal large models that process long video content. In terms of technical architecture, Video-XL-2 consists of three core components: a visual encoder, the Dynamic Token Synthesis (DTS) module, and a large language model (LLM). The model adopts

2025-06-03 09:51:42.AIbase

Hume AI Releases EVI 3: A Voice AI That Understands Your Emotions Faster Than GPT-4!

Recently, Hume AI officially released its third-generation voice interaction model, EVI 3. The new voice AI has drawn wide attention in the industry for its emotion understanding capabilities and personalized interaction experience. EVI 3 can accurately identify emotions in users' voices and generate voices and personalities in styles that match user preferences, marking a significant advance in emotional interaction and natural communication. Below is the latest information and in-depth analysis of EVI 3 from AIbase. Experience it at: https://demo.

2025-05-29 17:46:48.AIbase

Qwen Releases OmniAudio, Which Can Generate Spatial Audio from 360-Degree Videos

Recently, the Speech Team of Qwen Laboratory has made a milestone achievement in the field of spatial audio generation and launched the OmniAudio technology. This technology can directly generate FOA (First-order Ambisonics) audio from 360-degree videos, bringing new possibilities to virtual reality and immersive entertainment. Spatial audio, as a technology that simulates real auditory environments, can enhance immersive experiences. However, existing technologies are mostly based on fixed perspective videos and underutilize the spatial information of 360-degree panoramic videos. Traditional video-to-audio generation

2025-05-29 14:32:03.AIbase

Meta Releases Multi-SpatialMLLM: Leading the Spatial Understanding Revolution in Multimodal AI