Fish Audio Releases Fish Agent V0.1 3B Real-Time Voice Cloning

AIbase基地

Published in AI News · Nov 5, 2024


Recently, Fish Audio introduced its latest voice processing model, Fish Agent V0.1 3B. This state-of-the-art text-to-speech model generates and processes speech quickly and accurately, and is particularly good at simulating or cloning voices. The release brings natural, responsive AI voice assistants a step closer.

Fish Agent V0.1 3B is built on Qwen-2.5-3B-Instruct and was further pre-trained on a massive dataset of 200 billion voice and text tokens. Unlike traditional models that require complex semantic encoding, Fish Agent V0.1 3B uses a "semantic-token-free" architecture that processes and generates speech directly at the acoustic level. This direct approach simplifies the model structure and improves its responsiveness and efficiency.

Thanks to this architecture, Fish Agent V0.1 3B can generate high-quality, natural speech quickly, enabling near-instant voice cloning and text-to-speech conversion with a time to first audio (TTFA) of just 200 milliseconds. This makes it well suited to applications that require real-time speech generation, such as voice assistants, automated customer service, and other scenarios that need rapid spoken feedback.
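Because TTFA is the headline latency figure here, it may help to see how such a number is typically measured: start a timer when the request is sent and stop it when the first audio chunk arrives from a streaming engine, which is distinct from the time needed to synthesize the whole utterance. The following is a minimal sketch under stated assumptions: `fake_tts_stream` is a hypothetical stand-in that sleeps to simulate generation and yields dummy bytes, and `measure_ttfa` is an illustrative helper. Neither calls Fish Agent, fish-speech, or any real API.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def fake_tts_stream(text: str, chunk_delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS engine (NOT the Fish Agent API).

    Yields one dummy audio chunk per word, sleeping to simulate generation time.
    """
    for _word in text.split():
        time.sleep(chunk_delay_s)  # simulated compute before each chunk is ready
        yield b"\x00" * 320        # dummy bytes standing in for real audio samples


def measure_ttfa(stream_fn: Callable[[str], Iterable[bytes]], text: str) -> Tuple[float, float]:
    """Return (time to first audio chunk, total generation time) in milliseconds."""
    start = time.perf_counter()
    first_chunk_ms = None
    for _chunk in stream_fn(text):
        if first_chunk_ms is None:
            # TTFA: elapsed time until the first chunk could be played back
            first_chunk_ms = (time.perf_counter() - start) * 1000.0
    total_ms = (time.perf_counter() - start) * 1000.0
    if first_chunk_ms is None:
        raise ValueError("stream produced no audio")
    return first_chunk_ms, total_ms


if __name__ == "__main__":
    ttfa, total = measure_ttfa(fake_tts_stream, "hello from a streaming voice model")
    print(f"TTFA: {ttfa:.0f} ms, total: {total:.0f} ms")
```

Swapping a real streaming client in for `fake_tts_stream` would give a comparable measurement, although in a real deployment network and audio-buffering overhead would also count toward the TTFA a user experiences.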

Fish Agent V0.1 3B supports multiple languages, including English, Chinese, German, Japanese, French, Spanish, Korean, and Arabic, and is trained on approximately 700,000 hours of multilingual audio. As a result, it can handle a range of languages and contexts while producing natural, human-like pronunciation.

Beyond voice generation and text-to-speech conversion, Fish Agent V0.1 3B offers the following key features:

Zero-shot voice cloning: Clones a voice without any additional training on that speaker.

Compact 3B parameters: At 3 billion parameters, the model is small enough to make development easier.

Text and audio input: Accepts both text and audio as input, offering flexible ways to interact with the model.

Fish Audio has open-sourced the Fish Agent V0.1 3B model and released a preliminary demo for users to try. The release should further advance AI voice technology and open up more possibilities for applications such as voice assistants and virtual humans.

GitHub: https://github.com/fishaudio/fish-speech

Fish Agent Demo: https://huggingface.co/spaces/fishaudio/fish-agent

Model Download: https://huggingface.co/fishaudio/fish-agent-v0.1-3b

Technical Report: https://arxiv.org/abs/2411.01156

This article is from AIbase Daily
