Voice AI 'Step to Success'! Step Audio Unveils 130B Dominant Voice Model, Real-Time Dialogue + Emotion Cloning, Here It Comes!

AIbase基地

Published in AI News · 4 min read · Feb 18, 2025


A milestone in the field of voice interaction! The Chinese AI company Step Audio recently open-sourced a massive voice model with 130 billion parameters, attracting significant attention from the industry. The model, hailed as "dominant," is described as the industry's first product-level open-source real-time voice dialogue system that integrates voice understanding and generation control. Its broad feature set and advanced technology suggest that voice AI may be about to reach new heights.

The core highlight of this open-source model lies in its integrated design and powerful control capabilities. It can not only accurately understand user voice commands but also flexibly control the voice generation process, creating an unprecedented personalized voice interaction experience.

In terms of language support, the model demonstrates impressive multilingual capabilities, switching smoothly between Chinese, English, and Japanese and handling cross-language communication with ease. Even more surprisingly, it offers deep support for dialects, currently covering major ones such as Cantonese and Sichuanese, which brings voice interaction closer to everyday life.

Beyond language, the model offers fine-grained control over vocal emotion: users can set the emotional tone of the voice, such as happy or sad, to make the AI's delivery more expressive. Speech rate and prosody style can also be adjusted to suit different contexts. It goes further still by supporting rap and humming, opening up new possibilities for content creation.
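To make these control dimensions concrete, here is a minimal sketch of how an application might represent a voice-style request before handing it to a speech model. Every name in it (`VoiceStyle`, `to_instruction`, the field names, and the instruction format) is a hypothetical illustration and not the Step Audio API; the language, dialect, and delivery values are simply the ones mentioned in this article.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical illustration only: none of these names come from Step Audio.
# The allowed values mirror what the article lists.
SUPPORTED_LANGUAGES = {"Chinese", "English", "Japanese"}
SUPPORTED_DIALECTS = {"Cantonese", "Sichuanese"}
SUPPORTED_DELIVERIES = {"speech", "rap", "humming"}


@dataclass(frozen=True)
class VoiceStyle:
    """Style controls for one generated utterance (assumed structure)."""
    language: str = "Chinese"
    dialect: Optional[str] = None   # e.g. "Cantonese"
    emotion: str = "neutral"        # free-form, e.g. "happy" or "sad"
    speed: float = 1.0              # 1.0 = normal speech rate
    delivery: str = "speech"        # "speech", "rap", or "humming"

    def validate(self) -> None:
        """Raise ValueError if any field is outside the assumed ranges."""
        if self.language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language: {self.language}")
        if self.dialect is not None and self.dialect not in SUPPORTED_DIALECTS:
            raise ValueError(f"unsupported dialect: {self.dialect}")
        if self.delivery not in SUPPORTED_DELIVERIES:
            raise ValueError(f"unsupported delivery: {self.delivery}")
        if self.speed <= 0:
            raise ValueError("speed must be positive")


def to_instruction(style: VoiceStyle) -> str:
    """Render the style as a plain-text instruction (assumed format)."""
    style.validate()
    voice = style.dialect or style.language
    return f"{voice}, {style.emotion}, {style.speed:g}x speed, {style.delivery}"


if __name__ == "__main__":
    happy_rap = VoiceStyle(dialect="Sichuanese", emotion="happy",
                           speed=1.2, delivery="rap")
    print(to_instruction(happy_rap))
```

Keeping the controls in one validated structure means an unsupported combination fails early, before any audio is generated. How the real model accepts such controls is documented in the project repository.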

Even more astonishing is that this model features voice cloning, meaning users can utilize this technology to create highly personalized voice assistants, even achieving the "replication" and "inheritance" of voices.

Step Audio's open-sourcing of such a powerful voice model will undoubtedly greatly promote technological progress and application innovation across the industry. It not only significantly lowers the barriers to applying voice AI technology but also suggests that future voice interactions will become smarter, more natural, and personalized, truly integrating into people's daily lives.

Project address: https://github.com/stepfun-ai/Step-Audio/tree/main


This article is from AIbase Daily


Created by the AIbase Daily Team

