A milestone for voice interaction! The Chinese AI company Step Audio has recently open-sourced a massive voice model with 130 billion parameters, drawing significant attention from the industry. The model is billed as the industry's first product-grade, open-source, real-time voice dialogue system that integrates voice understanding with controllable speech generation. Its broad feature set suggests that voice AI may be poised for a major leap forward.
The core highlight of this open-source model is its integrated design combined with fine-grained control. It can accurately understand users' spoken commands and also flexibly control the speech generation process, enabling a highly personalized voice interaction experience.
In terms of language support, the model handles Chinese, English, and Japanese and switches smoothly between them, which makes cross-language conversations straightforward. It also supports dialects, currently covering major ones such as Cantonese and Sichuanese, bringing voice interaction closer to how people actually speak in everyday life.
Beyond language, the model offers fine-grained control over vocal emotion, letting users set the emotional tone of the voice (for example, happy or sad) to make the AI's delivery more expressive. Speech rate and prosodic style can also be adjusted to suit different contexts. It even supports rap and humming, opening up new possibilities for content creation.
The model also supports voice cloning, so users can build highly personalized voice assistants and even "replicate" a particular voice and "preserve" it for the future.
By open-sourcing such a capable voice model, Step Audio is likely to accelerate technical progress and application innovation across the industry. The release significantly lowers the barrier to applying voice AI and suggests that future voice interaction will become smarter, more natural, and more personalized, becoming a genuine part of people's daily lives.
Welcome to the [AI Daily] column, your daily guide to the world of artificial intelligence! Every day we bring you the hottest topics in AI, with a focus on developers, to help you follow technical trends and discover innovative AI product applications.