OpenAI today announced updates to its Realtime API, which is still in beta. The highlight of the update is five new voice options designed for voice-to-voice applications, along with reduced fees for cached input that make the API more economical for developers.
Of the five new voices, OpenAI showcased three, Ash, Verse, and the British-accented Ballad, in a post on X. The new voices are more expressive and steerable, and they make conversations feel more natural. According to the API documentation, native voice-to-voice processing skips an intermediate text representation, which lowers latency and allows more nuanced output.
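As an illustration, here is a minimal sketch of how a developer might pick one of the new voices when opening a Realtime API session over WebSocket. The model name, header names, and event fields follow the beta documentation as of this writing and may change while the API remains in beta.

```python
# Minimal sketch (not an official snippet): open a beta Realtime API session
# and select one of the new voices. Details may differ from the current docs.
import asyncio
import json
import os

import websockets  # pip install websockets

API_KEY = os.environ["OPENAI_API_KEY"]
URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

async def main():
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "OpenAI-Beta": "realtime=v1",  # beta header required while the API is in beta
    }
    # Note: newer versions of the websockets package name this kwarg "additional_headers".
    async with websockets.connect(URL, extra_headers=headers) as ws:
        # Choose one of the new voices ("ash", "ballad", "verse") for this session.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"voice": "ash", "modalities": ["audio", "text"]},
        }))
        # Ask the model to respond; audio is streamed back as server events.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"instructions": "Greet the user briefly."},
        }))
        async for message in ws:
            event = json.loads(message)
            print(event["type"])  # e.g. response.audio.delta, response.done
            if event["type"] == "response.done":
                break

asyncio.run(main())
```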
However, OpenAI also cautioned that, because the Realtime API is still in beta, client-side authentication is not yet available. In addition, real-time audio processing depends on network conditions, which makes large-scale audio delivery challenging; OpenAI acknowledged that keeping audio reliable over unstable networks is a genuinely hard problem.
OpenAI's path in voice technology has not been free of controversy. In March this year it introduced Voice Engine, a voice-cloning platform positioned against ElevenLabs, but access was limited to a small group of researchers. After the GPT-4o voice demonstrations, OpenAI paused the "Sky" voice in May following objections from actress Scarlett Johansson, who felt it sounded too much like her.
In September, OpenAI rolled out Advanced Voice Mode to paid subscribers, including ChatGPT Plus, Enterprise, Team, and Edu users. With this voice-to-voice technology, businesses can generate real-time responses faster, which can significantly improve customer-service efficiency.
Reduce Costs by Over 50%
On pricing, the Realtime API previously cost $0.06 per minute of audio input and $0.24 per minute of audio output, which was expensive for many developers. With this update, cached text input is discounted by 50% and cached audio input by 80%.
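As a rough back-of-the-envelope illustration, the sketch below applies the per-minute rates and the cached-audio discount quoted above as simple percentages; actual billing is token-based, so treat the numbers as an approximation only.

```python
# Rough cost illustration using the per-minute rates quoted in this article.
AUDIO_IN_PER_MIN = 0.06    # $ per minute of audio input
AUDIO_OUT_PER_MIN = 0.24   # $ per minute of audio output
CACHED_AUDIO_DISCOUNT = 0.80  # cached audio input is discounted by 80%

def session_cost(input_min: float, output_min: float, cached_input_min: float = 0.0) -> float:
    """Estimate the cost of one voice session with some cached input audio."""
    uncached = (input_min - cached_input_min) * AUDIO_IN_PER_MIN
    cached = cached_input_min * AUDIO_IN_PER_MIN * (1 - CACHED_AUDIO_DISCOUNT)
    output = output_min * AUDIO_OUT_PER_MIN
    return uncached + cached + output

# A 10-minute call where 6 minutes of input audio hit the cache:
with_cache = session_cost(10, 10, cached_input_min=6)    # ≈ $2.71
without_cache = session_cost(10, 10)                      # = $3.00
print(f"with caching: ${with_cache:.2f}, without: ${without_cache:.2f}")
```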
OpenAI announced the Prompt Caching feature at its developer day: frequently repeated context prompts are reused rather than reprocessed, cutting both the cost and the latency of repeated input. By lowering input prices, OpenAI hopes to attract more developers to its API.
Other companies, such as Anthropic, have introduced similar caching features to make their own APIs more attractive.
Key Points:
🌟 Five new, more natural voices enhance the voice application experience
💰 The Realtime API lowers input fees through caching, making it more cost-effective for developers
⚡ Real-time audio processing depends on network conditions, so reliability still needs attention