The scenarios depicted in the science fiction film "Her" seem to be materializing in reality. GPT-4o's voice capabilities have finally entered a limited-release testing phase, and some ChatGPT Plus users have already had the opportunity to try the new feature. Beyond telling jokes and mimicking cat sounds, the AI can also serve as a "second language coach" for speaking practice.

GPT-4o's voice mode offers a more natural and real-time conversational experience. Users can interrupt the AI at will, and it can even sense and respond to the user's emotions. It is expected that by this fall, all ChatGPT Plus users will be able to use this feature. Even more anticipated is the upcoming release of video and screen-sharing capabilities, which will enable users to have "face-to-face" interactions with ChatGPT.

GPT-4o's output capacity has also seen a significant boost. The new model's maximum output length has grown from 4,000 to 64,000 tokens, meaning it can generate roughly four full-length movie scripts' worth of content in a single response. OpenAI has quietly listed this alpha version of the new model, gpt-4o-64k-output-alpha, on its official website.

To ensure safety and quality, OpenAI has spent the past few months rigorously testing GPT-4o's voice capabilities. It worked with more than 100 red-team members to test the model across 45 languages, and trained it to speak in only four preset voices to protect user privacy. Content filtering is also in place, with measures to block the generation of violent and copyrighted content.

Real-world tests of GPT-4o's voice mode have left a deep impression on users. Some found that it answers questions quickly, with almost no delay; others had it mimic different voices and accents; some even had it act as a football match commentator or tell vivid stories in Chinese. These examples demonstrate GPT-4o's strong speech recognition and generation capabilities.

It is worth noting that, although OpenAI says the video and screen-sharing features will be released later, some users have already tried them ahead of time. For example, one user showed ChatGPT the small nest they had prepared for their new pet cat. After viewing it, ChatGPT commented, "It must be very comfortable," and asked how the cat was doing.

Furthermore, GPT-4o's long output feature has quietly gone live. OpenAI has officially announced that it is providing the GPT-4o Long Output alpha version to testers, supporting up to 64K output tokens per request, roughly equivalent to a 200-page novel. The feature was introduced in response to user demand for longer output.
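As a rough sanity check on the 200-page comparison, the arithmetic can be sketched in Python. The conversion factors used here (about 0.75 English words per token and about 250 words per printed page) are common rules of thumb assumed for illustration, not figures from OpenAI's announcement, and real ratios vary with language and page layout.

```python
# Assumed rules of thumb, not figures from OpenAI's announcement.
WORDS_PER_TOKEN = 0.75   # rough average for English text
WORDS_PER_PAGE = 250     # rough average for a printed novel page


def tokens_to_pages(tokens: int) -> float:
    """Convert a token count to an approximate number of printed pages."""
    words = tokens * WORDS_PER_TOKEN
    return words / WORDS_PER_PAGE


if __name__ == "__main__":
    # Compare the old 4,000-token limit with the new 64,000-token limit.
    for limit in (4_000, 64_000):
        print(f"{limit:>6} tokens ~ {tokens_to_pages(limit):.0f} pages")
```

Under these assumptions the 64K limit works out to a little under 200 pages, which is consistent with the novel comparison.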

However, longer outputs also mean higher computational demands and costs. GPT-4o Long Output is priced at $6 per million input tokens and $18 per million output tokens, higher than previous models. Nevertheless, some researchers believe that long outputs are mainly useful for data transformation and are very helpful in scenarios such as writing code and improving writing.
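To make the pricing concrete, here is a minimal Python sketch that estimates the cost of a single request from the per-million-token prices quoted above. The function name `estimate_cost` and the example token counts are hypothetical, chosen only for illustration; actual billing may differ.

```python
# Prices and output limit as quoted in this article for GPT-4o Long Output.
INPUT_PRICE_PER_MILLION = 6.00    # USD per 1,000,000 input tokens
OUTPUT_PRICE_PER_MILLION = 18.00  # USD per 1,000,000 output tokens
MAX_OUTPUT_TOKENS = 64_000        # maximum output tokens per request


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the 64,000-token limit")
    input_cost = input_tokens * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens * OUTPUT_PRICE_PER_MILLION
    return (input_cost + output_cost) / 1_000_000


if __name__ == "__main__":
    # Example: a 10,000-token prompt that uses the full output budget.
    cost = estimate_cost(10_000, MAX_OUTPUT_TOKENS)
    print(f"Estimated cost: ${cost:.3f}")
```

In this example the output tokens account for almost all of the cost, which is why a full-length response is noticeably more expensive than a short one.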

Overall, GPT-4o's voice capabilities and long output abilities will undoubtedly provide users with a richer and more convenient interactive experience. We have reason to believe that as technology continues to advance, AI will demonstrate its unique value in more fields.