OpenAI's flagship model, GPT-4o (with "o" standing for "omni"), garnered significant attention for its audio comprehension capabilities when released in May. The GPT-4o model can respond to audio inputs at an average speed of 320 milliseconds, similar to human reaction times in typical conversations.

ChatGPT OpenAI Artificial Intelligence (1)

OpenAI also announced that the voice mode feature of ChatGPT will leverage the audio capabilities of the GPT-4o model to provide users with a seamless voice conversation experience. Regarding the voice capabilities of GPT-4o, the OpenAI team wrote:

"With GPT-4o, we have trained a new model that integrates text, visual, and audio modalities in an end-to-end fashion, meaning all inputs are processed by the same neural network. As GPT-4o is our first model to combine all these modalities, we are still in the early stages of exploring its potential and limitations."

In June, OpenAI announced plans to release an advanced voice mode to a small group of ChatGPT Plus users in an Alpha version later, but this was delayed by a month due to the need to improve the model's ability to detect and reject certain content. Additionally, OpenAI is preparing its infrastructure to scale to millions of users while maintaining real-time responses.

Now, OpenAI's CEO, Sam Altman, has confirmed via X that the Alpha version of the voice mode will be rolled out to ChatGPT Plus subscribers starting next week.

image.png

The current ChatGPT voice mode, with average latencies of 2.8 seconds (GPT3.5) and 5.4 seconds (GPT-4), is not intuitive to use. The upcoming advanced voice mode based on GPT-4o will allow ChatGPT subscribers to engage in seamless, lag-free conversations.

Additionally, OpenAI has also released the highly anticipated SearchGPT today, a new attempt at web search experiences. Currently in prototype, SearchGPT offers AI-powered search capabilities, quickly delivering accurate answers from clear and relevant sources. You can learn more here.

Key Points:

- ChatGPT Plus subscribers will receive the new voice mode feature next week, enabling lag-free, smooth conversational experiences.

- The GPT-4o model combines training in text, visual, and audio modalities, opening up more potential and limitations for OpenAI.

- OpenAI has also launched SearchGPT, providing fast and accurate AI-powered search functionalities.