GPT-4o Voice Feature Begins Gradual Testing: Not Only Can It Tell Jokes and Mimic Cat Sounds, but It Can Also Help Practice Speaking

AIbase基地

Published in AI News · 6 min read · Jul 31, 2024


The scenarios depicted in the science fiction film "Her" seem to be materializing in reality. GPT-4o's voice capabilities have finally entered limited (grayscale) testing, and some ChatGPT Plus users have already had the opportunity to try the new feature. With it, the AI can not only tell jokes and mimic cat sounds but also serve as a "second language coach" to help users practice speaking.

GPT-4o's voice mode offers a more natural and real-time conversational experience. Users can interrupt the AI at will, and it can even sense and respond to the user's emotions. It is expected that by this fall, all ChatGPT Plus users will be able to use this feature. Even more anticipated is the upcoming release of video and screen-sharing capabilities, which will enable users to have "face-to-face" interactions with ChatGPT.

GPT-4o's output capacity has also seen a significant boost. The new model's maximum output length has risen from 4,000 to 64,000 tokens, meaning it can generate content equivalent to four full-length movie scripts in a single response. OpenAI has quietly rolled out this alpha version of the new model, gpt-4o-64k-output-alpha, on its official website.

To ensure safety and quality, OpenAI has been rigorously testing GPT-4o's voice capabilities over the past few months. It worked with more than 100 red team members to test the model in 45 languages, and it trained the model to speak in only four preset voices to protect user privacy. Content filters are also in place to block violent and copyrighted content.

The real-world tests of GPT-4o's voice mode have left a deep impression on netizens. Some found it could answer questions quickly with almost no delay; others used it to mimic different voices and accents; some even had it serve as a football match commentator or tell stories vividly in Chinese. These cases demonstrate GPT-4o's powerful capabilities in voice recognition and generation.

It is worth noting that, although OpenAI claims that video and screen-sharing features will be released later, some netizens have already experienced these functions ahead of time. For example, a netizen showed ChatGPT the small nest they prepared for their new pet cat, and after viewing it, ChatGPT commented, "It must be very comfortable," and inquired about the cat's well-being.

Furthermore, GPT-4o's long output feature has quietly gone live. OpenAI has announced that it is making an alpha version of GPT-4o available to testers, supporting up to 64K output tokens per request, roughly the length of a 200-page novel. The feature was introduced in response to user demand for longer outputs.

However, longer outputs also mean higher computational demands and costs. GPT-4o Long Output is priced at $6 per million input tokens and $18 per million output tokens, an increase over previous models. Even so, some researchers believe long outputs are useful mainly for data transformation tasks, and that they help considerably in scenarios such as writing code and revising text.
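To make the pricing concrete, here is a minimal sketch of the arithmetic for a single request. It uses only the per-million-token prices and the 64,000-token output ceiling quoted in this article. The function name `estimate_cost` and the sample token counts are hypothetical, and actual billing may differ.

```python
# Prices quoted in the article for GPT-4o Long Output (USD per million tokens).
INPUT_PRICE_PER_MILLION = 6.00
OUTPUT_PRICE_PER_MILLION = 18.00
# Per-request output ceiling; assumes "64K" means 64,000 tokens, as stated in the article.
MAX_OUTPUT_TOKENS = 64_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the quoted prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the 64,000-token limit")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return round(input_cost + output_cost, 4)


# Hypothetical request: 10,000 input tokens and a full 64,000-token output.
print(estimate_cost(10_000, 64_000))
```

Under these assumptions, a request that uses the entire output limit costs a little over one dollar, and nearly all of that comes from the output side.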

Overall, GPT-4o's voice capabilities and long output abilities will undoubtedly provide users with a richer and more convenient interactive experience. We have reason to believe that as technology continues to advance, AI will demonstrate its unique value in more fields.



This article is from AIbase Daily
