OpenAI ChatGPT Multimodal Features Officially Launched, Supporting Voice Interaction and Image Recognition

智能涌现

Published in AI News · 1 min read · Sep 26, 2023


On September 25, 2023, OpenAI added multimodal capabilities, including voice interaction and image recognition, to its popular conversational AI, ChatGPT. Users can now hold voice conversations with ChatGPT and upload images, which enables functions such as speech recognition, text recognition, and object detection. The multimodal version is named GPT-4V; it was trained concurrently with GPT-4, but its release was delayed due to safety considerations. OpenAI stated that the new features will roll out first to ChatGPT Plus subscribers and enterprise users.

ChatGPT OpenAI Multimodal

This article is from AIbase Daily



