ByteDance's Doubao Model Supports Real-time Voice Calls: Can Be Interrupted Anytime with Instant Responses

AIbase基地

Published in AI News · 3 min read · Aug 9, 2024

Today, ByteDance announced a new feature for Doubao's large model that supports real-time voice calls.

Reportedly, Volcano Engine's conversational AI real-time interaction solution combines the Volcano Ark (Fangzhou) large model service platform with Doubao's speech recognition and speech synthesis models, simplifying the speech-to-text and text-to-speech conversion steps. The solution collects, processes, and transmits voice data efficiently, and provides intelligent dialogue and natural language processing capabilities.
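The flow described above (speech recognition, then the large model, then speech synthesis) can be pictured with a minimal sketch. It is an illustration only, not Volcano Engine's API: the class `VoiceTurnPipeline` and the stub functions `fake_asr`, `fake_llm`, and `fake_tts` are hypothetical names standing in for the real services.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurnPipeline:
    """Hypothetical single-turn voice pipeline: speech -> text -> reply -> speech."""
    recognize: Callable[[bytes], str]    # speech recognition (audio in, text out)
    generate: Callable[[str], str]       # large model (user text in, reply text out)
    synthesize: Callable[[str], bytes]   # speech synthesis (reply text in, audio out)

    def handle_turn(self, audio_in: bytes) -> bytes:
        user_text = self.recognize(audio_in)
        reply_text = self.generate(user_text)
        return self.synthesize(reply_text)


# Stand-in components (assumptions for illustration; a real system would
# call the actual recognition, model, and synthesis services here).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoiceTurnPipeline(fake_asr, fake_llm, fake_tts)
reply_audio = pipeline.handle_turn(b"hello")
print(reply_audio)  # b'You said: hello'
```

Keeping the three stages behind plain callables means any one of them could be swapped out without touching the others, which is one plausible reading of how the solution "simplifies" the conversion steps.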

Volcano Engine RTC uses audio 3A processing (acoustic echo cancellation, automatic noise suppression, and automatic gain control) to effectively handle "double talk", where both parties speak at the same time, keeping speech recognition accurate and timely. It also uses a WebRTC transmission network to deliver ultra-low-latency, stable, and reliable real-time audio and video transmission worldwide.
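Clean handling of overlapping speech is also what makes interruption practical: the system has to notice the user talking while its own reply is still playing. The article does not describe how Doubao implements this, so the following is only a sketch of the general "barge-in" idea. The function `play_with_barge_in` and the scripted voice-activity check are hypothetical.

```python
from typing import Callable, Iterable, List


def play_with_barge_in(
    reply_chunks: Iterable[bytes],
    user_is_speaking: Callable[[], bool],
) -> List[bytes]:
    """Play reply audio chunk by chunk, stopping as soon as the user speaks.

    Returns the chunks that were actually played before any interruption.
    """
    played: List[bytes] = []
    for chunk in reply_chunks:
        if user_is_speaking():   # hypothetical voice-activity check
            break                # user interrupted: drop the rest of the reply
        played.append(chunk)     # a real client would send this chunk to the speaker
    return played


# Scripted voice-activity results: the user starts talking before the third chunk.
vad_results = iter([False, False, True])
played = play_with_barge_in([b"a", b"b", b"c", b"d"], lambda: next(vad_results))
print(played)  # [b'a', b'b']
```

In a live call the voice-activity check would run on echo-cancelled microphone audio, so that the assistant's own playback is not mistaken for the user speaking.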

Volcano Engine also offers flexible integration options, including self-integration and a transmission network solution based on the standard WebRTC protocol, to meet the specific needs of different enterprises.

Additionally, Volcano Engine's large model multi-modal real-time interaction service already provides real-time AI voice capabilities for some leading AI virtual character chat applications in China, bringing a new interactive experience. Volcano Engine says it will continue to provide high-quality audio, video, and AI capabilities to help enterprises innovate in AI real-time audio and video.

Tags: Doubao Model, Volcano Engine, Real-time Voice Calls, WebRTC

This article is from AIbase Daily
