Recently, the X-LANCE Lab at Shanghai Jiao Tong University and ByteDance jointly introduced a new interactive speech model named LSLM. According to the team, the model can listen and speak at the same time, delivering a conversational experience that closely mimics natural human dialogue.

LSLM, nicknamed "Little L," addresses the limitations of existing speech models in real-time interaction, noise robustness, and handling of unseen speakers. It is an end-to-end design with separate auditory and vocal channels: a decoder-only TTS generates speech, while a streaming self-supervised learning (SSL) encoder processes incoming audio in real time.
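The two-channel idea above can be illustrated with a toy sketch: at each generation step, an embedding from the listening (auditory) channel is fused with the speaking-channel token embedding before the decoder predicts the next token. This is a minimal, hypothetical illustration with random stand-in weights, not the paper's actual architecture; the real model uses trained networks and studies several fusion strategies.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 16, 8  # toy sizes; the real model operates on discrete speech tokens

# Hypothetical stand-in weights for a trained decoder.
speak_emb = rng.normal(size=(VOCAB, DIM))   # speaking channel: token embeddings
out_proj = rng.normal(size=(DIM, VOCAB))    # projection back to token logits

def decode_step(prev_token, listen_vec):
    """One generation step with simple additive fusion: the streaming
    listening embedding is summed into the speaking-token embedding
    before the decoder body (collapsed here to one projection)."""
    fused = speak_emb[prev_token] + listen_vec  # fuse auditory + vocal channels
    logits = fused @ out_proj
    return int(np.argmax(logits))

# Simulate a short stream: the listening channel updates every step.
token = 0
for _ in range(5):
    listen_vec = rng.normal(size=DIM)  # stand-in for streaming SSL features
    token = decode_step(token, listen_vec)
print(0 <= token < VOCAB)
```

The point of the sketch is only that generation is conditioned on live audio at every step, rather than waiting for the user's turn to end.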

"Little L" boasts unique features: full-duplex modeling (FDM), enabling simultaneous listening and speaking, allowing interruptions and alternations in conversations; strong noise resistance, maintaining stability in noisy environments and adapting to various real-world scenarios; and sensitivity to unknown speakers, capable of identifying and responding to new voices and commands, accommodating different users.

Project details: https://ziyang.tech/LSLM/

Paper: https://arxiv.org/abs/2408.02622