Meta AI has recently open-sourced SPIRIT LM, a foundational multimodal language model that can freely mix text and speech, opening up new possibilities for tasks that span audio and text.

SPIRIT LM is built on a pre-trained 7-billion-parameter text language model and is extended into the speech modality through continued training on text and speech units. It can understand and generate text like a large text model, and it can also understand and generate speech, or freely mix the two. For example, it can perform speech recognition (converting speech to text), speech synthesis (converting text to speech), and speech classification (for instance, identifying the emotion expressed in an utterance).
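To make the "one model, many tasks" idea concrete, here is a minimal, purely hypothetical sketch of how such tasks could all be framed as prompting a single speech/text model. The `SpeechTextLM` class and its `generate` method are illustrative placeholders, not the official SPIRIT LM API.

```python
# Hypothetical sketch: one speech/text model, several tasks via prompting.
# `SpeechTextLM` and `generate` are placeholders, not the real SPIRIT LM API.

class SpeechTextLM:
    """Stand-in for a model that accepts interleaved speech/text prompts."""

    def generate(self, prompt, output_modality):
        # A real model would encode any audio into discrete speech units,
        # run the language model, then decode the requested modality.
        return f"<{output_modality} continuation of {len(prompt)}-part prompt>"


model = SpeechTextLM()

# Speech recognition: prompt with audio, ask for text.
transcript = model.generate(["utterance.wav"], output_modality="text")

# Speech synthesis: prompt with text, ask for speech.
waveform = model.generate(["Hello, world."], output_modality="speech")

# Emotion classification: prompt with audio plus an instruction, ask for text.
label = model.generate(
    ["angry_clip.wav", "The emotion in this clip is"], output_modality="text"
)

print(transcript, waveform, label, sep="\n")
```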

What's even more impressive is that SPIRIT LM is particularly good at expressive speech. It can recognize and generate a variety of tones and styles, making its voice sound more natural and emotive: instead of a cold, robotic sound, it comes across like a real person speaking, full of emotion.

To enhance the AI's ability to "express emotions", Meta's researchers have developed two versions of SPIRIT LM:

"Base Version" (BASE): This version mainly focuses on the phonetic information of speech, which is the "basic composition" of speech.

"Expressive Version" (EXPRESSIVE): This version includes not only phonetic information but also tone and style information, allowing the AI's voice to be more vivid and expressive.

So, how does SPIRIT LM achieve all of this?

In simple terms, SPIRIT LM is trained on top of Meta's previously released text model, LLAMA2. Researchers continued training it on large amounts of text and speech data using an "interleaving" method: training sequences alternate between spans of text tokens and spans of speech units, switching at word boundaries in aligned speech-text data, so the model learns the patterns of both modalities at once.
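A minimal sketch of what building such an interleaved training sequence could look like, assuming word-aligned text and speech units are already available. The modality markers, switching probability, and unit names here are simplified placeholders, not the exact scheme used in the paper.

```python
# Minimal sketch of interleaved sequence construction from aligned data.
# Markers, switching rule, and unit names are illustrative assumptions.

import random


def interleave(words, speech_units_per_word, switch_prob=0.3):
    """Build one training sequence that alternates text and speech spans."""
    tokens, in_speech = ["[TEXT]"], False
    for word, units in zip(words, speech_units_per_word):
        # Randomly switch modality at word boundaries.
        if random.random() < switch_prob:
            in_speech = not in_speech
            tokens.append("[SPEECH]" if in_speech else "[TEXT]")
        tokens.extend(units if in_speech else [word])
    return tokens


words = ["the", "cat", "sat", "down"]
units = [["u12", "u7"], ["u88", "u3"], ["u41"], ["u9", "u9", "u5"]]
print(interleave(words, units))
```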

To test SPIRIT LM's ability to preserve emotion, Meta's researchers designed a new benchmark called the Speech-Text Sentiment Preservation benchmark (STSP). It contains speech and text prompts expressing different sentiments and checks whether the model's continuation, in either modality, carries the same sentiment as the prompt. The results show that the Expressive version of SPIRIT LM performs well on sentiment preservation, making it the first language model able to preserve sentiment across text and speech.
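As a hedged sketch of how such an evaluation could be scored: generate a continuation from an emotionally marked prompt, classify the continuation's sentiment in the output modality, and count a hit when the label matches the prompt's label. The generator and classifier below are toy stand-ins, not the benchmark's actual components.

```python
# Toy scoring sketch for sentiment preservation; stand-in generate/classify.

def sentiment_preservation_accuracy(prompts, generate, classify):
    """prompts: list of (prompt, gold_label, target_modality) tuples."""
    hits = 0
    for prompt, gold_label, target_modality in prompts:
        continuation = generate(prompt, target_modality)
        if classify(continuation, target_modality) == gold_label:
            hits += 1
    return hits / len(prompts)


# Dummy inputs and stand-in model/classifier so the sketch runs end to end.
demo_prompts = [
    ("happy speech clip", "positive", "text"),
    ("sad written prompt", "negative", "speech"),
]
accuracy = sentiment_preservation_accuracy(
    demo_prompts,
    generate=lambda p, m: f"{m} continuation echoing '{p}'",
    classify=lambda c, m: "positive" if "happy" in c else "negative",
)
print(f"sentiment preservation: {accuracy:.0%}")
```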

Of course, Meta's researchers acknowledge that SPIRIT LM still has room to improve. For instance, it currently supports only English and will need to be extended to other languages, and the model is still relatively small, so scaling it up should further improve performance.

SPIRIT LM is a significant breakthrough for Meta in the field of AI, opening the door to a world of "emotionally expressive" AI. We believe that in the near future, we will see more interesting applications developed based on SPIRIT LM, allowing AI not only to speak but also to express emotions like a real person, facilitating more natural and friendly interactions with us!

Project Address: https://speechbot.github.io/spiritlm/

Paper Address: https://arxiv.org/pdf/2402.05755