Meta AI's latest release, SPIRIT-LM, is a multimodal foundation language model that freely mixes text and speech and can both understand and express emotion, much as humans do.

SPIRIT-LM is built on a pretrained text language model and extended to the speech modality by continually training it on text and speech units. The model maps speech and text sequences into a single token stream and is trained on a small, automatically curated speech-text parallel corpus using a word-level interleaving method.
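To make the interleaving idea concrete, here is a minimal Python sketch, assuming word-aligned speech-text pairs; the token names and the random span-switching logic are illustrative stand-ins, not Meta's exact implementation.

```python
import random

def interleave(words, speech_units_per_word, p_switch=0.3):
    """Build one training sequence that alternates between text spans
    (words standing in for BPE pieces) and speech spans (discrete speech
    units), switching modality at word boundaries with probability p_switch."""
    tokens = []
    in_speech = random.random() < 0.5            # pick a starting modality
    tokens.append("[SPEECH]" if in_speech else "[TEXT]")
    for word, units in zip(words, speech_units_per_word):
        if random.random() < p_switch:           # flip modality at a word boundary
            in_speech = not in_speech
            tokens.append("[SPEECH]" if in_speech else "[TEXT]")
        if in_speech:
            tokens.extend(units)                 # e.g. ["[Hu12]", "[Hu7]"]
        else:
            tokens.append(word)                  # stands in for BPE subwords
    return tokens

# Toy example: 3 words, each aligned to a few HuBERT-style unit tokens.
words = ["the", "cat", "sat"]
units = [["[Hu5]", "[Hu9]"], ["[Hu12]"], ["[Hu3]", "[Hu3]", "[Hu44]"]]
print(" ".join(interleave(words, units)))
```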


SPIRIT-LM comes in two versions:

The Base version (SPIRIT-LM-BASE) uses semantic speech units.

The Expressive version (SPIRIT-LM-EXPRESSIVE) adds pitch and style units on top of the semantic units to model emotional expression.

Both versions encode text with subword BPE tokens; the sketch below illustrates the resulting token streams.
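The following illustrative snippet contrasts the two variants' token streams; the unit prefixes ([Hu] for semantic, [Pi] for pitch, [St] for style) follow the paper's notation, but the specific values and positions are made up.

```python
# BASE: speech is a stream of semantic (HuBERT-style) units only.
base_stream = "[SPEECH][Hu99][Hu38][Hu49][Hu71][Hu12]"

# EXPRESSIVE: pitch [Pi] and style [St] tokens are interleaved with the
# semantic units (pitch/style occur at a lower rate than semantics),
# letting the model condition on and generate prosody and emotion.
expressive_stream = "[SPEECH][St3][Pi5][Hu99][Hu38][Pi7][Hu49][Hu71][Hu12]"

# Text in both variants is ordinary subword BPE, prefixed with [TEXT].
text_stream = "[TEXT]the cat sat"
```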

SPIRIT-LM combines the semantic capabilities of text models with the expressive power of speech models. This lets it perform cross-modal tasks such as speech recognition, text-to-speech conversion, and speech classification from only a few examples, as the prompting sketch below illustrates.
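Here is a hedged sketch of how a few-shot speech-recognition prompt might be assembled: a handful of (speech, transcript) pairs followed by a new speech clip, with the model expected to continue in text. The unit strings and the helper function are hypothetical, not an API from the spiritlm repository.

```python
def build_asr_prompt(examples, query_units):
    """examples: list of (speech_unit_string, transcript) pairs.
    Returns a single prompt string ending in [TEXT], so the model's
    continuation is the transcript of the query clip."""
    parts = []
    for units, transcript in examples:
        parts.append(f"[SPEECH]{units}[TEXT]{transcript}")
    parts.append(f"[SPEECH]{query_units}[TEXT]")  # model continues in text
    return "".join(parts)

few_shot = [
    ("[Hu5][Hu9][Hu12]", "hello world"),
    ("[Hu3][Hu44][Hu7]", "good morning"),
]
prompt = build_asr_prompt(few_shot, "[Hu8][Hu21][Hu2]")
print(prompt)
```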

To evaluate the model's expressive capabilities, the researchers introduced the Speech-Text Sentiment Preservation benchmark (STSP), which measures how well a generative model preserves the sentiment of a spoken or written prompt in both intra-modal and cross-modal settings.

On this benchmark, the Expressive version of SPIRIT-LM is the first language model shown to preserve emotion from both text and speech prompts across intra-modal and cross-modal settings, using its pitch and style tokens to capture the emotional and stylistic aspects of speech. A sketch of the preservation metric follows.
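A minimal sketch of the STSP idea, under the assumption that preservation is scored by checking whether a sentiment classifier assigns the generated continuation the same label as the prompt; generate() and classify() here are dummy stand-ins, not the paper's actual tooling.

```python
def sentiment_preservation_rate(prompts, generate, classify):
    """prompts: list of (prompt, gold_sentiment) pairs. Prompt and
    continuation may each be speech or text, covering the four
    intra-modal and cross-modal directions."""
    preserved = 0
    for prompt, gold in prompts:
        continuation = generate(prompt)
        if classify(continuation) == gold:
            preserved += 1
    return preserved / len(prompts)

# Toy run with dummy callables just to show the call shape.
dummy = [("I'm thrilled about this!", "positive")]
rate = sentiment_preservation_rate(
    dummy,
    generate=lambda p: "That's wonderful news!",
    classify=lambda t: "positive",
)
print(f"preservation rate: {rate:.2f}")
```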


Research findings indicate:

SPIRIT-LM is on par with existing models at lexical, grammatical, and semantic understanding in the speech modality, while retaining strong text generation capabilities.

Interleaved training is key to SPIRIT-LM's performance: it lets the model learn the correspondence between speech and text tokens, yielding better text-to-speech conversion.

Pre-training knowledge is crucial for SPIRIT-LM's few-shot learning ability.

SPIRIT-LM-EXPRESSIVE can capture and generate more expressive speech, outperforming the Base version in emotional expression.

SPIRIT-LM marks a significant step for AI language models, opening new possibilities for multimodal language understanding and generation and laying groundwork for smarter, more human-like AI applications.

Paper link: https://arxiv.org/pdf/2402.05755

Project link: https://github.com/facebookresearch/spiritlm