SenseVoiceSmall

Multi-language high-precision speech recognition model

Common Product · Productivity · Speech recognition · Emotion analysis

SenseVoiceSmall is a speech foundation model with several speech understanding capabilities: automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED). Trained on more than 400,000 hours of data, the model supports more than 50 languages, and its recognition performance is reported to surpass that of the Whisper model. SenseVoiceSmall uses a non-autoregressive end-to-end framework, which gives it very low inference latency: it processes 10 seconds of audio in about 70 milliseconds, 15 times faster than Whisper-Large. SenseVoice also provides fine-tuning scripts and strategies and a service deployment pipeline that handles multiple concurrent requests, with clients available in Python, C++, HTML, Java, and C#.
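To put the latency figures above in perspective, the short sketch below converts them into a real-time factor (processing time divided by audio duration). It uses only the numbers quoted in the description. The function name `real_time_factor` and the derived Whisper-Large timing are illustrative assumptions, not published measurements.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures quoted in the description above.
AUDIO_SECONDS = 10.0            # length of the audio clip
SENSEVOICE_SECONDS = 0.070      # 70 ms to process that clip
SPEEDUP_VS_WHISPER_LARGE = 15   # "15 times faster than Whisper-Large"

sensevoice_rtf = real_time_factor(SENSEVOICE_SECONDS, AUDIO_SECONDS)

# Assumption: the 15x claim refers to the same 10-second clip, so the
# implied Whisper-Large time is simply 15 times the SenseVoiceSmall time.
implied_whisper_seconds = SENSEVOICE_SECONDS * SPEEDUP_VS_WHISPER_LARGE
implied_whisper_rtf = real_time_factor(implied_whisper_seconds, AUDIO_SECONDS)

print(f"SenseVoiceSmall real-time factor: {sensevoice_rtf:.4f}")
print(f"Implied Whisper-Large time: {implied_whisper_seconds:.2f} s")
print(f"Implied Whisper-Large real-time factor: {implied_whisper_rtf:.4f}")
```

Under these assumptions, SenseVoiceSmall runs at roughly 1/143 of real time, while the implied Whisper-Large time for the same clip is about one second. Actual figures depend on hardware and batch settings, which the description does not specify.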

SenseVoiceSmall Alternatives

Tencent Cloud Speech Recognition ASR — Convert speech to text with support for real-time speech recognition, recording file recognition, and more.

Face Recognition, Liveness Detection, ID Document Recognition SDK — MiniAiLive offers top-ranked face recognition solutions from NIST FRVT, iBeta 2 certified liveness detection, and ID document recognition solutions.

Audio Chat — Upload audio files for easy dialogue analysis.

Vocapia — Professional speech recognition software and services

AI Audio Kit — AI Audio Tool - Effortlessly Transcribe Audio

speech-to-speech — Open-source speech-to-speech conversion module

Kimi-Audio — Kimi-Audio is an open-source audio foundation model that excels in audio understanding and generation.

R1-Omni — R1-Omni is a full-modality emotion recognition model incorporating reinforcement learning, focusing on improving the interpretability of multimodal emotion recognition.

CrisperWhisper — Word-level automatic speech recognition model

Luxand.cloud — Facial Search | Free Face Recognition API

Hailuo AI Audio — Hailuo AI Audio is an audio synthesis tool designed to create realistic speech.

SenseVoice — Multilingual speech understanding model providing high-precision speech recognition and sentiment analysis.

Whisper — General-purpose Speech Recognition Model

whisper-diarization — Automatic speech recognition and speaker segmentation based on OpenAI Whisper

TTSLabs — Online Voice Synthesis and Speech Recognition Service

Cartesia Voice Changer — Audio modulation technology that transforms voice while preserving original expression and emotion.

whisper-ner-v1 — An advanced model for joint speech transcription and entity recognition.

Speech Studio — Enables applications to listen, understand, and even converse with customers through functionalities like speech-to-text and text-to-speech.

Whisper large-v3-turbo — Efficient automatic speech recognition model

Pet-Knowing — Pet intelligent recognition powered by AI technology.

sherpa-onnx — Open-source project supporting various speech recognition and speech synthesis functionalities

Facia — Fast face recognition and 3D liveness detection

Easy Voice Toolkit — A locally-deployed AI voice toolkit supporting speech recognition, transcription, and conversion.

Scribba AI — AI-Powered Speech Recognition and Subtitling

Universal-2 — Next-generation speech AI offering superior audio data processing capabilities.
