whisper-diarization

Automatic speech recognition and speaker segmentation based on OpenAI Whisper

Common Product • Programming • Speech Recognition • Speaker Segmentation

whisper-diarization is an open-source project that combines Whisper's automatic speech recognition (ASR) with Voice Activity Detection (VAD) and speaker embeddings. The pipeline first extracts the vocals from the audio, which improves the accuracy of the speaker embeddings. Whisper then generates the transcription, and WhisperX corrects and aligns the timestamps to minimize segmentation errors caused by time shifts. Next, MarbleNet performs VAD and segmentation to exclude silence, and TitaNet extracts speaker embeddings to identify the speaker of each segment. Finally, the speaker segments are matched against the timestamps generated by WhisperX to determine the speaker of each word, and a punctuation model realigns the result to compensate for minor timing offsets.
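The last two steps of the description (matching words to speakers by timestamp, then realigning on punctuation) can be illustrated with a minimal Python sketch. This is not the project's actual code: the `Word` and `Turn` types, the function names, the "largest overlap" rule, and the "majority speaker per sentence" rule are all assumptions made for illustration, and the real project relies on a punctuation model rather than punctuation already present in the text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds, e.g. from forced alignment
    end: float


@dataclass
class Turn:
    speaker: str
    start: float  # seconds, e.g. from diarization
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most.

    If a word overlaps no turn at all, max() falls back to the first turn.
    """
    labels = []
    for w in words:
        best = max(turns, key=lambda t: overlap(w.start, w.end, t.start, t.end))
        labels.append(best.speaker)
    return labels


def realign_by_sentence(words, labels, enders=(".", "?", "!")):
    """Give every word in a sentence that sentence's majority speaker.

    A sentence ends at a word whose text ends with one of `enders`,
    or at the last word of the input.
    """
    out, begin = list(labels), 0
    for i, w in enumerate(words):
        if w.text.endswith(enders) or i == len(words) - 1:
            majority = Counter(labels[begin:i + 1]).most_common(1)[0][0]
            out[begin:i + 1] = [majority] * (i + 1 - begin)
            begin = i + 1
    return out


if __name__ == "__main__":
    # Hypothetical data: speaker A's turn runs slightly into B's first word.
    words = [
        Word("Hello", 0.0, 0.4), Word("there.", 0.4, 0.9),
        Word("Hi,", 1.0, 1.3), Word("how", 1.3, 1.5),
        Word("are", 1.5, 1.7), Word("you?", 1.7, 2.0),
    ]
    turns = [Turn("A", 0.0, 1.2), Turn("B", 1.2, 2.0)]
    raw = assign_speakers(words, turns)
    print(raw)                              # ['A', 'A', 'A', 'B', 'B', 'B']
    print(realign_by_sentence(words, raw))  # ['A', 'A', 'B', 'B', 'B', 'B']
```

In this toy input, "Hi," is first attributed to speaker A because A's turn boundary is slightly late; the sentence-level vote then moves it to speaker B, which is the kind of small timing offset the realignment step is meant to absorb.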


whisper-diarization Visit Over Time

Monthly Visits: 493,360,068

Bounce Rate: 36.08%

Pages per Visit: 6.1

Visit Duration: 00:06:29


whisper-diarization Alternatives

whisper-diarization — Automatic speech recognition and speaker segmentation based on OpenAI Whisper

Programming

• Speech Recognition • Speaker Segmentation

708

Reverb — Open-source inference code for speech recognition and speaker segmentation models.

Programming

• Speech Recognition • Speaker Segmentation

444

BetterWhisperX — An automatic speech recognition tool providing word-level timestamps and speaker identification.

Programming



CrisperWhisper — Word-level automatic speech recognition model

Tencent Cloud Speech Recognition ASR — Convert speech to text with support for real-time speech recognition, recording file recognition, and more.

Whisper large-v3-turbo — Efficient automatic speech recognition model

DiariZen — A toolkit for speaker segmentation.

whisper-ner-v1 — An advanced model for joint speech transcription and entity recognition.

Moonshine — Fast and accurate automatic speech recognition model for edge devices.

WhisperKit — Automatic Speech Recognition Model Compression & Optimization Tool

seed-tts-eval — A testing dataset for evaluating a model's zero-shot speech generation capability

Vocapia — Professional speech recognition software and services

Easy Voice Toolkit — A locally-deployed AI voice toolkit supporting speech recognition, transcription, and conversion.

Open-Vocabulary SAM — Interactive Segmentation and Recognition Model

parakeet-tdt-0.6b-v2 — A high-quality English automatic speech recognition model that supports punctuation and timestamp prediction.

Speech Studio — Enables applications to listen, understand, and even converse with customers through functionalities like speech-to-text and text-to-speech.

PengChengStarling — PengChengStarling is a multilingual automatic speech recognition (ASR) model development toolkit based on the icefall project.

TurboScribe — Unlimited audio and video transcription, supporting 98+ languages

speech-to-speech — Open-source speech-to-speech conversion module

Moonshine Web — Real-time browser-based speech recognition application

Speech to Text & Transcribe — Effortless audio transcription

Transcriptal — Free AI-powered automatic transcription tool

Malloy — Accurate AI video transcription

Whisper — General-purpose Speech Recognition Model

Free Subtitles AI — Free, automated transcription of audio and video into text.

WhisperNER — Unified open-source named entity and speech recognition model

TTSLabs — Online Voice Synthesis and Speech Recognition Service

EchoScribe — Smart Speech Transcription Tool

SpeechFlow - Advanced Speech-to-Text API — Powerful Speech-to-Text API

Inkr — Inkr transcription is a fast, accurate, and smooth audio and video transcription tool.
