hertz-dev

An open-source, full-duplex foundation model for audio generation.

Common Product, Programming, Audio Processing, Speech Recognition

Hertz-dev is a full-duplex, audio-only transformer foundation model with 8.5 billion parameters, open-sourced by Standard Intelligence. It is presented as scalable cross-modal learning technology: it converts mono 16 kHz speech into an 8 Hz latent representation at a bitrate of 1 kbps, which is reported to outperform other audio encoders. Its main advantages are low latency, high efficiency, and open availability, so researchers can fine-tune it and build on it. Standard Intelligence states that its goal is to develop general intelligence that benefits humanity, and it describes hertz-dev as a substantial step in that direction.
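
To make the quoted codec figures concrete, the short sketch below works through the arithmetic they imply. It is only a back-of-the-envelope calculation, not code from hertz-dev: the function name `codec_budget` is hypothetical, "1 kbps" is read as 1000 bits per second, and the 16-bit PCM source format is an assumption that the description does not state.

```python
# Figures taken from the description above.
SAMPLE_RATE_HZ = 16_000   # mono input sample rate
LATENT_RATE_HZ = 8        # latent frames per second
BITRATE_BPS = 1_000       # "1 kbps", read here as 1000 bits per second

# Assumption (not stated in the description): the source audio is 16-bit PCM.
PCM_BITS_PER_SAMPLE = 16


def codec_budget(sample_rate_hz, latent_rate_hz, bitrate_bps, pcm_bits):
    """Derive per-frame quantities from the headline codec numbers."""
    samples_per_frame = sample_rate_hz // latent_rate_hz   # audio samples covered by one latent frame
    frame_duration_ms = 1000 / latent_rate_hz              # duration of one latent frame
    bits_per_frame = bitrate_bps / latent_rate_hz          # bit budget available per latent frame
    pcm_bitrate_bps = sample_rate_hz * pcm_bits            # uncompressed mono PCM bitrate
    compression_ratio = pcm_bitrate_bps / bitrate_bps      # raw PCM bitrate divided by codec bitrate
    return {
        "samples_per_frame": samples_per_frame,
        "frame_duration_ms": frame_duration_ms,
        "bits_per_frame": bits_per_frame,
        "pcm_bitrate_bps": pcm_bitrate_bps,
        "compression_ratio": compression_ratio,
    }


budget = codec_budget(SAMPLE_RATE_HZ, LATENT_RATE_HZ, BITRATE_BPS, PCM_BITS_PER_SAMPLE)
for name, value in budget.items():
    print(f"{name}: {value}")
```

Under these assumptions, each latent frame covers 2000 samples (125 ms of audio) and carries roughly 125 bits, about 256 times less data than the assumed raw 16-bit PCM stream.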

hertz-dev Visit Over Time

Monthly Visits: 2503

Bounce Rate: 52.96%

Pages per Visit: 1.6

Visit Duration: 00:00:34

hertz-dev Alternatives

Tencent Cloud Speech Recognition ASR — Convert speech to text with support for real-time speech recognition, recording file recognition, and more.

speech-to-speech — Open-source speech-to-speech conversion module

Speech Studio — Enables applications to listen, understand, and even converse with customers through functionalities like speech-to-text and text-to-speech.

Universal-2 — Next-generation speech AI offering superior audio data processing capabilities.

Vocapia — Professional speech recognition software and services

Whisper Speech — Open-source text-to-speech system

Whisper — General-purpose Speech Recognition Model

Speech to Note — Transforming speech into powerful content

Fish Audio Text to Speech — Converts text into natural and fluent speech output

TTSLabs — Online Voice Synthesis and Speech Recognition Service

Unreal Speech — Reduces the cost of text-to-speech by up to 95%

sherpa-onnx — Open-source project supporting various speech recognition and speech synthesis functionalities

Summify - Summarize Speech — Easily record and summarize speech content

Fish Speech — A voice synthesis tool that offers high-quality speech generation services.

Kimi-Audio — Kimi-Audio is an open-source audio foundation model that excels in audio understanding and generation.

Hailuo AI Audio — Hailuo AI Audio is an audio synthesis tool designed to create realistic speech.

SpeechFlow - Advanced Speech-to-Text API — Powerful Speech-to-Text API

Whisper large-v3-turbo — Efficient automatic speech recognition model

Easy Voice Toolkit — A locally-deployed AI voice toolkit supporting speech recognition, transcription, and conversion.

Azure Cognitive Services Speech — Enables applications to interact intelligently through the conversion of speech to text and vice versa.

Fish Agent V0.1 3B — High-precision speech-to-speech model for capturing and generating environmental audio information.

whisper-ner-v1 — An advanced model for joint speech transcription and entity recognition.

Fish Speech V1.4 — Multilingual text-to-speech conversion model

CosyVoice Speech Generation Model 2.0-0.5B — Efficient, multilingual speech synthesis model

DuRT — DuRT is a real-time speech recognition and translation software for macOS, dedicated to providing efficient and accurate speech processing services.

Scribba AI — AI-Powered Speech Recognition and Subtitling

Moonshine Web — Real-time browser-based speech recognition application

SpeechFlow — Powerful speech-to-text API

SenseVoice — Multilingual speech understanding model providing high-precision speech recognition and sentiment analysis.

SenseVoiceSmall — Multi-language high-precision speech recognition model