As artificial intelligence develops rapidly, speech synthesis technology is attracting growing attention. Recently, a new speech synthesis model named Kokoro was officially released on the Hugging Face platform. With only 82 million parameters, this compact model marks an important milestone in the field of speech synthesis.

Kokoro v0.19 ranked first on the TTS (Text-to-Speech) leaderboard in the weeks leading up to its release, outperforming models with far more parameters. It achieved results comparable to XTTS v2 (467M parameters) and MetaVoice (1.2B parameters) while being trained on less than 100 hours of mono audio. This result suggests that the link between a speech synthesis model's performance and its parameter count, compute budget, and data volume may be looser than previously assumed.

To use the model, users only need to run a few lines of code in Google Colab to load the model and a voicepack and generate high-quality audio. Kokoro currently supports American English and British English, with multiple voicepacks to choose from.
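As a rough illustration of that Colab workflow, the sketch below follows the interface described on the hexgrad/Kokoro-82M model card (the `build_model` and `generate` helpers, the `voices/*.pt` voicepack files, and 24 kHz mono output); those names come from the v0.19 model repository rather than a pip package and may change between releases, so treat them as assumptions. The standard-library WAV writer is included so the generated waveform can be saved without extra dependencies.

```python
import struct
import wave


def save_wav(samples, path, sample_rate=24000):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file.

    Kokoro outputs 24 kHz mono audio, hence the default sample rate.
    """
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit samples
        wf.setframerate(sample_rate)
        frames = b"".join(
            # Clamp, scale to int16 range, and pack little-endian.
            struct.pack("<h", max(-32768, min(32767, int(s * 32767))))
            for s in samples
        )
        wf.writeframes(frames)


def synthesize(text, voice="af"):
    """Generate a 24 kHz mono waveform for `text` with a Kokoro voicepack.

    The imports below come from the Kokoro-82M model repository as shown
    in its Colab snippet (an assumption; verify against the model card).
    """
    import torch
    from models import build_model   # from the model repo, not PyPI
    from kokoro import generate      # likewise from the model repo

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = build_model("kokoro-v0_19.pth", device)
    voicepack = torch.load(f"voices/{voice}.pt", weights_only=True).to(device)
    # First letter of the voice name selects the language:
    # 'a' = American English, 'b' = British English.
    audio, phonemes = generate(model, text, voicepack, lang=voice[0])
    return audio
```

In Colab, calling `save_wav(synthesize("Hello from Kokoro."), "out.wav")` would then produce a playable WAV file.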

Kokoro was trained on A100 80GB VRAM instances rented from Vast.ai, chosen for their relatively low hourly cost, which kept training efficient. The entire model was trained for fewer than 20 epochs on under 100 hours of audio. The training data consisted of public-domain audio and audio under other open licenses, ensuring data compliance.

Despite its strong performance in speech synthesis, Kokoro currently does not support voice cloning, owing to limitations of its training data and architecture. The training data consists mainly of long-form reading and narration rather than dialogue.

Model: https://huggingface.co/hexgrad/Kokoro-82M

Experience: https://huggingface.co/spaces/hexgrad/Kokoro-TTS

Key Highlights:

🌟 Kokoro-82M is a newly released speech synthesis model with 82 million parameters, supporting various voice packages.  

🎤 The model excels in the TTS field, having ranked first on the leaderboard while being trained on fewer than 100 hours of audio data.  

📊 The training of the Kokoro model utilized open-licensed data to ensure compliance, although some functional limitations still exist.