Gladia Voice Recognition API Secures $16 Million in Series A Funding, Challenging Amazon, Microsoft, and Google

AIbase基地

Published in AI News · 6 min read · Oct 16, 2024


French startup Gladia has raised $16 million in a Series A funding round for its speech recognition application programming interface (API). Essentially, Gladia's API converts any audio file into text with high accuracy and low latency.

Although Amazon, Microsoft, and Google offer speech-to-text APIs as part of their cloud hosting product suites, their performance does not match some of the innovative models provided by specialized startups. The field has made significant strides in recent years, especially since OpenAI released its Whisper model. Gladia competes with well-funded companies like AssemblyAI, Deepgram, and Speechmatics.

Image: audio sound waves. Source note: the image was generated by AI and provided by the image licensing service Midjourney.

Gladia initially offered a fine-tuned version of the Whisper speech-to-text model, with some necessary improvements. For example, the startup supports speaker separation (diarization) out of the box: it can detect when there are multiple speakers in a conversation and split the recording and the transcript according to who is speaking.

Gladia supports 100 languages and various accents. In our experience the tool works well: we have been using Gladia to transcribe some interviews, and accents have not been an issue.
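To make speaker separation concrete, here is a minimal sketch of what an application might do with speaker-labeled output. The `Utterance` shape, its field names, and the sample data are assumptions made for illustration; they are not Gladia's actual response format.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Utterance:
    """One speaker-labeled segment. Field names are hypothetical, not Gladia's schema."""
    speaker: int
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording
    text: str


def merge_turns(utterances):
    """Merge consecutive utterances from the same speaker into a single turn."""
    ordered = sorted(utterances, key=lambda u: u.start)
    turns = []
    for speaker, group in groupby(ordered, key=lambda u: u.speaker):
        segments = list(group)
        turns.append(
            Utterance(
                speaker,
                segments[0].start,
                segments[-1].end,
                " ".join(s.text for s in segments),
            )
        )
    return turns


def format_transcript(utterances):
    """Render the merged turns as 'Speaker N: text' lines."""
    return "\n".join(f"Speaker {t.speaker}: {t.text}" for t in merge_turns(utterances))


# Made-up sample data standing in for a diarized transcription result.
sample = [
    Utterance(0, 0.0, 1.2, "Hi, thanks for joining."),
    Utterance(0, 1.3, 2.8, "Can you hear me?"),
    Utterance(1, 3.0, 4.1, "Yes, loud and clear."),
    Utterance(0, 4.3, 5.0, "Great."),
]
print(format_transcript(sample))
```

Grouping by consecutive speaker rather than by speaker overall keeps the back-and-forth order of the conversation, which is usually what a meeting recorder or note-taking tool needs.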

The startup offers its speech-to-text model as a hosted API, which users can integrate into their own applications and services. Over 600 companies use Gladia, including several meeting recorders and note-taking assistants like Attention, Circleback, Method Financial, Recall, Sana, and Veed.io.

This particular use case is interesting because many companies must chain API calls. They first convert speech to text, then feed the text into large language models (LLMs) like GPT-4o or Claude 3.5 Sonnet to extract knowledge from large amounts of text.

With the new funds, Gladia hopes to streamline this process by integrating audio intelligence and LLM-based tasks into a single API call. For example, customers could generate a summary of a conversation as a few bullet points without relying on third-party LLM APIs.
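The chained pattern described here can be sketched as two sequential calls. This is an illustration only: `transcribe_then_extract`, the two callables, and the fake implementations are hypothetical stand-ins so the example runs offline, and they do not represent any vendor's real client library.

```python
from typing import Callable


def transcribe_then_extract(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    llm_complete: Callable[[str], str],
    instruction: str = "Summarize this conversation in three bullet points.",
) -> str:
    """Chain two independent services: speech-to-text first, then an LLM."""
    transcript = speech_to_text(audio)  # call 1: audio -> text
    prompt = f"{instruction}\n\nTranscript:\n{transcript}"
    return llm_complete(prompt)  # call 2: text -> extracted knowledge


# Hypothetical stand-ins; real code would call a transcription API and an LLM API.
def fake_speech_to_text(audio: bytes) -> str:
    return "Alice: Let's ship on Friday. Bob: Agreed."


def fake_llm_complete(prompt: str) -> str:
    transcript = prompt.split("Transcript:\n", 1)[1]
    return f"- Summary of {len(transcript.split())} words of conversation"


result = transcribe_then_extract(b"\x00\x01", fake_speech_to_text, fake_llm_complete)
print(result)
```

Each hop in this chain adds its own network round trip, billing relationship, and failure mode, which is the overhead a combined transcription-plus-LLM endpoint would aim to remove.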

Another issue Gladia aims to address is latency. You may have seen demonstrations of real-time audio conversations with AI-based call agents (11x has a good demo on its website). These agents must transcribe speech in real time so that the conversation sounds as human as possible.

Gladia can currently transcribe live conversations with a latency of less than 300 milliseconds. The company claims that its real-time transcription is now as good as its default asynchronous batch transcription API, though that is hard to judge without proper testing. As co-founder and CEO Jean-Louis Quéguiner told TechCrunch, the startup's goal is "batch quality with real-time capability."

In addition to AI call agents, it's conceivable that call centers could use these real-time features to help their agents find relevant information during a call. "Our single API is compatible with all existing technology stacks and protocols, including SIP, VoIP, FreeSwitch, and Asterisk," said co-founder and CTO Jonathan Soto in a statement.

XAnge led the Series A round. Illuminate Financial, XTX Ventures, Athletico Ventures, Gaingels, Mana Ventures, Motier Ventures, Roosh Ventures, and Soma Capital also participated.

Gladia believes we are on the cusp of a "ChatGPT moment" for audio applications. GPT technology has been around for years, but ChatGPT really popularized LLMs through its consumer-like chat interface.

As Apple or Google begins to include transcription models in iOS or Android, consumers will start to understand the value of automatic transcription in the applications they use. Then developers may integrate audio features into their products, which is where API providers like Gladia come in.


This article is from AIbase Daily

