Groq recently added the Whisper Large-V3 model to its platform, so users can call it from the Playground or through the API in their own projects for speech transcription and translation. The model transcribes audio in multiple languages, runs extremely fast, and can translate speech in other languages into English.


Playground Link: https://console.groq.com/playground

Currently, the feature is free to try on the Playground, where transcribing a 4 minute 30 second video takes only about 3 seconds, roughly 90 times faster than real time. Groq also provides an API that users can integrate into their own projects.

The Whisper API is designed to be compatible with the OpenAI API and exposes two core functions: speech-to-text and speech translation. Users can integrate these functions into their own applications, such as smart assistants or automated translation systems.

Both speech-to-text and translation requests are served by the "whisper-large-v3" model.

The API also sets clear limits on audio file formats and sizes: it accepts common formats such as mp3, mp4, and wav, with a file size limit of 25MB. Note that for files containing multiple audio tracks, the Whisper API processes only the first track, so users who need a different track must extract it before uploading.
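The limits above can be checked locally before uploading. The following is a minimal sketch, not an official client: the function name `upload_problems` is made up for illustration, the extension set contains only the three formats named in this article (the API may accept more), and "25MB" is assumed to mean 25 × 1024 × 1024 bytes.

```python
import os

# Assumption: "25MB" is read as 25 MiB; the service may count it differently.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024
# Only the formats named in this article; the real list may be longer.
KNOWN_EXTENSIONS = {".mp3", ".mp4", ".wav"}


def upload_problems(file_name, size_bytes):
    """Return reasons the upload may be rejected (empty list if none found)."""
    problems = []
    extension = os.path.splitext(file_name)[1].lower()
    if extension not in KNOWN_EXTENSIONS:
        shown = extension or "(none)"
        problems.append(f"extension {shown} is not in the known list")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(
            f"file is {size_bytes} bytes, over the {MAX_UPLOAD_BYTES}-byte limit"
        )
    return problems
```

For a file on disk, the check could be called as `upload_problems(path, os.path.getsize(path))`; an empty list means neither limit was hit.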

To improve transcription quality and efficiency, the Whisper API downsamples audio on the server to mono at 16,000 Hz. Groq recommends that users do this preprocessing on the client side instead, which reduces file size and so allows longer audio to fit within the upload limit.
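One way to do this preprocessing on the client is with ffmpeg. The sketch below assumes ffmpeg is installed and on the PATH; the helper names `downsample_command` and `downsample` are hypothetical. It keeps only the first audio track, mixes it down to mono, and resamples it to 16,000 Hz.

```python
import subprocess


def downsample_command(src, dst):
    """Build an ffmpeg command that writes mono 16 kHz audio from the first track."""
    return [
        "ffmpeg",
        "-i", src,
        "-map", "0:a:0",  # keep only the first audio stream of the input
        "-ac", "1",       # one channel (mono)
        "-ar", "16000",   # 16,000 Hz sample rate
        dst,              # ffmpeg picks the output codec from this extension
    ]


def downsample(src, dst):
    """Run the command; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(downsample_command(src, dst), check=True)
```

For example, `downsample("talk.mp4", "talk.wav")` would write a mono 16 kHz wav file next to the original, which can then be uploaded in place of the video.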

API Endpoints:

Speech-to-Text: https://api.groq.com/openai/v1/audio/transcriptions

Speech Translation: https://api.groq.com/openai/v1/audio/translations
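Because the interface follows the OpenAI convention, a request to either endpoint can be sketched with only the Python standard library. The example below is an illustration under assumptions, not Groq's official client: it assumes the OpenAI-style request shape (a multipart/form-data POST with `file` and `model` fields and a Bearer token) and a JSON response containing a `text` field. The helper names `build_multipart` and `send_audio` and the User-Agent string are made up for this example.

```python
import json
import os
import urllib.request
import uuid

TRANSCRIPTIONS_URL = "https://api.groq.com/openai/v1/audio/transcriptions"
TRANSLATIONS_URL = "https://api.groq.com/openai/v1/audio/translations"
MODEL = "whisper-large-v3"


def build_multipart(fields, file_name, file_bytes, boundary=None):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def send_audio(url, audio_path, api_key):
    """POST an audio file to one of the endpoints and return the parsed JSON."""
    with open(audio_path, "rb") as audio_file:
        audio = audio_file.read()
    body, content_type = build_multipart(
        {"model": MODEL}, os.path.basename(audio_path), audio
    )
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
            # Arbitrary identifier; set explicitly rather than relying on
            # urllib's default User-Agent.
            "User-Agent": "whisper-example/0.1",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Under these assumptions, `send_audio(TRANSCRIPTIONS_URL, "sample.mp3", api_key)` would return the transcription, and passing `TRANSLATIONS_URL` instead would return an English translation; in both cases the text would be read from the `text` key of the result.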