AudioLM

High-quality audio generation framework

AudioLM is a framework developed by Google Research for high-quality audio generation with long-term consistency. It maps input audio to sequences of discrete tokens and treats audio generation as a language modeling task in that token space. Trained on a large corpus of raw audio waveforms, AudioLM learns to generate natural, coherent continuations of a short audio prompt. For speech, the continuations are grammatically and semantically plausible and preserve the speaker's identity and prosody, even though the model is trained without any text or annotations. AudioLM can also generate coherent piano music continuations, even though no symbolic representation of music was used during training.
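The idea of turning audio into discrete tokens and then modeling those tokens like text can be illustrated with a toy sketch. This is not AudioLM's implementation: the uniform quantizer and the n-gram counter below are simple stand-ins, chosen only so the example runs with the Python standard library, and every function name here (`tokenize`, `detokenize`, `fit_ngram`, `continue_tokens`) is hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

def tokenize(waveform, levels=16):
    """Toy tokenizer (assumption): uniformly quantize samples in [-1, 1]
    to integer tokens in [0, levels - 1]."""
    tokens = []
    for x in waveform:
        x = max(-1.0, min(1.0, x))  # clip out-of-range samples
        tokens.append(min(levels - 1, int((x + 1.0) / 2.0 * levels)))
    return tokens

def detokenize(tokens, levels=16):
    """Map each token back to the centre of its quantization bin."""
    return [(t + 0.5) / levels * 2.0 - 1.0 for t in tokens]

def fit_ngram(tokens, order=2):
    """Toy language model (assumption): count which token follows
    each context of `order` preceding tokens."""
    counts = defaultdict(Counter)
    for i in range(order, len(tokens)):
        counts[tuple(tokens[i - order:i])][tokens[i]] += 1
    return counts

def continue_tokens(counts, prompt, n_new, order=2, seed=0):
    """Sample `n_new` tokens autoregressively, one token at a time."""
    rng = random.Random(seed)
    out = list(prompt)
    for _ in range(n_new):
        followers = counts.get(tuple(out[-order:]))
        if not followers:
            out.append(out[-1])  # unseen context: repeat the last token
            continue
        choices, weights = zip(*followers.items())
        out.append(rng.choices(choices, weights=weights)[0])
    return out[len(prompt):]

# "Corpus": four seconds of a 5 Hz sine wave sampled at 800 Hz.
sample_rate = 800
train = [math.sin(2 * math.pi * 5 * t / sample_rate)
         for t in range(4 * sample_rate)]

train_tokens = tokenize(train)
model = fit_ngram(train_tokens)

prompt = train_tokens[:40]                 # short audio prompt, as tokens
continuation = continue_tokens(model, prompt, n_new=200)
audio = detokenize(continuation)           # back to waveform samples
```

The pipeline has the same three stages the description names: audio to tokens, next-token prediction over those tokens, and tokens back to audio. No text or labels appear anywhere in the training data; the model only ever sees token sequences derived from the waveform.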

AudioLM Alternatives

Neural Network Diffusion — Implementation of Neural Network Diffusion Model

SoundStorm — Efficient Parallel Audio Generation Technology

FreGrad — Lightweight and fast frequency-aware diffusion audio codec

Qwen2-Audio — Large audio language model launched by Alibaba Cloud

SALMONN — SALMONN: Speech Audio Language Music Open Neural Network

Trustworthy Language Model (TLM) Playground — Try Cleanlab's Trustworthy Language Model (TLM) in your browser

Kimi-Audio — Kimi-Audio is an open-source audio foundation model that excels in audio understanding and generation.

GenAU — Audio Generation and Automatic Captioning Model

Audiobox — AI audio generation research under Meta

MaskVAT — A video-to-audio generation model that enhances synchronization

vta-ldm — Video to Audio Generation Model

stable-audio-tools — A generative audio model library based on PyTorch

Make-An-Audio 2 — Text-to-audio generation technology based on diffusion models

TangoFlux — An efficient text-to-audio generation model

OpenCompass 2.0 Large Language Model Leaderboard — A real-time large language model leaderboard that provides comprehensive performance assessments.

BlueLM Large Model — An independently developed intelligent language understanding model by vivo

OmniAudio-2.6B — The fastest edge-deployed audio language model in the world.

Neural Wave — Automate with simple language instructions

Bark — Highly realistic multilingual text-to-audio generation model

Llama-3.1-Nemotron-51B — An efficient and accurate AI language model

InternLM2.5-7B-Chat GGUF — Large language model, efficient text generation.

StemGen — StemGen: An Audio-conditioned Music Generation Model

Stable Audio Open — Open-source audio samples and sound design models

CodeGemma — Leading code generation large language model

AILIBRI — A comprehensive directory of AI neural network tools

Stable Audio Open 1.0 — An AI model that generates variable-length stereo audio based on text prompts.

Stable Audio Open Demo — Generate stereo audio from text prompts

Self-Rewarding Language Models — Language Model Self-Reward Training

VideoPoet — A large language model for video generation