PixelPlayer

Audio-Visual Source Separation System

Common Product, Music, Audio separation, Audio-visual analysis

PixelPlayer is a system that learns, by watching large numbers of unlabeled videos, to locate the image regions that produce sound and to separate the input audio into a set of components representing the sound from each pixel. The method exploits the natural synchronization between the visual and auditory modalities to learn a joint model for parsing sound and images, with no additional human labeling. The system is trained on a large collection of videos featuring solo and duet performances with different combinations of instruments. No supervision is given about which instruments appear in a video, where they are, or what they sound like. At test time, the input is a video of an instrumental performance together with its monaural audio. The system performs audio-visual source separation and localization, splitting the input audio signal into N sound channels, each corresponding to a different instrument category. It can also localize sounds and assign a different audio waveform to each pixel in the input video.
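The idea of splitting audio into N channels and then assigning sound to individual pixels can be illustrated with a small sketch. The description above does not say how the per-pixel assignment is computed, so everything here is an assumption made for illustration: the audio is represented as a magnitude spectrogram, each of the N channels is a map of the same shape, each pixel carries N weights, and the hypothetical helpers `pixel_mask` and `separate_pixel` combine them into a soft mask applied to the mixture.

```python
import math


def sigmoid(x):
    """Standard logistic function, used here to squash scores into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pixel_mask(pixel_weights, components):
    """Build a soft time-frequency mask for one pixel.

    pixel_weights: N floats (assumed per-pixel weights, one per channel).
    components:    N maps, each a list of F rows with T floats.
    """
    n_freq = len(components[0])
    n_time = len(components[0][0])
    mask = []
    for f in range(n_freq):
        row = []
        for t in range(n_time):
            # Weighted sum over the N channels at this time-frequency bin.
            score = sum(w * comp[f][t] for w, comp in zip(pixel_weights, components))
            row.append(sigmoid(score))
        mask.append(row)
    return mask


def separate_pixel(mixture, pixel_weights, components):
    """Apply the pixel's mask to the mixture spectrogram (F rows, T columns)."""
    mask = pixel_mask(pixel_weights, components)
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, mixture)]


if __name__ == "__main__":
    # Toy 2x2 mixture spectrogram and N = 2 channel maps (made-up values).
    mixture = [[2.0, 4.0], [6.0, 8.0]]
    components = [
        [[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 1.0], [0.0, 0.0]],
    ]
    # A pixel that strongly favors channel 0 and suppresses channel 1.
    print(separate_pixel(mixture, [100.0, -100.0], components))
```

In this toy setup a pixel whose weights are all zero receives half of the mixture energy in every bin, while a pixel with a strongly positive weight on one channel keeps the bins where that channel is active. Turning the masked spectrogram back into a waveform would need an inverse transform, which is omitted here.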

PixelPlayer Alternatives

ManiWAV — Robot manipulation learning from wild audio-visual data

LuDe — AI-Powered Audio-Visual Generation Tool

ReSyncer — Unified audio-visual synchronization for facial performers

Audio-SDS — An innovative method to achieve source separation and synthesis through audio diffusion models.

Audio Chat — Upload audio files for easy dialogue analysis.

33 Subtitle — Accurately transcribes audio and video content into text or SRT subtitles

Vocal Remover and Isolation — An online audio track separation tool

DenseAV — A self-supervised audio-visual feature alignment model.

Moises App — AI Audio Separation Tool for Musicians

Ultimate Vocal Remover GUI — Free vocal separation tool. Separates and extracts background music from audio.

Kimi-Audio — An open-source audio foundation model that excels in audio understanding and generation.

NotebookLM Audio Overview — Transforms documents into AI-generated audio discussions for easier learning and retention.

Mikey Smart — An all-in-one AI-powered audio-visual service providing voice translation, voice customization, and voiceover.

DINOv2 — Robust Visual Features through Unsupervised Learning

AI Audio Kit — AI Audio Tool - Effortlessly Transcribe Audio

Audio Transcription Tool — Fast, Accurate, and Free Audio to Text Service

Article.Audio — Converts articles into high-quality audio

Stable Audio Open Demo — Generate stereo audio from text prompts

Bangin' Audio Recorder — Easily capture and refine your audio ideas

Audio Muse — All-in-One Online Audio Tool

Hailuo AI Audio — An audio synthesis tool designed to create realistic speech.

MVSEP — Separates the audio track and the musical parts in audio files.

Stable Audio Open 1.0 — An AI model that generates variable-length stereo audio based on text prompts.

Draw an Audio — Video-to-audio synthesis guided by multiple instructions

stable-audio-tools — A generative audio model library based on PyTorch

Audio Transcription — Convert podcasts, audio files, or URLs into text, and obtain a smart summary.

Make-An-Audio 2 — Text-to-audio generation technology based on diffusion models

Qwen2-Audio — Large audio language model launched by Alibaba Cloud