FLOAT

Audio-driven talking avatar video generation method based on flow matching.

Tags: Common Product, Image, Artificial Intelligence, Avatar Animation
FLOAT is an audio-driven talking avatar video generation method built on a flow matching generative model. It moves generative modeling from a pixel-based latent space to a learned motion latent space, which enables temporally consistent motion. The method uses a transformer-based vector field predictor with a simple yet effective per-frame conditioning mechanism. FLOAT also supports speech-driven emotion enhancement, so expressive motion can be incorporated naturally. The authors report that, in extensive experiments, FLOAT outperforms existing audio-driven talking avatar methods in visual quality, motion fidelity, and efficiency.
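
For readers unfamiliar with flow matching, the sampling idea described above can be sketched in a few lines. This is a toy illustration only, not FLOAT's implementation: `toy_vector_field` and `sample_motion_latents` are hypothetical names, the closed-form velocity stands in for a trained transformer predictor, and each frame's condition vector is assumed to have the same size as the motion latent. It also treats frames independently, whereas a transformer would attend across frames.

```python
import random


def toy_vector_field(x, t, cond):
    """Hypothetical stand-in for a learned vector field predictor.

    Returns the velocity of the straight-line path that reaches `cond`
    at t = 1. A real system would call a trained network here.
    """
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, cond)]


def sample_motion_latents(conds, dim, steps=10, seed=0):
    """Euler-integrate dx/dt = v(x, t, cond) from t = 0 to t = 1.

    `conds` holds one condition vector per frame (per-frame conditioning).
    Each frame starts from Gaussian noise and is moved along the field.
    """
    rng = random.Random(seed)
    dt = 1.0 / steps
    frames = []
    for cond in conds:
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # noise sample at t = 0
        for k in range(steps):
            t = k * dt  # t stays below 1, so the division above is safe
            v = toy_vector_field(x, t, cond)
            x = [xi + dt * vi for xi, vi in zip(x, v)]
        frames.append(x)
    return frames


if __name__ == "__main__":
    # Three frames, each with a 2-dimensional (made-up) condition vector.
    conds = [[0.0, 1.0], [0.5, -0.5], [2.0, 2.0]]
    for latent in sample_motion_latents(conds, dim=2):
        print(latent)
```

Because the toy field points straight at the condition vector, each sampled latent ends up at its frame's condition. A learned field would instead map noise to plausible motion latents given the audio features.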

FLOAT Visit Over Time

Monthly Visits: 59
Bounce Rate: 44.35%
Pages per Visit: 1.0
Visit Duration: 00:00:00


FLOAT Alternatives