FLOAT
Audio-driven talking avatar video generation method based on flow matching.
Common Product, Image, Artificial Intelligence, Avatar Animation
FLOAT is an audio-driven talking avatar video generation method built on a flow matching generative model. It moves generative modeling from a pixel-based latent space to a learned motion latent space, enabling temporally coherent motion design. The method uses a transformer-based vector field predictor with a simple yet effective per-frame conditioning mechanism. FLOAT also supports speech-driven emotion enhancement, so expressive motion can be incorporated naturally. The authors report that, in extensive experiments, FLOAT outperforms existing audio-driven talking avatar methods in visual quality, motion fidelity, and efficiency.
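To make the flow matching idea above more concrete, here is a minimal Python sketch under stated assumptions. It is not FLOAT's implementation. It assumes a straight-line probability path between noise and a target motion latent, uses a hand-written toy function in place of the transformer-based vector field predictor, and treats the audio features as if they were already the target motion latents, one row per frame. All names (flow_matching_pair, euler_sample, toy_vector_field) are hypothetical.

```python
import numpy as np


def flow_matching_pair(x0, x1, t):
    """Return a point on the straight-line path from noise x0 to data x1
    at time t in [0, 1], plus the velocity a predictor would regress to."""
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0  # constant velocity along a straight path
    return x_t, v_target


def toy_vector_field(x, t, audio_cond):
    """Hypothetical stand-in for a learned vector field predictor.

    Assumption for illustration only: each frame's audio feature row is
    already the motion latent we want to reach, so the ideal velocity
    for the straight-line path is (target - x) / (1 - t).
    """
    return (audio_cond - x) / (1.0 - t)


def euler_sample(vector_field, x0, audio_cond, num_steps=10):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with fixed-step Euler.

    x0 and audio_cond have shape (num_frames, latent_dim); every frame is
    conditioned on its own row of audio_cond (per-frame conditioning).
    """
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # t stays below 1, so (1 - t) is never zero
        x = x + dt * vector_field(x, t, audio_cond)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_frames, latent_dim = 8, 4
    noise = rng.standard_normal((num_frames, latent_dim))
    audio = rng.standard_normal((num_frames, latent_dim))  # made-up features
    motion = euler_sample(toy_vector_field, noise, audio, num_steps=10)
    print(motion.shape)
```

In a real system the toy field would be replaced by a trained network, and the sampled motion latents would be passed to a decoder that renders video frames. Because sampling happens in a compact motion latent space rather than on pixels, a small number of integration steps may be sufficient, which is in line with the efficiency claim in the description.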
FLOAT Visits Over Time
Monthly Visits: 59
Bounce Rate: 44.35%
Pages per Visit: 1.0
Visit Duration: 00:00:00