CelebV-Text

A large-scale, high-quality, and diverse face text-video dataset

CommonProductVideoFaceText
CelebV-Text is a large-scale, high-quality, and diverse face text-video dataset designed to promote research on face text-video generation tasks. The dataset contains 70,000 out-door face video clips, each accompanied by 20 text descriptions covering 40 general appearances, 5 detailed appearances, 6 lighting conditions, 37 actions, 8 emotions and 6 light directions. CelebV-Text has been validated through comprehensive statistical analysis for its superiority in video, text, and text-video correlation, and it constructs a benchmark to standardize the evaluation of face text-video generation tasks.
Visit

CelebV-Text Alternatives