Tencent HunYuan has announced the open-source release of InstantCharacter, its plugin for customized character image generation, which is compatible with the open-source text-to-image model Flux. The release targets two persistent problems in customized generation, character consistency and generation accuracy, and gives content creators a more efficient and flexible tool.
InstantCharacter's core advantage is that it keeps a character consistent and realistic across different scenes while maintaining high image quality, precision, and flexible text-based editing. Users can place a character in a desired pose or setting using a simple prompt. For example, given a single reference image and a description such as "a rabbit in the kitchen drinking soup with a spoon," the plugin generates the corresponding image. This matters most in multi-round text-to-image workflows, where keeping the same character recognizable from one generation to the next has been a long-standing challenge.
Technically, InstantCharacter uses a novel framework built on a diffusion transformer (DiT) model. It introduces a scalable adapter that employs multiple transformer encoders to handle open-domain character features and to interact with the latent space of modern diffusion transformers. This design lets the system adapt flexibly to different character traits while maintaining high consistency.
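To make the adapter idea more concrete, here is a toy sketch in plain Python. It is not the InstantCharacter implementation: it assumes that character features from several encoders are concatenated into one token sequence and injected into the latent tokens through cross-attention with a residual connection, which is a common adapter pattern but is not confirmed by the announcement. The names `toy_encoder`, `adapter_features`, and `cross_attention` are hypothetical, the "encoders" only emit seeded random vectors, and learned projections are omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def toy_encoder(seed, n_tokens, dim):
    """Hypothetical stand-in for a pretrained vision encoder.

    A real encoder would map a reference image to feature tokens;
    this one just returns seeded random vectors of the right shape.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_tokens)]


def adapter_features(dim=4):
    """Concatenate tokens from two hypothetical encoders into one sequence."""
    tokens_a = toy_encoder(seed=0, n_tokens=3, dim=dim)
    tokens_b = toy_encoder(seed=1, n_tokens=2, dim=dim)
    return tokens_a + tokens_b


def cross_attention(latents, character_tokens, scale=1.0):
    """Let every latent token attend over the character tokens.

    The attended result is added back to the latent (residual), weighted
    by `scale`. Learned query/key/value projections are left out.
    """
    dim = len(latents[0])
    conditioned = []
    for q in latents:
        weights = softmax([dot(q, k) / math.sqrt(dim) for k in character_tokens])
        attended = [
            sum(w * tok[i] for w, tok in zip(weights, character_tokens))
            for i in range(dim)
        ]
        conditioned.append([qi + scale * ai for qi, ai in zip(q, attended)])
    return conditioned


rng = random.Random(42)
latents = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
character_tokens = adapter_features(dim=4)
conditioned = cross_attention(latents, character_tokens, scale=0.5)
print(len(conditioned), len(conditioned[0]))
```

In this sketch, setting `scale` to 0 leaves the latent tokens untouched, which shows how an adapter of this kind can be attached to a base model and have its influence dialed up or down.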
To effectively train this framework, the Tencent HunYuan team constructed a large-scale character dataset containing tens of millions of samples. The dataset is systematically organized into paired (multi-view characters) and unpaired (text-image combinations) subsets, enabling simultaneous optimization of identity consistency and text editability through different learning pathways. This dual data structure design further enhances the model's generalization ability and image quality.
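The paired and unpaired organization can be illustrated with a small sketch. It assumes a paired sample carries a separate reference view of the same character, while an unpaired sample is only an image with its caption; the `CharacterSample` fields, the file names, and the `mixed_batch` helper with its `paired_fraction` parameter are all hypothetical and are not taken from the InstantCharacter code or paper.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CharacterSample:
    caption: str
    target_image: str
    # Hypothetical convention: only paired samples carry a second view
    # of the same character to use as the reference.
    reference_image: Optional[str] = None

    @property
    def is_paired(self) -> bool:
        return self.reference_image is not None


def split_subsets(
    samples: List[CharacterSample],
) -> Tuple[List[CharacterSample], List[CharacterSample]]:
    """Separate the dataset into its paired and unpaired subsets."""
    paired = [s for s in samples if s.is_paired]
    unpaired = [s for s in samples if not s.is_paired]
    return paired, unpaired


def mixed_batch(paired, unpaired, batch_size, paired_fraction, rng):
    """Draw one batch with a fixed share of paired samples (with replacement)."""
    n_paired = round(batch_size * paired_fraction)
    batch = rng.choices(paired, k=n_paired)
    batch += rng.choices(unpaired, k=batch_size - n_paired)
    rng.shuffle(batch)
    return batch


# Made-up records for illustration only.
samples = [
    CharacterSample("a rabbit in the kitchen", "rabbit_kitchen.png", "rabbit_front.png"),
    CharacterSample("a rabbit on a beach", "rabbit_beach.png", "rabbit_side.png"),
    CharacterSample("a knight in a forest", "knight_forest.png"),
    CharacterSample("a robot reading a book", "robot_book.png"),
    CharacterSample("a cat on a skateboard", "cat_skate.png"),
]

paired, unpaired = split_subsets(samples)
batch = mixed_batch(paired, unpaired, batch_size=8, paired_fraction=0.5, rng=random.Random(0))
print(len(paired), len(unpaired), sum(s.is_paired for s in batch))
```

Keeping the two subsets separate like this lets a training loop apply a different objective, or a different sampling ratio, to each one, which is one way to read the "different learning pathways" described above.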
In the team's practical evaluations, InstantCharacter produced results comparable to those of leading models such as GPT-4o. It handles images of varying styles and complexity, which makes it suitable for applications such as comic creation and film production. For content creators, this means maintaining high character consistency while producing visual work more efficiently.
- Project Website: https://instantcharacter.github.io/
- Code: https://github.com/Tencent/InstantCharacter
- Hugging Face Demo: https://huggingface.co/spaces/InstantX/InstantCharacter
- Paper: https://arxiv.org/abs/2504.12395