Recently, the InstantX team, working with research groups from Nanjing University of Science and Technology, Beihang University, and Peking University, developed a new style transfer model named CSGO. The model targets image generation tasks that combine the content of one source with the style of another.
CSGO primarily supports three modes of style transfer, as detailed below:
1. Content image + style reference image: synthesizes a stylized version of the content image. For example, given an original image (e.g., "bear, house") and a style reference image, the model renders the original image in the style of the reference.
2. Style reference image + text prompt: synthesizes a stylized image of the content described in the text. For example, given a style reference image and a text prompt (e.g., "a cat, a dog, a man, a panda"), the model generates the corresponding image in the reference style.
3. Text-driven editing: modifies specified objects in the image through a text prompt.
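The three modes differ mainly in which inputs are supplied, and that can be sketched as a small dispatch function. This is an illustrative sketch only, not CSGO's actual interface: `StyleMode`, `select_mode`, and the parameter names are hypothetical. The rule that a style reference image is required in every mode is also an assumption, including for text-driven editing.

```python
from enum import Enum
from typing import Optional


class StyleMode(Enum):
    """Hypothetical labels for the three modes described above."""
    IMAGE_DRIVEN_TRANSFER = "image-driven style transfer"
    TEXT_DRIVEN_SYNTHESIS = "text-driven style synthesis"
    TEXT_EDIT_DRIVEN_SYNTHESIS = "text-edit-driven style synthesis"


def select_mode(style_image: Optional[str],
                content_image: Optional[str] = None,
                prompt: Optional[str] = None) -> StyleMode:
    """Pick a mode from the inputs that were provided.

    Assumption: a style reference image is needed in every mode.
    """
    if not style_image:
        raise ValueError("a style reference image is required")
    if content_image and prompt:
        # Content image plus text: edit objects in the image via the prompt.
        return StyleMode.TEXT_EDIT_DRIVEN_SYNTHESIS
    if content_image:
        # Content image only: restyle it using the reference.
        return StyleMode.IMAGE_DRIVEN_TRANSFER
    if prompt:
        # Text only: generate the described content in the reference style.
        return StyleMode.TEXT_DRIVEN_SYNTHESIS
    raise ValueError("provide a content image, a text prompt, or both")
```

For instance, `select_mode("style.png", prompt="a cat")` selects text-driven style synthesis, while also passing a content image would select the text-edit-driven mode.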
The core of CSGO lies in its data construction process. The research team designed a data generation and automatic cleaning pipeline and used it to build a large-scale style transfer dataset named IMAGStyle. The dataset contains 210,000 image triplets and is intended as a resource for academic research on image generation.
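To make the idea of "triplets plus automatic cleaning" concrete, here is a minimal sketch. It assumes each triplet pairs a content image and a style image with a stylized result, and that cleaning keeps only candidates whose stylized image still preserves the content. The `StyleTriplet` fields, the `content_score` value, and the `0.5` threshold are hypothetical illustrations, not the metric or settings used to build IMAGStyle.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class StyleTriplet:
    """One candidate training example (assumed layout)."""
    content_path: str     # original content image
    style_path: str       # style reference image
    stylized_path: str    # generated image: content rendered in the style
    content_score: float  # hypothetical: higher means content is better preserved


def clean_triplets(candidates: Iterable[StyleTriplet],
                   min_content_score: float = 0.5) -> List[StyleTriplet]:
    """Keep only triplets whose stylized image passes the content check."""
    return [t for t in candidates if t.content_score >= min_content_score]


candidates = [
    StyleTriplet("bear.png", "ink.png", "bear_ink.png", content_score=0.9),
    StyleTriplet("house.png", "ink.png", "house_ink.png", content_score=0.2),
    StyleTriplet("cat.png", "oil.png", "cat_oil.png", content_score=0.5),
]
kept = clean_triplets(candidates)
```

In this toy run the second candidate is dropped because its score falls below the threshold. A real pipeline would compute the score from the images themselves.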
CSGO explicitly separates content features from style features during image generation. According to the researchers, an advantage of the model is that it is trained end to end, so no fine-tuning is required at inference time.
Another highlight is that CSGO does not train the UNet, so it retains the generative capabilities of the original text-to-image model. With this design, CSGO achieves image-driven style transfer, text-driven style synthesis, and text-edit-driven style synthesis.
In the reported experiments, the researchers provide quantitative and visual comparisons against recent existing methods, which they say demonstrate CSGO's stronger style control.
Key Points:
🌟 The research team built the IMAGStyle dataset, which includes 210,000 image triplets, using a data generation and automatic cleaning pipeline.
🎨 The model achieves a clear separation of content and style, supporting multiple generation methods, including image-driven and text-driven style transfer.
📊 Reported experimental results show that CSGO outperforms existing methods in style control.