Tencent's Hunyuan DiT Launches 6GB Low-VRAM Version, Hunyuan Captioner Goes Open Source

AIbase

Published in AI News · 4 min read · Jul 5, 2024

Tencent's Hunyuan text-to-image model, Hunyuan DiT, now has a version that runs in as little as 6GB of VRAM, so it can run on ordinary personal computers. This version works with plugins such as LoRA and ControlNet, and Hunyuan DiT now also supports training through the Kohya graphical user interface, lowering the barrier for developers who want to train their own LoRA models. The model itself has been upgraded to version 1.2, with improved image texture and composition.

At the same time, Tencent has open-sourced Hunyuan Captioner, a Hunyuan image-captioning model that supports both Chinese and English and is optimized for text-to-image scenarios. It understands Chinese semantics more accurately and outputs structured, complete, and accurate image descriptions. It can also recognize well-known people and landmarks, and developers can supplement it with their own background knowledge.

In addition, open-sourcing Hunyuan Captioner lets text-to-image researchers and data annotators worldwide improve the quality of their image descriptions, producing more comprehensive and accurate captions that in turn improve model performance. The resulting datasets can be used to train models based on Hunyuan DiT as well as other visual models.

Together, the three major updates to Hunyuan DiT (a low-VRAM version, integration with the Kohya training interface, and the upgrade to version 1.2) further lower the barrier to use and improve image quality. Hunyuan DiT's images already had strong texture, but its high VRAM requirements discouraged many developers. The new low-VRAM version runs with as little as 6GB of VRAM, and through a collaboration with Hugging Face, it and related plugins have been integrated into the Diffusers library, reducing the cost of use.
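As a rough illustration of why a 6GB budget is sensitive to numeric precision: the memory taken by model weights alone is approximately the parameter count times the bytes stored per parameter. The Python sketch below uses a hypothetical 1.5-billion-parameter network purely for illustration; it is not an official Hunyuan DiT figure, the article does not say which techniques the low-VRAM version relies on, and the helper name `weight_memory_gb` is an illustrative assumption. The estimate ignores activations, other pipeline components, and framework overhead, so real peak usage will be higher.

```python
# Bytes used to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gb(num_params: int, precision: str) -> float:
    """Estimate memory for model weights only, in decimal GB (1 GB = 1e9 bytes).

    Activations, other pipeline components, and framework overhead are not
    counted, so actual peak VRAM will be higher than this figure.
    """
    if precision not in BYTES_PER_PARAM:
        raise ValueError(f"unknown precision: {precision!r}")
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Hypothetical, illustrative parameter count (an assumption, not an official figure).
    num_params = 1_500_000_000
    for precision in ("fp32", "fp16", "int8"):
        print(f"{precision}: {weight_memory_gb(num_params, precision):.1f} GB")
```

Run as a script, it prints 6.0, 3.0, and 1.5 GB for fp32, fp16, and int8 respectively: halving the bytes per parameter halves the weight footprint, which is why lower-precision weights are a common lever for fitting large models onto consumer GPUs.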

Kohya is an open-source, lightweight fine-tuning tool with a graphical user interface, widely used to train diffusion-based text-to-image models. With Kohya, users can run full-parameter fine-tuning and LoRA training without writing any code.
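For readers new to LoRA, the adapters that tools like Kohya train are based on low-rank adaptation: the base weight matrix W stays frozen, and only two small matrices, B (d_out × r) and A (r × d_in), are trained, so the effective weight becomes W + (alpha / r)·BA. The NumPy sketch below shows this on a single toy linear layer. It is a generic illustration of the technique, not code from Kohya or Hunyuan DiT; the function name `lora_forward`, the layer sizes, the rank, and alpha are arbitrary choices made for the example.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha):
    """Apply a linear layer whose frozen weight W is adapted by a LoRA update.

    x: inputs, shape (batch, d_in)
    W: frozen base weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    r = A.shape[0]
    delta = (alpha / r) * (B @ A)  # low-rank update with the same shape as W
    return x @ (W + delta).T


rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 512, 512, 4, 4.0
W = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # small random initialization
B = np.zeros((d_out, r))                   # zero initialization: adapter starts as a no-op
x = rng.standard_normal((2, d_in))

# With B = 0, the adapted layer reproduces the frozen base layer exactly.
assert np.allclose(lora_forward(x, W, A, B, alpha), x @ W.T)

# Only A and B are trained, a small fraction of the full weight matrix.
trainable = A.size + B.size
print(f"trainable: {trainable} of {W.size} parameters ({trainable / W.size:.1%})")
```

Initializing B to zero, as in the original LoRA formulation, makes the adapter a no-op at the start of training, so fine-tuning begins exactly from the base model's behavior while only the small A and B matrices need to be stored and shared.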

Hunyuan Captioner builds a structured image-description system, draws on multiple sources to make descriptions more complete, and injects a large amount of background knowledge, so its output descriptions are more accurate and complete. These optimizations have made Hunyuan DiT one of the most popular open-source DiT models in China, with more than 2.6k stars on GitHub.

Official Website

https://dit.hunyuan.tencent.com

Code

https://github.com/Tencent/HunyuanDiT

Model

https://huggingface.co/Tencent-Hunyuan/HunyuanDiT

Paper

https://tencent.github.io/HunyuanDiT/asset/Hunyuan_DiT_Tech_Report_05140553.pdf

This article is from AIbase Daily