In the field of AI-powered image generation, Diffusion Models (DMs) are transitioning from U-Net-based architectures to Transformer-based architectures (DiT). However, the DiT ecosystem still faces challenges in plugin support, efficiency, and multi-conditional control. Recently, a team led by Xiaojiu-z introduced EasyControl, an innovative framework designed to provide efficient and flexible conditional control for DiT models, acting like a powerful "ControlNet" for DiT.
Core Advantages of EasyControl
EasyControl is not simply a stack of models; it is a carefully designed, unified conditional DiT framework. Its core advantages come from three design choices: a lightweight Condition Injection LoRA module, a Position-Aware Training Paradigm, and the combination of causal attention with KV cache technology, which together yield significant performance improvements. These innovations make EasyControl excel in model compatibility (plug-and-play, style-preserving control), generation flexibility (supporting multiple resolutions, aspect ratios, and multi-conditional combinations), and inference efficiency.
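The efficiency gain from combining causal attention with a KV cache can be illustrated in isolation: because condition tokens never need to attend back to image tokens, their keys and values can be computed once and reused across denoising steps. The sketch below is a toy, single-head NumPy illustration of that general idea under assumed shapes and names; it is not EasyControl's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class CausalAttentionWithKVCache:
    """Toy single-head attention: condition-token K/V are cached once,
    then reused for every subsequent forward pass (denoising step)."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        self.Wq = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.Wk = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.Wv = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.k_cache = None
        self.v_cache = None

    def cache_condition(self, cond_tokens):
        # Computed once per condition image, not once per step.
        self.k_cache = cond_tokens @ self.Wk
        self.v_cache = cond_tokens @ self.Wv

    def __call__(self, image_tokens):
        q = image_tokens @ self.Wq
        k = np.concatenate([self.k_cache, image_tokens @ self.Wk], axis=0)
        v = np.concatenate([self.v_cache, image_tokens @ self.Wv], axis=0)
        scores = q @ k.T / np.sqrt(self.dim)

        # Causal mask: condition tokens are always visible to image
        # tokens, while image token i only sees image tokens <= i.
        n_cond, n_img = self.k_cache.shape[0], image_tokens.shape[0]
        mask = np.zeros((n_img, n_cond + n_img), dtype=bool)
        mask[:, n_cond:] = np.triu(np.ones((n_img, n_img), dtype=bool), k=1)
        scores[mask] = -1e9
        return softmax(scores) @ v

attn = CausalAttentionWithKVCache(dim=8)
attn.cache_condition(np.ones((4, 8)))  # 4 condition tokens, encoded once
out = attn(np.ones((6, 8)))            # 6 image tokens per denoising step
print(out.shape)                       # (6, 8)
```

Because `cache_condition` runs only once, each of the (typically dozens of) denoising steps skips re-projecting the condition tokens, which is where the inference-speed benefit of this pattern comes from.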
Powerful Control Capabilities: Beyond Canny and OpenPose
One of the most striking features of EasyControl is its powerful multi-conditional control capability. Its codebase shows support for a variety of control models, including but not limited to Canny edge detection, depth maps, HED edge sketches, image inpainting, human pose (analogous to OpenPose), and semantic segmentation (Seg).
This means users can precisely guide the DiT model to generate images with specific structures, shapes, and layouts by inputting different control signals. For example, Canny control allows users to specify the outline of the generated object; pose control can guide the generation of images with specific human actions. This precise control significantly expands the application scenarios of DiT models.
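To make the idea of a spatial control signal concrete, the snippet below derives an edge-map condition from an image. For self-containment it uses a simple gradient-magnitude threshold on a synthetic image as a stand-in for a real Canny detector (such as OpenCV's `cv2.Canny`); the shapes and preprocessing are illustrative assumptions, not EasyControl's documented input format.

```python
import numpy as np

# Synthetic grayscale image: a bright square on a dark background.
img = np.zeros((128, 128), dtype=np.float32)
img[32:96, 32:96] = 1.0

# Gradient-magnitude edge map, a simplified stand-in for Canny edges.
# The white contour marks the outline the generator should follow.
gy, gx = np.gradient(img)
edges = (np.hypot(gx, gy) > 0.25).astype(np.uint8) * 255

# A control pipeline typically expects a 3-channel uint8 image the
# same size as the generation target.
cond = np.repeat(edges[:, :, None], 3, axis=2)
print(cond.shape, edges.max())  # (128, 128, 3) 255
```

In practice the edge map would be extracted from a reference photo and passed to the conditional model together with the text prompt, so the generated image keeps the reference's outlines while the prompt controls appearance.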
Stunning Ghibli Style Transfer
Beyond basic structural control, EasyControl also demonstrates powerful style transfer capabilities, particularly Ghibli-style conversion. The research team trained a dedicated LoRA model using only 100 real Asian faces and corresponding Ghibli-style images generated by GPT-4. Remarkably, this model converts portraits into the classic Ghibli animation style while preserving the original facial features well. Users can upload a portrait photo and, with an appropriate prompt, easily generate artwork with a strong hand-drawn anime style. The project team also provides a Gradio demo so users can try this functionality online.
The EasyControl team has already released the inference code and pre-trained weights. According to its to-do list, future releases will include spatial pre-trained weights, subject pre-trained weights, and training code, further extending EasyControl's functionality and giving researchers and developers a more complete toolkit.
The emergence of EasyControl undoubtedly injects powerful control capabilities into Transformer-based diffusion models, effectively addressing the shortcomings of DiT models in conditional control. Its support for multiple control modes and impressive Ghibli style transfer capabilities suggest broad application prospects in the AI content generation field. With its efficient, flexible, and user-friendly features, EasyControl is poised to become an important component of the DiT model ecosystem.
Project Link: https://top.aibase.com/tool/easycontrol