The Ostris team has released Flex.2-preview, an 8-billion-parameter text-to-image diffusion model designed for seamless integration with ComfyUI workflows. According to AIbase, the model excels at generating images with precise control over lines, poses, and depth. It supports universal controls and inpainting, continuing the fine-tuning lineage from Flux.1 Schnell through OpenFlux.1 to Flex.1-alpha. Flex.2-preview is open-sourced on Hugging Face under the Apache 2.0 license, and its flexible workflow integration has made it a focal point of the AI art community.
Core Features: Universal Control and Seamless Workflow Integration
Flex.2-preview redefines text-to-image generation with its powerful control capabilities and native ComfyUI support. AIbase has summarized its key features:
Universal Control Support: Built-in line (Canny), pose, and depth controls let users precisely guide image generation, such as creating 3D-style scenes from depth maps or detailed illustrations from line art (see the sketch after this list).
Inpainting Capabilities: Supports advanced inpainting, allowing users to specify areas via masks for content replacement or repair, e.g., replacing a dog with "a white robot dog sitting on a bench".
ComfyUI Workflow Integration: The model is optimized for ComfyUI, offering node-based workflow support to simplify complex task configurations, such as combining text-to-image, image-to-image, and control networks.
Efficient Generation: Built on a streamlined 8-billion-parameter architecture; generating a 1024x1024 high-resolution image takes 50 inference steps and runs on consumer-grade GPUs with 16GB of VRAM.
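To illustrate how such controls are typically wired up outside ComfyUI, here is a minimal Diffusers-style sketch. The custom pipeline name and the control_image/control_strength argument names are assumptions based on the feature list above, not a verified API; consult the model card for the official example.

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

# Load the model with a repository-provided custom pipeline.
# "pipeline_flex2" is an assumed name; check the model card.
pipe = DiffusionPipeline.from_pretrained(
    "ostris/Flex.2-preview",
    custom_pipeline="pipeline_flex2",
    torch_dtype=torch.bfloat16,
).to("cuda")

depth_map = load_image("depth_map.png")  # a precomputed depth control image (hypothetical file)

image = pipe(
    prompt="a cyberpunk cityscape at night",
    control_image=depth_map,   # line art / pose / depth map (assumed parameter name)
    control_strength=0.5,      # how strongly the control guides generation (assumed)
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
).images[0]
image.save("cityscape.png")
```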
AIbase noted that in community tests, users leveraged Flex.2-preview's control nodes to generate a "cyberpunk cityscape at night," achieving high compositional consistency through depth map and line control, showcasing its potential in creative design.
Technical Architecture: Evolution from Flux.1 Schnell to Flex.2
Flex.2-preview is derived from Black Forest Labs' Flux.1 Schnell through multi-stage fine-tuning and optimization. AIbase analysis reveals its technological evolution includes:
Architecture Optimization: Inherits Flux.1's Rectified Flow Transformer architecture with 8 double transformer blocks (lighter than the 19 in Flux.1-dev) and uses a Guidance Embedder to remove the need for classifier-free guidance (CFG) at inference.
Control and Inpainting Integration: Employs a 16-channel latent space, concatenating the noise latent with a variational autoencoder (VAE)-encoded inpainting image, a mask, and control inputs (49 channels in total) to support flexible control and inpainting workflows (illustrated in the sketch after this list).
Open-Source and Fine-tuning Support: Provides fine-tuning tools via AI-Toolkit; developers can bypass the Guidance Embedder during customized training to produce models with specific styles or themes, while the commercially friendly Apache 2.0 license is retained.
Efficient Inference: Supports FP8 and bfloat16 precision, reducing the memory footprint through 8-bit quantization with torchao and improving inference speed on hardware such as the RTX 3090.
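To make the 49-channel input design concrete, here is a small PyTorch sketch of how such an input could be assembled. The tensor names and shapes are illustrative only (a 1024x1024 image maps to a 128x128 latent at 8x VAE downsampling); this is not the model's actual code:

```python
import torch

batch, h, w = 1, 128, 128  # latent resolution for a 1024x1024 image (1024 / 8)

noise_latent   = torch.randn(batch, 16, h, w)  # standard diffusion noise latent
inpaint_latent = torch.randn(batch, 16, h, w)  # VAE encoding of the image to repair
inpaint_mask   = torch.ones(batch, 1, h, w)    # 1 = regenerate region, 0 = keep
control_latent = torch.randn(batch, 16, h, w)  # VAE encoding of a depth/line/pose map

# 16 + 16 + 1 + 16 = 49 channels, matching the design described above.
model_input = torch.cat(
    [noise_latent, inpaint_latent, inpaint_mask, control_latent], dim=1
)
assert model_input.shape == (batch, 49, h, w)
```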
AIbase believes that Flex.2-preview's lightweight design and universal control capabilities make it an ideal choice for the ComfyUI ecosystem, offering more flexible performance than Flux.1 Schnell in complex workflows.
Application Scenarios: From Art Creation to Commercial Design
Flex.2-preview's versatility makes it suitable for various creative and commercial applications. AIbase summarizes its main uses:
Digital Art and Illustration: Artists can quickly generate concept art or illustrations using line and depth control, suitable for game art and animation pre-visualization.
Advertising and Brand Design: Utilize inpainting functionality to quickly adjust advertising materials, such as replacing products or backgrounds while maintaining brand consistency.
Film and Content Creation: Supports pose-controlled character design or scene generation, accelerating storyboard and visual effects development.
Education and Prototyping: Provides a low-cost image generation solution for teaching or product prototypes, allowing students and startups to quickly iterate visual ideas.
Community feedback shows that Flex.2-preview surpasses OpenFlux.1 in image detail and control precision on complex prompts (e.g., "a steampunk mechanic repairing a robot in a factory"), approaching Midjourney-level quality in hand rendering and text generation in particular. AIbase observes that its compatibility with XLabs' ControlNet further broadens workflow options.
Getting Started: Quick Deployment and ComfyUI Integration
AIbase understands that deploying Flex.2-preview is straightforward for ComfyUI users, requiring 16GB of VRAM (RTX 3060 or higher recommended). Developers can get started quickly with the following steps:
Download Flex.2-preview.safetensors from Hugging Face (huggingface.co/ostris/Flex.2-preview) and place it in ComfyUI/models/diffusion_models/;
Ensure ComfyUI is updated to the latest version (via "Update All" in ComfyUI Manager) and install the necessary CLIP models (t5xxl_fp16.safetensors and clip_l.safetensors) and VAE (ae.safetensors);
Download the officially provided flex2-workflow.json, drag it into ComfyUI to load the workflow, and configure the prompt and control images (such as depth maps or line art);
Run inference, adjusting control_strength (0.5 recommended) and guidance_scale (3.5 recommended) to generate 1024x1024 images.
The community recommends using the provided Diffusers example code or ComfyUI's Flex2Conditioning Node to optimize generation results. AIbase reminds users to ensure that the torch, diffusers, and transformers libraries are installed before the first run and that the workflow's node connections are complete.
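For readers working outside ComfyUI, here is a minimal sketch of what the Diffusers route might look like, mirroring the robot-dog inpainting example from the feature list. The custom pipeline name and the inpaint_image/inpaint_mask argument names are assumptions, and the input file names are hypothetical; defer to the repository's official example code:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

# Assumed custom pipeline name and argument names; see the repository's
# official Diffusers example for the authoritative version.
pipe = DiffusionPipeline.from_pretrained(
    "ostris/Flex.2-preview",
    custom_pipeline="pipeline_flex2",
    torch_dtype=torch.bfloat16,
).to("cuda")

inpaint_image = load_image("dog_on_bench.png")  # original image (hypothetical file)
inpaint_mask = load_image("dog_mask.png")       # white = region to replace

image = pipe(
    prompt="a white robot dog sitting on a bench",
    inpaint_image=inpaint_image,   # assumed parameter name
    inpaint_mask=inpaint_mask,     # assumed parameter name
    height=1024,
    width=1024,
    guidance_scale=3.5,            # recommended value from the steps above
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(42),
).images[0]
image.save("robot_dog.png")
```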
Performance Comparison: Surpassing Predecessors and Competitors
Flex.2-preview significantly outperforms its predecessors, OpenFlux.1 and Flux.1 Schnell. AIbase has compiled a comparison with mainstream models:
Image Quality: In VBench evaluation, Flex.2-preview's CLIP score (0.82) approaches Flux.1-dev's (0.84) and exceeds Flux.1 Schnell's (0.79), with particular strengths in hand details and complex compositions.
Control Precision: Combined with XLabs ControlNet, Flex.2's consistency in Canny and depth control tasks surpasses InstantX's Flux.1-dev-Controlnet-Union-alpha by approximately 8%.
Inference Speed: Generating 1024x1024 images (50 steps) takes an average of 20 seconds (RTX 3090, FP8), approximately 15% faster than Flux.1-dev, suitable for rapid iteration.
Resource Consumption: Its 8 billion parameters and FP8 quantization reduce memory requirements to roughly 60% of Flux.1-dev's, making it better suited to consumer-grade hardware.
AIbase believes that Flex.2-preview's balance of quality, control, and speed sets it apart among open-source models, especially for workflows that demand high control precision and fast generation.
Community Feedback and Improvement Directions
Following its release, Flex.2-preview has drawn high praise from the community for its flexible control capabilities and open-source spirit. Developers say it "maximizes the potential of ComfyUI workflows," and it is particularly impressive in art creation and inpainting tasks. However, some users point out that the model's semantic understanding of complex prompts still has room for improvement and suggest strengthening the T5 encoder's prompt processing. The community also hopes Flex.2 will support video generation and broader ControlNet integration (such as pose estimation). The Ostris team has responded that the next version will optimize multi-modal prompt processing and introduce dynamic threshold adjustment to further improve generation stability. AIbase predicts that Flex.2 may be combined with Hailuo Image or the control module of the Hunyuan 3D engine to build a cross-modal creation ecosystem.
Future Outlook: The Continued Evolution of Open-Source AI Art
The release of Flex.2-preview demonstrates Ostris' innovative capabilities in open-source AI image generation. AIbase believes that its evolution from Flux.1 Schnell to Flex.2 showcases the potential of community-driven development, and its integration capabilities within the ComfyUI ecosystem in particular offer developers endless possibilities. With the continued iteration of AI-Toolkit, Flex.2 is expected to become a standard model for fine-tuning and customized generation. The community is already discussing combining it with the MCP protocol to build a unified AI art workflow, similar to online platforms like RunComfy. AIbase anticipates the release of Flex.2's official version in 2025, with breakthroughs in multi-resolution support and real-time generation in particular.
Project Address: https://huggingface.co/ostris/Flex.2-preview