On April 27, 2025, AIbase reported: Step1X-Edit, an image editing model developed by the Stepfun AI team, was recently open-sourced and has attracted widespread industry attention. This general-purpose image editing framework shows potential comparable to closed-source models such as GPT-4o and Gemini 2 Flash, thanks to its strong performance and fully open-source release.

Below is a comprehensive analysis of Step1X-Edit, covering its technical highlights, application scenarios, and future impact.


Innovative Technical Architecture

Step1X-Edit combines a Multimodal Large Language Model (MLLM) with a Diffusion Transformer (DiT). It generates high-quality target images from a user-provided reference image and an editing instruction. Its core innovation lies in fusing the semantic understanding of the MLLM with the image generation capabilities of the diffusion model: latent embeddings extracted by the MLLM are fed into the diffusion image decoder, allowing the model to respond accurately to diverse editing requests.
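
To make the fusion concrete, the sketch below shows, in plain PyTorch, how latent embeddings from an MLLM might be projected into the conditioning space of a DiT denoiser. The class name, layer sizes, and token shapes are illustrative assumptions, not the actual Step1X-Edit connector.

```python
import torch
import torch.nn as nn

class EditConditioner(nn.Module):
    """Projects MLLM token embeddings into the conditioning space of a
    diffusion transformer (DiT). All dimensions here are illustrative."""

    def __init__(self, mllm_dim: int = 4096, dit_dim: int = 1024):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(mllm_dim, dit_dim),
            nn.GELU(),
            nn.Linear(dit_dim, dit_dim),
        )

    def forward(self, mllm_tokens: torch.Tensor) -> torch.Tensor:
        # mllm_tokens: [batch, seq_len, mllm_dim] latents produced by the MLLM
        # from the (reference image, editing instruction) pair.
        return self.proj(mllm_tokens)  # -> [batch, seq_len, dit_dim]

# Toy usage with random tensors standing in for real MLLM output; the result
# would be passed to the DiT denoiser as conditioning.
conditioner = EditConditioner()
fake_tokens = torch.randn(1, 77, 4096)
print(conditioner(fake_tokens).shape)  # torch.Size([1, 77, 1024])
```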

During training, the team built a data pipeline containing over 1 million high-quality triplets (reference image, instruction, target image), covering 11 editing types, ensuring the model's robustness in complex scenarios.
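
A data pipeline of this kind can be thought of as a large collection of triplet records. The sketch below illustrates one plausible record layout; the field names and category labels are hypothetical and do not reflect Stepfun's internal format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EditTriplet:
    reference_image: str   # path to the original image
    instruction: str       # natural-language editing request
    target_image: str      # path to the edited ground-truth image
    edit_type: str         # one of the 11 editing categories (label assumed)

# Hypothetical records illustrating the triplet structure.
samples: List[EditTriplet] = [
    EditTriplet("imgs/cat.png", "Replace the background with a beach at sunset",
                "imgs/cat_beach.png", "background_change"),
    EditTriplet("imgs/street.png", "Remove the car parked on the left",
                "imgs/street_clean.png", "object_removal"),
]

for s in samples:
    print(f"[{s.edit_type}] {s.instruction}")
```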

GEdit-Bench: A Benchmark for Real-World Scenarios

To more realistically evaluate the performance of image editing models, the Stepfun AI team released the new benchmark GEdit-Bench. This benchmark is designed based on real-world user instructions, covering a wide range of editing scenarios, from simple color adjustments to complex object additions or scene reconstructions.

Experimental results show that Step1X-Edit significantly outperforms existing open-source baseline models on GEdit-Bench, approaching the level of leading closed-source models. The open-source release of this benchmark provides a more practical evaluation tool for research in the image editing field, marking an advancement in industry evaluation standards.
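
Conceptually, running a model against such a benchmark reduces to iterating over (reference image, instruction) pairs, generating edits, and saving them for scoring. The harness below is a minimal sketch under an assumed JSON layout; the real GEdit-Bench data format and scoring protocol are defined by the Stepfun AI team.

```python
import json
from pathlib import Path

def run_benchmark(edit_fn, bench_file: str, out_dir: str) -> None:
    """Run an editing model over benchmark entries and save the outputs for
    later scoring. `edit_fn(image_path, instruction)` is expected to return a
    PIL.Image; the JSON field names below are assumed, not the real schema."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    entries = json.loads(Path(bench_file).read_text())
    for i, entry in enumerate(entries):
        edited = edit_fn(entry["source_image"], entry["instruction"])
        edited.save(out / f"{i:05d}.png")

# Example: run_benchmark(my_model_edit_fn, "gedit_bench.json", "outputs/")
```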

Open-Source Resources and High Performance

The code, model weights, and GEdit-Bench evaluation data for Step1X-Edit were released on April 25, 2025, on the Hugging Face and ModelScope platforms. The model runs on a single H800 GPU, with 80GB of VRAM recommended for optimal generation quality.

For 512x512 images, the model completes an edit within about 5 seconds using 42GB of VRAM; 1024x1024 images require 50GB of VRAM and take approximately 22 seconds. The officially provided inference code and installation scripts further lower the barrier to entry, supporting Python 3.10 and above and compatible with PyTorch 2.3.1 and 2.5.1.
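
Before downloading the weights, it can be useful to verify that the local environment meets the published requirements. The following check uses only standard Python and PyTorch calls; the 42GB threshold reflects the 512x512 figure quoted above.

```python
import sys
import torch

# Sanity check against the published requirements: Python >= 3.10,
# PyTorch 2.3.1 or 2.5.1, and roughly 42-80 GB of GPU VRAM depending on
# resolution and desired quality.
assert sys.version_info >= (3, 10), "Python 3.10+ is required"
print("PyTorch:", torch.__version__)

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, {vram_gb:.0f} GB VRAM")
    if vram_gb < 42:
        print("Warning: below the ~42 GB needed for 512x512 editing")
else:
    print("No CUDA device detected")
```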

Wide-ranging Application Prospects

Step1X-Edit's flexibility and high accuracy make it suitable for various scenarios. Whether professional designers are optimizing creative works or ordinary users are enhancing photos, the model can achieve complex edits with simple instructions.

For example, users can perform background replacement, object removal, or style transfer through plain text descriptions and obtain professional-quality results. The model has also been made available on platforms such as fal.ai, where users can try its functionality through online demos. This open-source release provides a powerful tool for content creators, developers, and researchers, promoting broader adoption of and innovation in image editing.
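
In practice, this instruction-driven workflow amounts to pairing a reference image with a short text request. The snippet below sketches that calling pattern; the `pipeline` object and its call signature are placeholders, since the official repository ships its own inference script.

```python
from PIL import Image

# Illustrative edit requests covering background replacement, object removal,
# and style transfer -- three of the instruction types mentioned above.
EDIT_REQUESTS = [
    ("photo.jpg", "Replace the background with a snowy mountain range"),
    ("photo.jpg", "Remove the person on the right"),
    ("photo.jpg", "Convert the image to a watercolor painting style"),
]

def edit_image(pipeline, image_path: str, instruction: str) -> Image.Image:
    """Single-call pattern: one reference image plus one text instruction.
    `pipeline` stands in for whatever inference entry point the official
    repository provides; its call signature here is assumed."""
    reference = Image.open(image_path).convert("RGB")
    return pipeline(image=reference, prompt=instruction)

# Example usage once a real pipeline object is loaded:
# for i, (path, prompt) in enumerate(EDIT_REQUESTS):
#     edit_image(pipeline, path, prompt).save(f"edited_{i}.png")
```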

Profound Impact on the Industry

The open-sourcing of Step1X-Edit not only promotes the development of image editing technology but also injects new vitality into the open-source community. Compared to closed-source models that rely on proprietary data, Step1X-Edit, through its transparent training process and high reproducibility, provides a foundation for research and optimization for academia and developers.

Industry experts believe that the release of this model may encourage more companies to explore the commercialization path of open-source AI, while also motivating closed-source model providers to further enhance performance.

Future Optimization and Expectations

Although Step1X-Edit has demonstrated strong capabilities, its high VRAM requirements may limit access for some users. In the future, the team plans to optimize model efficiency, lower hardware barriers, and expand support for more editing types and resolutions. Furthermore, continuous updates to GEdit-Bench will further enrich the evaluation scenarios, helping the industry to establish a unified performance standard. AIbase will continue to monitor the progress of Step1X-Edit and bring you the latest developments in the open-source AI field.

Experience it here: https://huggingface.co/spaces/stepfun-ai/Step1X-Edit