UNO is a multi-image-conditioned generation model built on diffusion transformers. It introduces progressive cross-modal alignment and universal rotary position embedding (UnoPE) to keep generated subjects consistent with their reference images. Its main advantage is finer control over single-subject and multi-subject generation, which makes it suitable for a range of creative image generation tasks.
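
To make the position-embedding idea more concrete, here is a minimal sketch of one way position indices could be assigned when several reference images are fed alongside the image being generated. It is an illustration, not UNO's actual implementation: it assumes that each reference image's token grid is shifted diagonally past the grids before it, so that no reference token shares a (row, column) index with the target image or with another reference. The function names `position_ids` and `assign_positions` are hypothetical.

```python
def position_ids(height, width, row_offset=0, col_offset=0):
    """Return (row, col) indices for a height x width token grid, shifted by the offsets."""
    return [
        (row_offset + r, col_offset + c)
        for r in range(height)
        for c in range(width)
    ]


def assign_positions(target_hw, reference_hws):
    """Assign non-overlapping position indices to a target grid and reference grids.

    Assumption (not confirmed by the text above): each reference grid starts
    where the previous grid ended on BOTH axes, i.e. a diagonal offset.
    """
    target_h, target_w = target_hw
    target_ids = position_ids(target_h, target_w)

    # The first reference starts just past the target grid on both axes.
    row_off, col_off = target_h, target_w
    reference_ids = []
    for ref_h, ref_w in reference_hws:
        reference_ids.append(position_ids(ref_h, ref_w, row_off, col_off))
        # Move the offset past this reference for the next one.
        row_off += ref_h
        col_off += ref_w
    return target_ids, reference_ids


if __name__ == "__main__":
    target, refs = assign_positions((2, 2), [(2, 2), (1, 3)])
    print("target:", target)
    for i, ids in enumerate(refs):
        print(f"reference {i}:", ids)
```

In a rotary embedding these indices would feed the rotation angles applied to each token's queries and keys. Under the assumption above, keeping the index ranges disjoint is what would let the model tell target tokens apart from reference tokens, and one reference apart from another.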