IP-Adapter is a lightweight adapter that adds image-prompt support to pre-trained text-to-image diffusion models. Its key design is a decoupled cross-attention mechanism that uses separate cross-attention layers for text features and image features. IP-Adapter is compatible with existing control tools, and an image prompt can be combined with a text prompt for multimodal image generation. Compared to existing methods, IP-Adapter produces higher-quality images that align more closely with multimodal prompts.
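
The decoupled cross-attention idea can be illustrated with a small NumPy sketch. This is a simplified, single-head illustration rather than the actual implementation: it assumes the text branch and the image branch share one query projection, each branch has its own key and value projections, and the two outputs are summed with a weight `scale`. All names here (`decoupled_cross_attention`, `params`, `w_k_image`, and so on) are hypothetical and chosen only for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention for a single head, no batching.
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d))
    return weights @ v


def decoupled_cross_attention(latents, text_feats, image_feats, params, scale=1.0):
    # Assumption: both branches reuse the same query projection.
    q = latents @ params["w_q"]

    # Text branch: keys and values come from the text features.
    text_out = attention(
        q, text_feats @ params["w_k_text"], text_feats @ params["w_v_text"]
    )

    # Image branch: a separate pair of key/value projections
    # applied to the image features.
    image_out = attention(
        q, image_feats @ params["w_k_image"], image_feats @ params["w_v_image"]
    )

    # Assumption: the two branch outputs are combined by a weighted sum.
    return text_out + scale * image_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 8
    names = ["w_q", "w_k_text", "w_v_text", "w_k_image", "w_v_image"]
    params = {name: rng.normal(size=(dim, dim)) for name in names}

    latents = rng.normal(size=(5, dim))      # 5 latent (query) tokens
    text_feats = rng.normal(size=(7, dim))   # 7 text tokens
    image_feats = rng.normal(size=(4, dim))  # 4 image tokens

    out = decoupled_cross_attention(latents, text_feats, image_feats, params, scale=0.5)
    print(out.shape)
```

In this sketch, setting `scale=0.0` reduces the layer to ordinary text-only cross-attention, which is one way to see how an image branch of this kind can sit alongside an existing text branch without replacing it.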