Chinese AI company DeepSeek has launched Janus-Pro, a new multimodal large model, officially entering the text-to-image field. The release marks a significant breakthrough for DeepSeek in multimodal AI technology.
On the GenEval and DPG-Bench benchmarks, Janus-Pro-7B not only surpassed OpenAI's DALL-E 3 but also outperformed popular models such as Stable Diffusion and Emu3-Gen. Janus-Pro is released under the MIT open-source license, which permits commercial use. DeepSeek stated that Janus-Pro is an advanced version of the JanusFlow large model released on November 13, 2024.
Compared to its predecessor, Janus-Pro uses an optimized training strategy, expanded training data, and a larger model size. These improvements have enabled Janus-Pro to make significant progress in multimodal understanding and text-to-image instruction following, while also making text-to-image generation more stable.
Although Janus-Pro can currently handle images only at a resolution of 384×384, the quality it achieves is impressive given its compact model size.
As a multimodal model, Janus-Pro can not only generate images but also describe them, identify landmarks, recognize text within images, and answer knowledge questions about what an image depicts.
Key Points:
🌟 DeepSeek releases the Janus-Pro multimodal large model, entering the text-to-image field.
📈 On the GenEval and DPG-Bench benchmarks, Janus-Pro-7B outperforms popular models including OpenAI's DALL-E 3.
✅ Janus-Pro is released under the MIT open-source license, which permits commercial use.