Janus-Pro-1B
Janus-Pro-1B is an autoregressive framework for unified multi-modal understanding and generation.
Common Product, Image, Multi-modal, Image Generation
Janus-Pro-1B is a multi-modal model that handles both image understanding and image generation in one system. It decouples visual encoding into separate paths, one for understanding and one for generation, which eases the conflict that arises in earlier approaches when a single visual encoder has to serve both tasks. Everything else still runs through a single unified Transformer architecture. This design makes the model more flexible and lets it match or surpass models tailored for specific tasks. The Janus-Pro series is built on DeepSeek-LLM-1.5b-base and DeepSeek-LLM-7b-base. For understanding, the model uses SigLIP-L as its visual encoder and supports 384x384 image inputs; for generation, it uses a specialized image generation tokenizer. Its open-source release and flexibility position it as a strong candidate for next-generation multi-modal models.
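The decoupled design described above can be pictured with a small toy sketch. This is not the Janus-Pro code or API: every name here (Token, grid_tokens, build_sequence and so on) is hypothetical, the "encoders" only emit placeholder tokens, and the downsample factor of 16 is an assumption chosen for illustration. Only the 384x384 input size comes from the description. The sketch shows one thing: two different visual paths can feed the same shared backbone, with the task deciding which path is used.

```python
from dataclasses import dataclass
from typing import List

IMAGE_SIZE = 384  # input resolution stated in the description
DOWNSAMPLE = 16   # ASSUMPTION: illustrative downsample factor, not stated in the description


@dataclass
class Token:
    source: str  # which path produced the token: "text", "understanding" or "generation"
    index: int   # position inside that path's output


def grid_tokens(source: str, image_size: int = IMAGE_SIZE,
                downsample: int = DOWNSAMPLE) -> List[Token]:
    """Placeholder visual encoder: one token per cell of the downsampled grid."""
    side = image_size // downsample
    return [Token(source, i) for i in range(side * side)]


def understanding_path() -> List[Token]:
    # Hypothetical stand-in for the SigLIP-L encoder (understanding tasks).
    return grid_tokens("understanding")


def generation_path() -> List[Token]:
    # Hypothetical stand-in for the image generation tokenizer (generation tasks).
    return grid_tokens("generation")


def shared_transformer(text: List[Token], visual: List[Token]) -> List[Token]:
    """Placeholder for the single unified Transformer: it sees one combined sequence."""
    return text + visual


def build_sequence(task: str, prompt: str) -> List[Token]:
    """Route the visual side through the path that matches the task."""
    text = [Token("text", i) for i, _ in enumerate(prompt.split())]
    if task == "understand":
        visual = understanding_path()
    elif task == "generate":
        visual = generation_path()
    else:
        raise ValueError(f"unknown task: {task!r}")
    return shared_transformer(text, visual)
```

Calling build_sequence("understand", "describe this image") and build_sequence("generate", "a red bicycle") returns sequences whose visual tokens come from different paths, while shared_transformer is the same function in both cases. That is the point of the decoupling: the task-specific part is confined to visual encoding.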
Janus-Pro-1B Visits Over Time
Monthly Visits: 21,315,886
Bounce Rate: 45.50%
Pages per Visit: 5.2
Visit Duration: 00:05:02