Kunlun Wanwei has officially released Skywork R1V (referred to as "R1V"), which it describes as the world's first open-source industrial-grade multimodal reasoning model. The company reports that this 38-billion-parameter model performs comparably to the well-known DeepSeek-R1 and surpasses it on several benchmarks, placing it among current state-of-the-art (SOTA) models. By open-sourcing R1V, Kunlun Wanwei aims to promote technological sharing and advancement and to inject new vitality into the global AI open-source community.
R1V's main strength is multimodal reasoning: it combines text and visual information within a single reasoning process. In visual question answering tasks, R1V is positioned against closed-source models such as Claude 3.5 Sonnet and GPT-4o, while maintaining top-tier text reasoning capabilities. On the MMMU benchmark, R1V scored 69, which the company describes as a new record for models of its size. It also scored 67.5 on MathVista, a benchmark of mathematical reasoning in visual contexts, indicating strong capabilities in complex mathematical reasoning and logical analysis.
R1V's results are attributed to several techniques developed by Kunlun Wanwei's research team. The first is cross-modal transfer learning, a method that transfers the large model's text reasoning capabilities to the visual modality and so significantly reduces the need for multimodal reasoning data. The second is a hybrid training strategy that combines iterative supervised fine-tuning with reinforcement learning to dynamically adjust the chain-of-thought length, thereby improving reasoning efficiency. The third is an adaptive-length chain-of-thought distillation framework, introduced to avoid "overthinking" during reasoning and to improve both efficiency and quality.
With the launch of R1V, Kunlun Wanwei has not only become the world's first company to open-source a multimodal reasoning model but has also taken a significant step towards realizing the dream of AGI (Artificial General Intelligence). The model weights, inference code, and technical report are all publicly available, and anyone can access the relevant resources via GitHub and Hugging Face.
Model Weight Download
Hugging Face:
https://huggingface.co/Skywork/Skywork-R1V-38B
GitHub:
https://github.com/SkyworkAI/Skywork-R1V
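For readers who prefer to fetch the weights programmatically rather than through the browser, the following is a minimal sketch, assuming the `huggingface_hub` Python package is installed. The repository ID is taken from the Hugging Face URL above; the helper name `download_weights` and the default target directory are illustrative choices, not part of the official release. A 38-billion-parameter checkpoint is a large download, so check available disk space first.

```python
# Minimal sketch: download the Skywork R1V weights from Hugging Face.
# Assumes `pip install huggingface_hub` has been run beforehand.

# Repository ID taken from the Hugging Face URL listed above.
REPO_ID = "Skywork/Skywork-R1V-38B"

# Hypothetical default target directory, chosen for illustration only.
DEFAULT_LOCAL_DIR = "./Skywork-R1V-38B"


def download_weights(local_dir: str = DEFAULT_LOCAL_DIR) -> str:
    """Download every file in the model repository and return the local path."""
    # Imported here so the module can be loaded without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    path = download_weights()
    print(f"Model files saved to: {path}")
```

The inference code itself lives in the GitHub repository listed above and should be consulted for how to load and run the model.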
Detailed Technical Report
https://github.com/SkyworkAI/Skywork-R1V/blob/main/Skywork_R1V.pdf
Key Highlights:
🌟 The world's first open-source industrial-grade multimodal reasoning model, Skywork R1V, has been officially released, boasting 38 billion parameters.
🚀 R1V demonstrates exceptional performance in multiple benchmark tests, achieving impressive scores of 69 and 67.5 in MMMU and MathVista, respectively.
📚 Kunlun Wanwei's open-source initiative aims to promote technology sharing, inject vitality into the global AI open-source community, and contribute to the realization of the AGI dream.