Moonshot AI (月之暗面, literally "Dark Side of the Moon") today announced the release of k1, a new visual thinking model. Built on reinforcement learning, the model supports end-to-end image understanding and incorporates chain-of-thought reasoning. It extends its capabilities beyond mathematics into other foundational sciences, including physics and chemistry. In benchmark tests, k1 outperformed leading global models used as baselines, including OpenAI's o1, GPT-4o, and Claude 3.5 Sonnet.
The new model is encouraged to generate more detailed reasoning steps, forming high-quality chains of thought that significantly improve its success rate on complex tasks. k1 combines image understanding and reasoning in a single model: it processes images supplied by the user directly and derives answers without relying on external OCR or additional vision models, which gives users a smoother interaction experience.
k1 is trained in two stages: pre-training to obtain a base model, followed by reinforcement learning to strengthen that base model. The base model scored 903 on OCRBench and performed strongly on benchmarks such as MathVista-testmini, MMMU-val, and DocVQA. The reinforcement learning stage optimized data quality and learning efficiency, achieving new breakthroughs in scalability.
Kimi has also built its own standardized test set, Science Vista, which covers image-based questions of varying difficulty in mathematics, physics, and chemistry and will be made available for use across the industry. Internal testing revealed some limitations in k1: its out-of-distribution generalization and its success rate on complex problems both need improvement. Even so, its performance in visually noisy scenarios surpasses that of other models, demonstrating exceptional visual recognition capabilities.
The k1 visual thinking model in the Kimi intelligent assistant not only excels at mathematics but also extends to physics and chemistry, showing a broad range of foundational scientific abilities. It also has general capabilities: it can explain and reason about non-mathematical material, such as the content of scientists' manuscripts and the stories behind them.
Kimi looks forward to exploring a larger world with users. The k1 model is now live, and users can try it in the latest version of the Kimi Intelligent Assistant mobile app or on the web.