On January 15, 2025, Beijing Moon's Dark Side Technology Co., Ltd. announced the official release of its new multimodal image understanding model, moonshot-v1-vision-preview. This model enhances the multimodal capabilities of the moonshot-v1 model series, helping Kimi better understand the world.

The Vision model possesses powerful image recognition capabilities, accurately identifying complex details and subtle differences in images. Whether it’s food or animals, it can distinguish similar yet different objects. For example, when presented with 16 similar images of blueberry muffins and Chihuahuas, which are difficult for the human eye to differentiate, the Vision model can precisely distinguish and recognize them.

The Vision model also boasts leading advanced image recognition capabilities in the country, excelling in OCR text recognition and image understanding scenarios. It is more accurate than standard document scanning and OCR recognition software, capable of recognizing messy handwritten content such as receipts and delivery slips.

WeChat Screenshot_20250115135433.png

The Vision model supports features such as multi-turn dialogue, streaming output, tool invocation, JSON Mode, and Partial Mode. However, it does not currently support online search, nor does it support creating Context Cache with image content. It does allow the use of successfully created Cache to invoke the Vision model and does not support images in URL format, currently only supporting images in base64 encoding.

Model Pricing

Model Billing Unit Price
moonshot-v1-8k-vision-preview 1M tokens ¥12.00
moonshot-v1-32k-vision-preview 1M tokens ¥24.00
moonshot-v1-128k-vision-preview 1M tokens ¥60.00