The SiliconCloud platform is pleased to announce the official launch of batch inference for the DeepSeek-R1 & V3 API. Users can now send requests to SiliconCloud through the batch API, free from real-time rate limits, and complete large-scale data processing jobs within an expected 24-hour window.
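For readers who want a concrete picture, the workflow follows the familiar upload-then-submit pattern of OpenAI-compatible batch APIs: write one request per line to a JSONL file, upload it, create a batch job with a 24-hour completion window, and poll for results. The sketch below assumes an OpenAI-compatible client; the base URL, model identifier, and field values are illustrative assumptions rather than confirmed SiliconCloud parameters.

```python
# Hypothetical sketch of a batch submission via an OpenAI-compatible client.
# Base URL, model name, and purpose/endpoint values are assumptions.
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.siliconflow.cn/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

# 1. Write each request as one JSON line; custom_id lets you match outputs back to inputs.
requests = [
    {
        "custom_id": f"task-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "deepseek-ai/DeepSeek-V3",  # assumed model identifier
            "messages": [{"role": "user", "content": text}],
        },
    }
    for i, text in enumerate(["Summarize report A", "Clean and normalize record B"])
]
with open("batch_input.jsonl", "w", encoding="utf-8") as f:
    for req in requests:
        f.write(json.dumps(req, ensure_ascii=False) + "\n")

# 2. Upload the file, then create the batch with a 24-hour completion window.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Poll the job status; when it completes, download the output file by its file ID.
status = client.batches.retrieve(batch.id)
print(status.status)  # e.g. "validating", "in_progress", "completed"
```

Once the job reports completed, results come back as a JSONL output file in which each line carries the original `custom_id`, so responses can be joined back to the submitted requests.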

A major highlight of this update is a significant price cut: DeepSeek-V3 batch inference is priced 50% lower than real-time inference. Better still, from March 11 to March 18, DeepSeek-R1 batch inference carries a 75% discount, with input at ¥1 per million tokens and output at ¥4 per million tokens.


The introduction of batch inference is meant to help users handle large-scale data processing tasks, such as report generation and data cleaning, more efficiently and at lower cost. The feature is particularly well suited to data analysis and model performance evaluation scenarios that do not require real-time responses.

It is also worth mentioning that the DeepSeek-R1 & V3 API previously added support for Function Calling, JSON Mode, Prefix, and FIM. In addition, the TPM (tokens per minute) limit for the Pro version of the DeepSeek-R1 & V3 API has been raised from 10,000 to 1,000,000.
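As a quick illustration of one of those capabilities, the following is a minimal sketch of requesting structured output with JSON Mode on an OpenAI-compatible chat completions endpoint; the base URL and model identifier are assumptions, and the `response_format` parameter follows the OpenAI convention.

```python
# Hypothetical JSON Mode sketch against an OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.siliconflow.cn/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",  # assumed model identifier
    messages=[
        {"role": "system", "content": "Reply with a JSON object containing 'title' and 'summary'."},
        {"role": "user", "content": "Summarize the Q1 sales report in one sentence."},
    ],
    response_format={"type": "json_object"},  # JSON Mode: constrains output to valid JSON
)
print(resp.choices[0].message.content)
```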