Yesterday, Moonshot AI's Kimi Open Platform announced the public beta of its Context Caching technology. Without changing API prices, the technology can cut developers' cost of using the flagship long-context model by up to 90% while significantly improving the model's response speed.
Context Caching is a data-management technique in which the system pre-stores data that is likely to be requested frequently. When the same information is requested again, it is served straight from the cache rather than recomputed or re-fetched from the original source, saving both time and resources. It is especially suited to scenarios with frequent requests that repeatedly reference the same large initial context, where it can substantially cut the cost of long-context models and improve efficiency.
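The general idea can be illustrated with a toy cache keyed by a hash of the request's context. This is a generic illustration of the caching principle, not the Kimi API; the class and function names are invented for the sketch:

```python
import hashlib

class ContextCache:
    """Toy context cache: stores an expensive-to-compute result keyed by
    a hash of the (large, repeated) context, so identical contexts are
    served from memory instead of being reprocessed."""
    def __init__(self):
        self._store = {}

    def _key(self, context: str) -> str:
        return hashlib.sha256(context.encode("utf-8")).hexdigest()

    def get_or_compute(self, context: str, compute):
        key = self._key(context)
        if key not in self._store:      # cache miss: do the expensive work once
            self._store[key] = compute(context)
        return self._store[key]         # cache hit: reuse the stored result

cache = ContextCache()
calls = []
slow = lambda ctx: calls.append(ctx) or len(ctx)  # stand-in for model prefill
cache.get_or_compute("long product manual ...", slow)
cache.get_or_compute("long product manual ...", slow)  # served from cache
print(len(calls))  # the expensive function ran only once
```

In the real service the cached object is the model's processed context rather than a simple return value, but the cost structure is the same: pay for the heavy computation once, then answer repeated requests against it cheaply.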
Specifically, Context Caching applies to scenarios with frequent requests that repeatedly reference a large shared initial context, and brings two benefits:
Up to 90% cost reduction: scenarios that involve many questions against a fixed document save substantially. For example, with a hardware product manual of roughly 90,000 words, pre-sales support staff run many Q&A sessions within a short period; with Context Caching, the cost drops to about 10% of the original.
First-token latency reduced by 83%: a request to a 128k-context model typically takes 30 seconds to return the first token. With Context Caching, average first-token latency falls to under 5 seconds, a reduction of about 83%.
The billing model for Context Caching consists of three parts:

Cache creation fee:
Charged once when the cache is successfully created via the cache-creation interface, based on the actual number of tokens cached: 24 yuan/M tokens.

Cache storage fee:
Charged per minute over the cache's lifetime: 10 yuan/M tokens/minute.

Cache call fee, in two components:
Incremental tokens beyond the cached content: billed at the model's standard price.
Matched calls: during the cache's lifetime, each chat request whose messages successfully match an existing cache incurs a per-call fee of 0.02 yuan.
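Plugging the published prices into a small calculator shows how the three parts combine. The function below is illustrative (not part of any API) and deliberately excludes the incremental-token charge, which depends on the model's standard price:

```python
def context_cache_cost(cached_tokens: int, lifetime_minutes: float, num_calls: int) -> float:
    """Estimate Context Caching fees in yuan from the published prices:
    creation 24 yuan/M tokens, storage 10 yuan/M tokens/minute,
    and 0.02 yuan per matched call. Incremental tokens beyond the cache
    are billed separately at the model's standard price (not included)."""
    millions = cached_tokens / 1_000_000
    creation = 24 * millions                     # one-time creation fee
    storage = 10 * millions * lifetime_minutes   # per-minute storage fee
    calls = 0.02 * num_calls                     # per-call matching fee
    return creation + storage + calls

# e.g. a 100k-token manual cached for 10 minutes and queried 50 times
print(round(context_cache_cost(100_000, 10, 50), 2))  # → 13.4
```

In this example the creation fee is 2.4 yuan, storage 10 yuan, and call fees 1 yuan; note that the per-minute storage fee dominates for long-lived caches, so cache lifetime is the main lever on total cost.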