Yesterday, Moonshot AI's Kimi Open Platform announced the public beta of its Context Caching technology. Without changing API prices, the technology can cut developers' cost of using the flagship long-context model by up to 90% while significantly improving the model's response speed.
Context Caching is a data-management technique in which the system pre-stores data that is likely to be requested frequently. When the same information is requested again, it is served straight from the cache rather than recomputed or re-fetched from the original source, saving both time and resources. It is especially suited to scenarios with frequent requests that repeatedly reference the same large initial context, where it can substantially cut the cost of long-context models and improve efficiency.
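The general idea can be illustrated with a toy cache keyed by a hash of the request's context. This is a generic illustration of the caching principle, not the Kimi API; the class and function names are invented for the sketch:

```python
import hashlib

class ContextCache:
    """Toy context cache: stores an expensive-to-compute result keyed by
    a hash of the (large, repeated) context, so identical contexts are
    served from memory instead of being reprocessed."""
    def __init__(self):
        self._store = {}

    def _key(self, context: str) -> str:
        return hashlib.sha256(context.encode("utf-8")).hexdigest()

    def get_or_compute(self, context: str, compute):
        key = self._key(context)
        if key not in self._store:      # cache miss: do the expensive work once
            self._store[key] = compute(context)
        return self._store[key]         # cache hit: reuse the stored result

cache = ContextCache()
calls = []
slow = lambda ctx: calls.append(ctx) or len(ctx)  # stand-in for model prefill
cache.get_or_compute("long product manual ...", slow)
cache.get_or_compute("long product manual ...", slow)  # served from cache
print(len(calls))  # the expensive function ran only once
```

In the real service the cached object is the model's processed context rather than a simple return value, but the cost structure is the same: pay for the heavy computation once, then answer repeated requests against it cheaply.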
Specifically, Context Caching applies to scenarios with frequent requests that repeatedly reference a large shared initial context, and brings two benefits:
Up to 90% cost reduction: scenarios that involve many questions against a fixed document save substantially. For example, with a hardware product manual of roughly 90,000 words, pre-sales support staff run many Q&A sessions within a short period; with Context Caching, the cost drops to about 10% of the original.
First-token latency reduced by 83%: a request to a 128k-context model typically takes 30 seconds to return the first token. With Context Caching, average first-token latency falls to under 5 seconds, a reduction of about 83%.
The billing model for Context Caching consists of three parts:

Cache creation fee:
Charged once when the cache is successfully created via the cache-creation interface, based on the actual number of tokens cached: 24 yuan/M tokens.

Cache storage fee:
Charged per minute over the cache's lifetime: 10 yuan/M tokens/minute.

Cache call fee, in two components:
Incremental tokens beyond the cached content: billed at the model's standard price.
Matched calls: during the cache's lifetime, each chat request whose messages successfully match an existing cache incurs a per-call fee of 0.02 yuan.
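Plugging the published prices into a small calculator shows how the three parts combine. The function below is illustrative (not part of any API) and deliberately excludes the incremental-token charge, which depends on the model's standard price:

```python
def context_cache_cost(cached_tokens: int, lifetime_minutes: float, num_calls: int) -> float:
    """Estimate Context Caching fees in yuan from the published prices:
    creation 24 yuan/M tokens, storage 10 yuan/M tokens/minute,
    and 0.02 yuan per matched call. Incremental tokens beyond the cache
    are billed separately at the model's standard price (not included)."""
    millions = cached_tokens / 1_000_000
    creation = 24 * millions                     # one-time creation fee
    storage = 10 * millions * lifetime_minutes   # per-minute storage fee
    calls = 0.02 * num_calls                     # per-call matching fee
    return creation + storage + calls

# e.g. a 100k-token manual cached for 10 minutes and queried 50 times
print(round(context_cache_cost(100_000, 10, 50), 2))  # → 13.4
```

In this example the creation fee is 2.4 yuan, storage 10 yuan, and call fees 1 yuan; note that the per-minute storage fee dominates for long-lived caches, so cache lifetime is the main lever on total cost.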