Mooncake

Leading LLM Service Provider Platform

Common Product | Others | LLM service | Decoupled architecture
Mooncake is the serving platform for Kimi, a leading large language model (LLM) service offered by Moonshot AI. It uses a KVCache-centric disaggregated architecture that separates the prefill and decoding clusters and builds a disaggregated KVCache from underutilized CPU, DRAM, and SSD resources within the GPU cluster. At the heart of Mooncake is its KVCache-centric scheduler, which maximizes overall effective throughput while meeting latency-related service level objectives (SLOs). Unlike earlier studies, which assume that every request will be processed, Mooncake targets heavily overloaded scenarios and applies a prediction-based early rejection policy. Experiments show that Mooncake excels in long-context scenarios, achieving up to a 525% throughput increase over baseline methods in certain simulated scenarios while still meeting SLOs. Under real-world workloads, Mooncake's architecture enables Kimi to handle 75% more requests.
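The prediction-based early rejection idea can be illustrated with a minimal sketch. This is not Mooncake's actual scheduler: the class names, the capacity constants, and the simple linear load model below are all hypothetical and chosen only to show the shape of the decision. A request is admitted only if the predicted time to first token stays within the SLO and the decoding cluster is predicted to have room once prefill finishes.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int  # length of the prompt to prefill


@dataclass
class ClusterLoad:
    queued_prefill_tokens: int   # tokens already waiting in the prefill queue
    active_decode_requests: int  # requests currently being decoded


# Hypothetical capacity and SLO figures, for illustration only.
PREFILL_TOKENS_PER_SECOND = 10_000.0
TTFT_SLO_SECONDS = 5.0
MAX_DECODE_REQUESTS = 4


def predicted_ttft(load: ClusterLoad, req: Request) -> float:
    """Estimate time to first token, assuming prefill cost is linear in tokens."""
    total_tokens = load.queued_prefill_tokens + req.prompt_tokens
    return total_tokens / PREFILL_TOKENS_PER_SECOND


def predicted_decode_load(load: ClusterLoad, finishing_soon: int) -> int:
    """Predict decode occupancy when this request's prefill would complete."""
    remaining = max(0, load.active_decode_requests - finishing_soon)
    return remaining + 1  # plus the new request itself


def admit(load: ClusterLoad, req: Request, finishing_soon: int = 0) -> bool:
    """Reject early if either predicted stage would violate its limit."""
    if predicted_ttft(load, req) > TTFT_SLO_SECONDS:
        return False
    if predicted_decode_load(load, finishing_soon) > MAX_DECODE_REQUESTS:
        return False
    return True
```

Rejecting before prefill starts avoids spending prefill compute on a request that the decoding cluster could not serve within its SLO. The finishing_soon argument stands in for whatever load prediction the real system uses.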

Mooncake Visit Over Time

Monthly Visits: 515580771
Bounce Rate: 37.20%
Pages per Visit: 5.8
Visit Duration: 00:06:42
