Mooncake
Leading LLM Service Provider Platform
Common Product · Others · LLM service · Decoupled architecture
Mooncake is the serving platform for Kimi, a leading large language model (LLM) service offered by Moonshot AI. It uses a KVCache-centric disaggregated (decoupled) architecture: the prefill and decoding stages run on separate clusters, and the underutilized CPU, DRAM, and SSD resources of the GPU cluster are pooled into a disaggregated KVCache. At the heart of Mooncake is its KVCache-centric scheduler, which maximizes overall effective throughput while meeting latency-related service level objectives (SLOs). Unlike traditional research, which assumes every request will be processed, Mooncake is designed for heavily overloaded scenarios and handles them with a prediction-based early rejection policy. Experiments show that Mooncake excels in long-context scenarios, achieving up to a 525% throughput increase over baseline methods in certain simulated scenarios while adhering to SLOs. Under real workloads, Mooncake's architecture enables Kimi to handle 75% more requests.
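The scheduling and early-rejection ideas above can be illustrated with a toy sketch. It is not Mooncake's implementation: every name (`PrefillNode`, `DecodePool`, `schedule`, `block_hashes`), the block size, the per-token prefill cost, and the fixed decode capacity are hypothetical assumptions. The sketch routes a request to the prefill node with the lowest predicted time-to-first-token (TTFT), crediting KVCache blocks that node already holds through prefix matching, and rejects the request up front if the TTFT SLO would be missed or the decoding pool is predicted to be full. As a simplification, the decode check uses current load rather than a forecast of the load when the request would actually reach decoding.

```python
from dataclasses import dataclass, field

# Hypothetical constants for illustration only; not Mooncake's real values.
BLOCK_SIZE = 4          # tokens per KVCache block
SEC_PER_TOKEN = 0.001   # assumed prefill cost per uncached token, in seconds


def block_hashes(tokens):
    """Prefix-chained block hashes: hash i covers tokens[0:(i + 1) * BLOCK_SIZE]."""
    n_full = len(tokens) // BLOCK_SIZE
    return [hash(tuple(tokens[: (i + 1) * BLOCK_SIZE])) for i in range(n_full)]


@dataclass
class PrefillNode:
    name: str
    queued_tokens: int = 0                    # prefill work already waiting
    cache: set = field(default_factory=set)   # block hashes held in local KVCache

    def cached_prefix_tokens(self, hashes):
        """Tokens reusable from cache: count leading blocks until the first miss."""
        hits = 0
        for h in hashes:
            if h not in self.cache:
                break
            hits += 1
        return hits * BLOCK_SIZE

    def predicted_ttft(self, tokens):
        """Queueing delay plus prefill time for the uncached suffix only."""
        uncached = len(tokens) - self.cached_prefix_tokens(block_hashes(tokens))
        return (self.queued_tokens + uncached) * SEC_PER_TOKEN


@dataclass
class DecodePool:
    active: int    # requests currently decoding
    capacity: int  # assumed max concurrency that still meets the decoding SLO


def schedule(tokens, prefill_nodes, decode, ttft_slo):
    """Return the chosen prefill node's name, or None if rejected early."""
    best = min(prefill_nodes, key=lambda n: n.predicted_ttft(tokens))
    if best.predicted_ttft(tokens) > ttft_slo:
        return None  # early rejection: the TTFT SLO would be missed
    # Simplification: uses current decode load, not a forecast of future load.
    if decode.active >= decode.capacity:
        return None  # early rejection before spending any prefill compute
    hashes = block_hashes(tokens)
    best.queued_tokens += len(tokens) - best.cached_prefix_tokens(hashes)
    best.cache.update(hashes)  # this node now holds the request's KVCache blocks
    decode.active += 1
    return best.name


if __name__ == "__main__":
    nodes = [PrefillNode("p0", queued_tokens=120), PrefillNode("p1", queued_tokens=100)]
    prompt = list(range(40))
    nodes[0].cache.update(block_hashes(prompt))  # p0 already caches this prompt
    decode = DecodePool(active=0, capacity=1)
    print(schedule(prompt, nodes, decode, ttft_slo=0.2))                 # p0
    print(schedule(list(range(100, 108)), nodes, decode, ttft_slo=0.2))  # None
```

In this run, `p0` is chosen despite its longer queue because it already holds the prompt's KVCache, so only its queueing time counts toward TTFT. The second request would meet the TTFT SLO but is rejected early because the decode pool is already at its assumed capacity, which avoids wasting prefill compute on a request that could not be served in time.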
Mooncake Visits Over Time
Monthly Visits: 494,758,773
Bounce Rate: 37.69%
Pages per Visit: 5.7
Visit Duration: 00:06:29