Mooncake

Leading LLM Service Provider Platform

Mooncake is the serving platform for Kimi, a leading large language model (LLM) service offered by Moonshot AI. It uses a KVCache-centric disaggregated architecture that separates the prefill and decoding clusters, and it builds a disaggregated KVCache by leveraging underutilized CPU, DRAM, and SSD resources within the GPU cluster. At the heart of Mooncake is its KVCache-centric scheduler, which maximizes overall effective throughput while meeting latency-related service level objectives (SLOs). Unlike traditional studies, which assume every request will be processed, Mooncake must handle highly overloaded scenarios, and it does so with a prediction-based early rejection policy. Experiments show that Mooncake excels in long-context scenarios, achieving up to a 525% throughput increase over baseline methods in certain simulated scenarios while adhering to SLOs. Under real-world workloads, Mooncake's architecture enables Kimi to handle 75% more requests.
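To make the idea of prediction-based early rejection more concrete, here is a minimal Python sketch. It is not Mooncake's actual scheduler or API: the class names, fields, and the simple linear time-to-first-token (TTFT) estimate are all hypothetical, chosen only to illustrate rejecting a request up front when the predicted load would break the SLO.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int
    # Hypothetical: prompt tokens whose KVCache can be reused from the cache.
    cached_prefix_tokens: int = 0


@dataclass
class ClusterState:
    # All fields are illustrative assumptions, not real Mooncake metrics.
    prefill_queue_tokens: int      # uncached tokens already queued for prefill
    prefill_tokens_per_s: float    # assumed aggregate prefill throughput
    decode_slots_free_soon: int    # predicted free decode slots after prefill


def predicted_ttft(req: Request, state: ClusterState) -> float:
    """Estimate time to first token, counting only tokens that need prefill."""
    uncached = max(req.prompt_tokens - req.cached_prefix_tokens, 0)
    return (state.prefill_queue_tokens + uncached) / state.prefill_tokens_per_s


def admit(req: Request, state: ClusterState, ttft_slo_s: float) -> bool:
    """Reject early if either stage is predicted to miss its target."""
    if predicted_ttft(req, state) > ttft_slo_s:
        return False  # prefill would finish too late
    if state.decode_slots_free_soon <= 0:
        return False  # prefill work would be wasted: no decode capacity
    return True


state = ClusterState(prefill_queue_tokens=8000,
                     prefill_tokens_per_s=4000.0,
                     decode_slots_free_soon=1)
print(admit(Request(16000), state, ttft_slo_s=5.0))
print(admit(Request(16000, cached_prefix_tokens=12000), state, ttft_slo_s=5.0))
```

In this toy model a long prompt is rejected when nothing is cached, but the same prompt is admitted once most of its prefix can be served from the cache, which is the intuition behind scheduling around KVCache reuse.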

Mooncake Alternatives

Efficient LLM — An efficient solution for LLM inference on Intel GPUs.

Architecture Insights — Weekly Free AI News Digest

LLM Context Extender — Extends LLM context window

LLM Logs — A blog that helps you become an LLM expert.

llm.c — Utilizes simple C/CUDA for LLM training.

LLM Spark — A development platform for building LLM applications

ComfyUI LLM Party — A collection of LLM workflow nodes developed based on the ComfyUI frontend.

GitHub to LLM Converter — Converts GitHub links into a format suitable for LLMs.

Awan LLM — An unlimited token, unrestricted, cost-effective LLM inference API platform.

Yuque AI Customer Service — A data-driven AI customer service platform that supports multi-channel online customer service, helping businesses cut costs and increase efficiency.

WHALESPEAK Intelligent Customer Service — An AI-powered intelligent customer service system providing 24/7 uninterrupted service.

llm-commit — A plugin for generating Git commit messages with an LLM.

Tencent Enterprise Customer Service — An all-in-one intelligent online customer service system that integrates communication across multiple channels to improve the efficiency of corporate customer service.

vLLM — Fast and Easy-to-Use LLM Inference and Serving Platform

Douhui AI — AI architecture design, AI interior design, and AI rendering

AIchatbot For Customer Service — Create your own AI customer service chatbot to solve 90% of your support issues.

LangSmith — LLM Application Developer Platform

Ebay Customer Service Helper with GPT — Offers an eBay seller customer service assistant using GPT for template response generation.

Prompt Joy — An MLops tool for recording and testing LLM prompts

Awesome-LLM-Post-training — A repository of tutorials, surveys, and guides on post-training methods for large language models (LLMs).

Skywork-Reward-Gemma-2-27B — An advanced reward model based on the Gemma-2-27B architecture

ArchitectAI — The world's first architecture AI, interior AI, and landscape AI.

Crawl4LLM — An efficient web crawler for LLM pre-training, focused on crawling high-quality web data effectively.

LLM Pricing — Compares pricing information for various large language models (LLMs)

RWKV — A new generation of large-scale model architecture that aims to surpass the Transformer.

Cool Cat Cloud AI Intelligent Customer Service Robot — A smart customer service solution tailored for small and medium-sized enterprises.

Xunfei A.I. Intelligent Customer Service Solution — A multi-channel intelligent customer service solution based on iFLYTEK speech technology.

ReDrafter — Innovative technology for accelerating LLM inference on NVIDIA GPUs