Tencent Cloud announced a significant price reduction for DeepSeek-V4 model calls starting June 3, 2026, bringing its rates in line with the official prices. The DeepSeek-V4-Pro cache-hit price drops by up to 97.5%, and its inference input and output prices fall by 75%; the DeepSeek-V4-Flash cache-hit price decreases by 90%.
The MiniMax M3 model has officially launched and is integrated with the JD Cloud JoyBuilder platform. Key highlight: significantly improved inference performance through a self-developed framework, prefill-decode (PD) separation, KV cache, and speculative sampling.
DeepSeek officially announced that the API price of its DeepSeek-V4-Pro model will be permanently reduced to one-quarter of the original price after the limited-time promotion ends on May 31, 2026. The model had already received a full set of API price adjustments on April 26, when input cache-hit prices were cut to one-tenth of the launch price and a limited-time discount to 25% of the list price (75% off) was applied on top. This adjustment makes the promotional price the standard rate, setting a new record low for global large model API pricing.
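The stacked reductions in this item can be checked with a few lines of arithmetic. The sketch below assumes the two cuts multiply (one-tenth of the launch price, then one-quarter of that); the function names are illustrative and not part of any DeepSeek tooling.

```python
from fractions import Fraction

def remaining_fraction(*multipliers):
    """Multiply successive price multipliers into one overall fraction."""
    result = Fraction(1)
    for m in multipliers:
        result *= Fraction(m)
    return result

def percent_reduction(fraction_of_original):
    """Convert 'fraction of the original price still paid' to a percent cut."""
    return float((1 - fraction_of_original) * 100)

# Assumption: input cache hits cost one-tenth of launch price, then one-quarter of that.
cache_hit = remaining_fraction(Fraction(1, 10), Fraction(1, 4))
# Regular input/output tokens: one-quarter of the original price.
standard = remaining_fraction(Fraction(1, 4))

print(percent_reduction(cache_hit))  # 97.5
print(percent_reduction(standard))   # 75.0
```

Under that assumption the cache-hit price ends up at 1/40 of the launch price, which is consistent with the "up to 97.5%" and "75%" figures quoted in the Tencent Cloud item above.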
AMD released the vLLM-ATOM plugin, aiming to fully tap into hardware potential without changing the existing workflow, significantly accelerating the inference of mainstream large language models such as DeepSeek-R1 and Kimi-K2. vLLM is an open-source framework optimized for throughput and GPU memory utilization in high-concurrency scenarios, focusing on request scheduling and cache management. The ATOM plugin further enhances this capability.
Enterprise-level web scraping API, zero obstruction, lightning speed, 30 times faster cache, 99.9% uptime.
An enhancement method for language models that improves generation efficiency through preloading knowledge caches without the need for real-time retrieval.
Moonshot: $10 per million input tokens; $30 per million output tokens; context length: 131
redis
This is a cross-encoder model fine-tuned on the LangCache sentence pair dataset using the sentence-transformers library, based on the Alibaba-NLP/gte-reranker-modernbert-base model. It is designed to compute a semantic similarity score for a pair of texts, providing efficient text matching and reranking for the LangCache semantic cache system.
This is a CrossEncoder model developed by Redis and fine-tuned for the LangCache semantic caching task. It is based on the mature `cross-encoder/ms-marco-MiniLM-L6-v2` model and trained on a dataset of over 1 million LangCache sentence pairs. It is specifically designed to calculate the semantic relevance score between two texts to optimize the cache hit rate.
aditeyabaral-redis
This is a CrossEncoder model fine-tuned from Alibaba-NLP/gte-reranker-modernbert-base and optimized for the Redis LangCache semantic cache scenario; it computes semantic similarity scores for text pairs.
This is a cross-encoder model fine-tuned from Alibaba-NLP/gte-reranker-modernbert-base and optimized for the Redis LangCache semantic cache scenario. It computes semantic similarity scores between text pairs for sentence pair classification and semantic matching tasks.
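To illustrate how a pair-scoring model fits into a semantic cache, here is a minimal sketch. It does not use the LangCache API or any of the models listed above: `SemanticCache` is a hypothetical class, and `token_overlap` is a toy word-overlap scorer standing in for a cross-encoder so that the example runs without downloading a model.

```python
from typing import Callable, Optional

def token_overlap(a: str, b: str) -> float:
    """Toy stand-in scorer: Jaccard overlap of lowercase word sets (0.0 to 1.0).
    A real deployment would call a cross-encoder here instead."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

class SemanticCache:
    """Hypothetical cache that reuses a stored answer when a new prompt
    scores at or above `threshold` against a previously stored prompt."""

    def __init__(self, scorer: Callable[[str, str], float], threshold: float = 0.8):
        self.scorer = scorer
        self.threshold = threshold
        self.entries = []  # list of (prompt, response) pairs

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((prompt, response))

    def get(self, prompt: str) -> Optional[str]:
        best_score, best_response = 0.0, None
        for stored_prompt, response in self.entries:
            score = self.scorer(prompt, stored_prompt)
            if score > best_score:
                best_score, best_response = score, response
        # Only count it as a hit if the best match clears the threshold.
        return best_response if best_score >= self.threshold else None

cache = SemanticCache(token_overlap, threshold=0.6)
cache.put("what is the capital of france", "Paris")
hit = cache.get("what is the capital city of france")   # close paraphrase
miss = cache.get("how do i reset my password")          # unrelated prompt
print(hit, miss)
```

The scorer is passed in as a plain function, so a cross-encoder's pair-scoring call could be substituted without changing the cache logic; the threshold would need retuning for that model's score range.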
sudeshmu
A 360-million-parameter language model based on the LLaMA architecture and using MoR (Mixture of Recursions) technology, fine-tuned on the FineWeb-Edu deduplicated dataset, achieving efficient text generation capabilities through a dynamic routing mechanism and recursive KV cache.
RedHatAI
The Meta-Llama-3-8B-Instruct model with weights and activations quantized to FP8 on a per-tensor basis, suitable for inference with vLLM >= 0.5.0. The checkpoint also includes per-tensor scaling parameters for an FP8-quantized KV cache.
nvidia
Cache-aware FastConformer-Hybrid large model supporting multiple look-ahead windows, specifically designed for streaming automatic speech recognition, adaptable to various latency scenarios
Memory MCP is an MCP server that provides persistent memory for AI assistants. Through a two-tier architecture of hot cache and cold storage, it enables zero-latency automatic injection of high-frequency knowledge and semantic search, allowing Claude to remember project contexts and reduce repeated explanations.
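A two-tier hot cache over cold storage can be sketched as follows. This is not Memory MCP's implementation: `TwoTierMemory` and its methods are hypothetical, the hot tier here is chosen by recency (LRU) rather than by access frequency, and semantic search is left out.

```python
from collections import OrderedDict

class TwoTierMemory:
    """Hypothetical hot-cache / cold-storage store with LRU eviction."""

    def __init__(self, hot_capacity=2):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()   # small tier of recently used entries
        self.cold = {}             # full record (stand-in for a database)

    def remember(self, key, value):
        self.cold[key] = value     # cold storage always holds every entry

    def recall(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)      # mark as most recently used
            return self.hot[key]
        value = self.cold.get(key)
        if value is not None:
            self._promote(key, value)      # cold hit: copy into the hot tier
        return value

    def _promote(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)   # evict the least recently used entry

    def hot_context(self):
        """Entries that would be injected automatically, oldest first."""
        return list(self.hot.items())

mem = TwoTierMemory(hot_capacity=2)
mem.remember("lang", "Python")
mem.remember("db", "SQLite")
mem.remember("ci", "GitHub Actions")
for key in ("lang", "db", "ci"):
    mem.recall(key)
print(mem.hot_context())  # [('db', 'SQLite'), ('ci', 'GitHub Actions')]
```

Evicted entries are not lost: `mem.recall("lang")` still returns the value from cold storage and promotes it back into the hot tier.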
An MCP service that reduces token consumption in language model interactions through efficient data caching
An MCP service that reduces token consumption in language model interactions through a caching mechanism
An MCP server for accessing the Visual Novel Database (VNDB), providing visual novel search and detailed information query functions, and featuring API request cache optimization.
SEO MCP is an SEO tool service based on Ahrefs data, offering backlink analysis, keyword research, and traffic estimation. It fetches and caches data through the API and supports automatic CAPTCHA solving.
myAI Memory Sync is a cross-platform memory sync tool designed for Claude. It realizes unified management of user preference settings and personal information through the local-first MCP protocol, and supports natural language interaction and high-speed cache queries.
An Excel file processing server based on the MCP protocol, providing functions for reading, writing, and analyzing Excel files, supporting multi-worksheet operations, cache management, and log management.
Fastly MCP is a tool that integrates Fastly API features into AI assistants through the Model Context Protocol (MCP), allowing users to manage CDN services, caches, security configurations, etc. through natural language instructions while ensuring the security of API keys.
Java-based MCP service for Excel operations, providing efficient reading, cache support, and multi-workspace management functions
The Magento 2 Development MCP Server provides development functions such as module management, system diagnosis, cache configuration, and database tools for AI agents
An MCP server based on the Crossref API for intelligent parsing of academic paper citations, supporting output in multiple citation formats, including CSL-JSON, BibTeX, RIS, and formatted text, with built-in cache and retry mechanisms.
An MCP server integrating the Agent.ai API, providing functions for web page text extraction, web page screenshot capture, and YouTube subtitle acquisition, supporting dynamic function loading and a cache mechanism.
The Maven Indexer MCP Server provides a tool for AI agents to search for Java classes, method signatures, and source code by indexing the local Maven repository and Gradle cache, especially suitable for understanding the code of internal private libraries and lesser-known public libraries.
A privacy-friendly web search server based on the MCP protocol, providing multi-engine parallel search functions for web pages, social media, and archives, supporting cache management and security verification.
A Python-based MCP server that provides functions for reading, searching, and extracting content from PDF documents. It supports paginated reading, full-text search, and image extraction, and uses an SQLite cache for persistent storage.
An MCP server project based on the SWAPI Star Wars API, providing query functions for Star Wars data such as characters, planets, and movies. It supports automatic pagination and cache management and can be integrated with tools such as VS Code.
godoc-mcp-server is a tool for searching Go language packages and their documentation. It retrieves information from pkg.go.dev and serves as an MCP server for LLMs. It supports local cache, multi-platform distribution, and provides detailed parameter descriptions to optimize the LLM interaction experience.
A memory cache server based on the MCP protocol that reduces token consumption by efficiently caching language model interaction data, supporting automatic management and configuration optimization.
The MCP server of the Mamont search engine, providing search and cache functions.
The CCXT MCP Server is a high-performance cryptocurrency exchange integration service based on the Model Context Protocol (MCP) and the CCXT library. It supports connecting to more than 20 exchanges, provides unified API access to various market types such as spot and futures, and has features such as proxy configuration, cache optimization, and rate limiting.