Ant Group's CodeFuse code large model team has open-sourced ModelCache, a semantic cache for large models that reduces the inference cost of LLM applications and improves user experience. ModelCache's architecture comprises modules such as adapter, embedding, similarity, and data_manager: incoming text is converted into a semantic vector representation, and candidate cache entries are then ranked and evaluated by vector similarity. Online performance statistics for ModelCache show that on a cache hit, average latency drops by up to a factor of 10, yielding an overall speedup rate of 14.5%. Going forward, ModelCache will continue to be optimized for performance and accuracy, improving both recall latency and recall precision.
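To make that lookup flow concrete, here is a minimal sketch of the hit/miss logic a semantic cache performs. This is not ModelCache's actual API; the `SemanticCache` class, the `all-MiniLM-L6-v2` embedding model, and the `0.9` similarity threshold are all illustrative assumptions standing in for ModelCache's embedding and similarity modules.

```python
import numpy as np
from sentence_transformers import SentenceTransformer


class SemanticCache:
    """Toy semantic cache: reuse stored answers for semantically similar queries."""

    def __init__(self, threshold: float = 0.9):
        # Stand-in embedder; ModelCache's embedding module plays this role.
        self.model = SentenceTransformer("all-MiniLM-L6-v2")
        self.threshold = threshold            # minimum cosine similarity for a hit
        self.embeddings: list[np.ndarray] = []  # cached query vectors
        self.answers: list[str] = []             # cached model responses

    def _embed(self, text: str) -> np.ndarray:
        vec = self.model.encode(text)
        # Normalize so that a dot product equals cosine similarity.
        return vec / np.linalg.norm(vec)

    def lookup(self, query: str) -> str | None:
        """Return a cached answer if a similar enough query was seen before."""
        if not self.embeddings:
            return None
        q = self._embed(query)
        sims = np.stack(self.embeddings) @ q  # cosine similarity to every entry
        best = int(np.argmax(sims))           # rank entries, keep the closest
        return self.answers[best] if sims[best] >= self.threshold else None

    def store(self, query: str, answer: str) -> None:
        """Record a query/answer pair after a cache miss."""
        self.embeddings.append(self._embed(query))
        self.answers.append(answer)


# Usage: only pay the full LLM inference cost on a miss.
cache = SemanticCache()
query = "How do I reverse a list in Python?"
answer = cache.lookup(query)
if answer is None:
    answer = "Use list.reverse() or slicing: my_list[::-1]"  # stand-in for an LLM call
    cache.store(query, answer)
print(answer)
```

The threshold governs the trade-off the similarity module must evaluate: set it too low and the cache returns answers to questions that were never asked; set it too high and near-duplicate queries miss the cache and incur full inference cost anyway.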