
LLM-serving-with-proxy-models

Public

Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an LLM (with low latency overhead!)
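The core idea in the description, predicting how verbose an LLM response will be and using that to schedule requests, can be sketched as shortest-predicted-job-first scheduling. This is a minimal illustration only: `predict_length` is a hypothetical stub standing in for the tiny BERT sequence-length predictor the project describes.

```python
# Minimal sketch of proxy-model-based request scheduling.
# predict_length is a hypothetical placeholder; the project uses a
# small BERT regressor to estimate output sequence length instead.
import heapq
from dataclasses import dataclass, field


def predict_length(prompt: str) -> int:
    # Toy heuristic: longer prompts tend to yield longer answers.
    return 32 + 4 * len(prompt.split())


@dataclass(order=True)
class Request:
    predicted_len: int
    prompt: str = field(compare=False)


def schedule(prompts):
    # Shortest-predicted-job-first: serve requests expected to
    # finish soonest, reducing average waiting time.
    heap = [Request(predict_length(p), p) for p in prompts]
    heapq.heapify(heap)
    order = []
    while heap:
        order.append(heapq.heappop(heap).prompt)
    return order


print(schedule([
    "Summarize this very long article about transformers",
    "Hi",
    "Explain attention",
]))
```

With an accurate length predictor, this ordering approximates shortest-job-first, which is what makes a low-overhead proxy model attractive for interactive serving.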

Created: 2024-04-12T22:10:06
Updated: 2025-03-21T11:35:29
Stars: 33
Stars increase: 0
