Developed through a collaboration between researchers at Stanford University and the University of California, Berkeley, S-LoRA reduces the cost of deploying fine-tuned LLMs, enabling enterprises to serve thousands of LoRA models on a single GPU. S-LoRA addresses the technical challenges of running many LoRA adapters on one GPU through a dynamic memory management system, which moves adapter weights between main memory and the GPU as they are needed, and a 'Unified Paging' mechanism, which keeps adapter weights and KV-cache tensors in a shared pool of memory pages. In evaluations, S-LoRA outperformed Hugging Face PEFT, achieving up to a 30-fold increase in throughput while serving 20 models simultaneously.
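
To make the 'Unified Paging' idea concrete, the sketch below shows a toy page pool in Python where both KV-cache blocks and adapter weights draw fixed-size pages from the same pool and return them when a request finishes or an adapter is evicted. The class and method names (`UnifiedPagePool`, `allocate`, `release`) and the page sizes are hypothetical illustrations, not the S-LoRA implementation or its API.

```python
class UnifiedPagePool:
    """Toy shared page pool: KV cache and adapter weights use the same pages."""

    def __init__(self, num_pages: int, page_size: int):
        self.page_size = page_size
        self.free_pages = list(range(num_pages))  # indices of currently unused pages
        self.owners = {}                           # page index -> (kind, owner id)

    def allocate(self, kind: str, owner: str, num_pages: int) -> list[int]:
        """Hand out pages for either 'kv' cache or 'adapter' weights."""
        if num_pages > len(self.free_pages):
            raise MemoryError("pool exhausted; finish a request or evict an adapter")
        pages = [self.free_pages.pop() for _ in range(num_pages)]
        for p in pages:
            self.owners[p] = (kind, owner)
        return pages

    def release(self, owner: str) -> None:
        """Return all pages held by a request or an adapter to the shared pool."""
        for p, (_, o) in list(self.owners.items()):
            if o == owner:
                del self.owners[p]
                self.free_pages.append(p)


# Illustrative usage: adapter weights and per-request KV cache share one pool,
# so freeing either kind of allocation makes room for the other.
pool = UnifiedPagePool(num_pages=1024, page_size=16)
pool.allocate("adapter", "customer-A-lora", num_pages=8)  # load an adapter
pool.allocate("kv", "request-42", num_pages=32)           # KV cache for one request
pool.release("request-42")                                 # pages return to the pool
```

Because both kinds of memory compete for the same pages, the server can trade KV-cache capacity against the number of resident adapters dynamically rather than reserving fixed regions for each, which is the intuition behind serving many adapters from one GPU.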