The open-source Medusa project, co-created by an alumnus of Peking University's School of Mathematics, accelerates large language model inference by attaching extra decoding heads to the base model. Compared with traditional speculative sampling, Medusa reportedly improves inference speed by up to 60%. The co-lead author is Yuhong (Jesse) Li, a Peking University School of Mathematics alumnus who specializes in efficient machine learning, and the development team also includes Tri Dao, the author of FlashAttention. The method promises to speed up large model inference without training a separate draft model or requiring hardware-specific optimization.
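To make the idea concrete, here is a minimal toy sketch of multi-head speculative decoding in the Medusa spirit: cheap "heads" draft several future tokens, and a single verification step accepts the longest prefix the base model would itself have chosen greedily. All names, the vocabulary size, the number of heads, and the head-accuracy rate are hypothetical illustration choices; real Medusa trains its heads on the base model's hidden states and verifies candidate trees with tree attention, none of which is modeled here.

```python
import random

VOCAB = 50        # toy vocabulary size (hypothetical)
NUM_HEADS = 3     # number of Medusa-style draft heads (hypothetical)

def base_argmax(prefix):
    """Toy stand-in for the base LM's greedy next token: a deterministic
    pseudo-random function of the prefix (ints hash reproducibly)."""
    return random.Random(hash(tuple(prefix))).randrange(VOCAB)

def head_guesses(prefix, rng):
    """Toy 'Medusa heads': cheap, imperfect guesses for the next NUM_HEADS
    tokens. Each head usually agrees with the base model, sometimes not."""
    guesses, ctx = [], list(prefix)
    for _ in range(NUM_HEADS):
        correct = base_argmax(ctx)
        tok = correct if rng.random() < 0.7 else rng.randrange(VOCAB)
        guesses.append(tok)
        ctx.append(tok)   # later heads condition on earlier drafts
    return guesses

def verify_and_extend(prefix, guesses):
    """One verification step: accept the longest prefix of guesses the base
    model would have chosen greedily, then append the base model's own next
    token so at least one token is always produced per step."""
    ctx, accepted = list(prefix), []
    for g in guesses:
        if base_argmax(ctx) != g:
            break
        accepted.append(g)
        ctx.append(g)
    accepted.append(base_argmax(ctx))
    return accepted

def generate(prefix, n_tokens, seed=0):
    """Medusa-style loop: draft, verify, repeat. Returns the tokens plus the
    number of verification steps taken (fewer steps = fewer model passes)."""
    rng, out, steps = random.Random(seed), list(prefix), 0
    while len(out) < len(prefix) + n_tokens:
        out.extend(verify_and_extend(out, head_guesses(out, rng)))
        steps += 1
    return out[:len(prefix) + n_tokens], steps

def greedy(prefix, n_tokens):
    """Plain greedy decoding: one step per token, for comparison."""
    out = list(prefix)
    for _ in range(n_tokens):
        out.append(base_argmax(out))
    return out
```

Because verification only accepts tokens the base model would have emitted anyway, the output is identical to plain greedy decoding; the speedup comes from emitting several tokens per verification step whenever the drafts are right.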