PowerInfer-2

An efficient large language model inference framework designed specifically for smartphones

PowerInfer-2 is a mobile-optimized inference framework that can run Mixture-of-Experts (MoE) models with up to 47B parameters on smartphones, reaching an inference speed of 11.68 tokens per second, up to 22 times faster than competing frameworks. It combines heterogeneous computing across the phone's processors with an I/O-Compute pipeline that overlaps weight loading from storage with computation, significantly reducing memory usage while improving inference speed. The framework suits scenarios that require deploying large models directly on mobile devices, where on-device inference improves both data privacy and performance.
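The I/O-Compute pipeline idea mentioned above can be sketched in a few lines: while layer *i* is being computed, the weights for layer *i+1* are prefetched from storage on a background thread, hiding I/O latency behind compute. This is only an illustrative toy, not PowerInfer-2's actual implementation; the function names, timings, and queue size are assumptions.

```python
import threading
import queue
import time

def load_weights(layer_id):
    # Simulated flash read of one layer's weights (I/O-bound); hypothetical stand-in.
    time.sleep(0.01)
    return f"weights[{layer_id}]"

def compute(layer_id, weights, activations):
    # Simulated per-layer computation (compute-bound); hypothetical stand-in.
    time.sleep(0.01)
    return activations + [layer_id]

def pipelined_forward(num_layers):
    """Overlap loading layer i+1 with computing layer i."""
    prefetched = queue.Queue(maxsize=1)  # small buffer bounds memory use

    def prefetcher():
        # Background thread: stream weights layer by layer from storage.
        for i in range(num_layers):
            prefetched.put((i, load_weights(i)))

    t = threading.Thread(target=prefetcher)
    t.start()
    activations = []
    for _ in range(num_layers):
        # By the time layer i's compute starts, its weights are (ideally) ready.
        i, w = prefetched.get()
        activations = compute(i, w, activations)
    t.join()
    return activations

print(pipelined_forward(4))  # -> [0, 1, 2, 3]
```

With loading and compute each taking roughly the same time per layer, the pipelined loop runs in about half the wall-clock time of a naive load-then-compute loop, which is the same effect PowerInfer-2 exploits to keep model weights in cheap flash storage instead of scarce DRAM.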