Recently, Zhipu AI announced the open-source release of its edge-side large language and multimodal model series, GLM-Edge. The release marks an important step in the company's effort to bring large-model capabilities to real-world use cases on edge devices. The GLM-Edge series consists of four models of different sizes: GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, and GLM-Edge-V-5B, which are optimized for mobile platforms such as smartphones and in-vehicle systems, as well as desktop platforms such as PCs.


Building on the technological foundation of the GLM-4 series, Zhipu's research team has adjusted the model architecture and size to strike the best balance among model quality, real-time inference performance, and ease of deployment. Through close collaboration with hardware partners and targeted inference optimizations, the GLM-Edge series has demonstrated exceptional inference speed on several edge platforms. Notably, on the Qualcomm Snapdragon 8 Elite platform, leveraging NPU compute and a mixed quantization scheme, the 1.5B chat model and the 2B multimodal model can achieve decoding speeds of over 60 tokens per second. With speculative sampling, the decoding speed can exceed 100 tokens per second.
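The speculative-sampling speed-up mentioned above comes from letting a small draft model propose several tokens cheaply, then having the large model verify the whole proposal in one pass. The sketch below is a toy, greedy illustration of that accept/verify loop, not Zhipu's actual implementation: `target_next` and `draft_next` are hypothetical stand-in functions over an integer vocabulary, standing in for the full GLM-Edge model and a much smaller draft model.

```python
# Toy illustration of greedy speculative decoding. The two "models" below are
# deterministic stand-ins, NOT real networks: target_next plays the role of
# the expensive edge model, draft_next the cheap draft model.

def target_next(ctx):
    # Expensive model: next token is a deterministic function of the context.
    return (sum(ctx) * 31 + len(ctx)) % 100

def draft_next(ctx):
    # Cheap model: agrees with the target most of the time.
    t = target_next(ctx)
    return t if len(ctx) % 4 != 0 else (t + 1) % 100

def greedy_decode(prompt, n_tokens):
    # Reference: plain greedy decoding with the target model only.
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target_next(out))
    return out[len(prompt):]

def speculative_decode(prompt, n_tokens, k=4):
    """The draft proposes k tokens; the target verifies them, keeps the
    longest agreeing prefix, and supplies one token of its own."""
    out = list(prompt)
    target_calls = 0
    while len(out) - len(prompt) < n_tokens:
        # 1) Draft proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(out)
        for _ in range(k):
            tok = draft_next(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2) Target verifies the proposal. In a real implementation all k+1
        #    positions are scored in ONE forward pass -- that batching is
        #    where the speed-up comes from.
        target_calls += 1
        for tok in proposal:
            expected = target_next(out)
            if tok == expected:
                out.append(tok)          # accept the draft token
            else:
                out.append(expected)     # replace first mismatch, stop
                break
        else:
            out.append(target_next(out))  # bonus token: all k accepted
        if len(out) - len(prompt) > n_tokens:
            out = out[: len(prompt) + n_tokens]
    return out[len(prompt):], target_calls
```

Because mismatched draft tokens are replaced by the target's own choice, the output is identical to pure greedy decoding with the target model; only the number of expensive target passes shrinks.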

The open-source release of the GLM-Edge series not only showcases the company's technological prowess in artificial intelligence but also provides developers and researchers with powerful tools and resources to advance the development of edge-side AI applications.

GLM-Edge Collection:

https://modelscope.cn/collections/GLM-Edge-ff0306563d2844