Beijing Zhipu Huazhang Technology Co., Ltd. announced that GLM-4V-Flash, the first free multimodal API on its Zhipu Open Platform (BigModel), is now live. The new model builds on the visual capabilities of the GLM-4V series, improving accuracy on image-understanding tasks while further lowering the cost barrier for developers applying large multimodal models across different fields.
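As a rough illustration of how a developer might call the new API, here is a minimal sketch assuming the official `zhipuai` Python SDK's OpenAI-style chat interface; the model identifier `glm-4v-flash`, the API key, and the image URL are placeholders, not confirmed by the announcement:

```python
# Minimal sketch: querying GLM-4V-Flash via the Zhipu Open Platform.
# Assumes the `zhipuai` SDK (pip install zhipuai) and a key from the platform.
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR_API_KEY")  # key issued by the Zhipu Open Platform

response = client.chat.completions.create(
    model="glm-4v-flash",  # assumed identifier for the free multimodal model
    messages=[
        {
            "role": "user",
            "content": [
                # Multimodal input: one image plus a text instruction.
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
                {"type": "text", "text": "Describe what is in this image."},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Because the API is free, a request like this incurs no per-token charge, which is the main point of the announcement for experimenting developers.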
Zhipu Technology recently announced the open-sourcing of its GLM-Edge series of on-device large language and multimodal models, an important step in the company's push toward real-world edge deployment. The series comprises four models of different sizes: GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, and GLM-Edge-V-5B, optimized for mobile platforms such as smartphones and in-vehicle systems as well as desktop platforms such as PCs.
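Since the models are open-sourced, one plausible way to try a chat variant locally is through Hugging Face `transformers`; the sketch below assumes the checkpoints are published under the THUDM organization with the repo id `THUDM/glm-edge-1.5b-chat` (an assumption, not stated in the announcement):

```python
# Minimal sketch: running the smallest GLM-Edge chat model locally.
# Assumes `transformers` and `accelerate` are installed and the repo id is correct.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/glm-edge-1.5b-chat"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

# Build a chat prompt with the model's own template and generate a reply.
messages = [{"role": "user", "content": "Summarize what an on-device LLM is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
# Strip the prompt tokens so only the newly generated answer is printed.
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```

At 1.5B parameters, this variant is small enough that the same loading pattern is intended to work on laptop-class hardware, which is the deployment target the series is optimized for.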
Recently, research teams from Peking University released LLaVA-o1, an open-source multimodal model claimed to be the first vision-language model capable of spontaneous, systematic reasoning in the style of OpenAI's o1. The model performs strongly on six challenging multimodal benchmarks, where its 11B-parameter version outperforms competitors such as Gemini-1.5-pro, GPT-4o-mini, and Llama-3.2-90B-Vision-Instruct.