Harbin Institute of Technology (Shenzhen) has released JiuTian, a multimodal large model that achieves gains on 13 vision-language tasks, with improvements of up to 5%. JiuTian addresses traditional models' shortcomings in extracting visual information by integrating dual-level visual knowledge: fine-grained spatial-aware features and high-level semantic visual evidence. Its framework pairs a stage-wise instruction-tuning strategy with a mixture-of-adapters design, effectively strengthening visual understanding. Paper: https://arxiv.org/abs/2311.11860, GitHub: https://github.com/rshaojimmy/JiuTian.
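To make the mixture-of-adapters idea concrete, here is a minimal PyTorch sketch: a learned router softly blends the outputs of several bottleneck adapters for each token, so different adapter branches can specialize (e.g., one for spatial-aware features, one for semantic evidence). All names and dimensions (`Adapter`, `MixtureOfAdapters`, `bottleneck=64`) are illustrative assumptions, not JiuTian's actual implementation.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """A standard bottleneck adapter: down-project, nonlinearity, up-project."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen backbone's features intact.
        return x + self.up(self.act(self.down(x)))

class MixtureOfAdapters(nn.Module):
    """Hypothetical mixture-of-adapters layer: routes each token through
    several adapters and blends their outputs with softmax router weights.
    A sketch of the general technique, not the paper's exact design."""
    def __init__(self, dim: int, num_adapters: int = 2, bottleneck: int = 64):
        super().__init__()
        self.adapters = nn.ModuleList(
            [Adapter(dim, bottleneck) for _ in range(num_adapters)]
        )
        self.router = nn.Linear(dim, num_adapters)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Per-token mixing weights over the adapter branches: (B, T, N)
        weights = torch.softmax(self.router(x), dim=-1)
        # Run every adapter and stack along a new last axis: (B, T, D, N)
        outputs = torch.stack([a(x) for a in self.adapters], dim=-1)
        # Weighted sum over adapters, back to (B, T, D)
        return (outputs * weights.unsqueeze(2)).sum(dim=-1)

# Usage: blend two adapter branches over a batch of token features.
feats = torch.randn(2, 16, 768)   # (batch, tokens, hidden dim)
moa = MixtureOfAdapters(dim=768)
out = moa(feats)
print(out.shape)                  # torch.Size([2, 16, 768])
```

The soft routing lets the model decide per token how much each knowledge source should contribute, which is one plausible way to reconcile fine-grained spatial features with high-level semantic ones without retraining the backbone.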