HIT (Shenzhen) Releases Multimodal Large Model JiuTian, Performance Improved by 5%

Harbin Institute of Technology (Shenzhen) has released JiuTian, a multimodal large model that achieves an average 5% performance improvement across 13 vision-language tasks. JiuTian addresses the shortcomings of traditional models in extracting visual information by integrating spatial awareness with semantic visual knowledge. Its framework includes a staged instruction fine-tuning strategy and hybrid adapters, effectively enhancing visual understanding. Paper link: https://arxiv.org/abs/2311.11860, GitHub: https://github.com/rshaojimmy/JiuTian.

Source: Chinaz.com (站长之家)
This article is from AIbase Daily
Welcome to the [AI Daily] column! This is your daily guide to the world of artificial intelligence. Every day, we bring you the hot topics in AI, with a focus on developers, helping you keep up with technical trends and learn about innovative AI product applications.