2024-09-02 11:17:38.AIbase.11.5k
NVIDIA Launches New Visual Language Model NVEagle, Capable of Chatting About Images
NVIDIA, in collaboration with several universities, has introduced NVEagle, a large visual language model that can hold a conversation about images. NVEagle analyzes image content and answers questions about it, for example identifying the person in a photo as Jensen Huang. The model converts images into visual tokens and combines them with text embeddings, which improves its understanding of visual information. To address the challenges of high-resolution image processing, the research team explored a range of visual encoders and fusion strategies and built models such as Eagle-X5-7B and Eagle-X.
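
The idea of turning an image into visual tokens and combining them with text embeddings can be illustrated with a minimal sketch. This is not NVEagle's code: the function names (`patchify`, `project`, `build_input_sequence`), the patch size, the embedding width, and the random projection weights are all assumptions chosen for illustration, and a real model would use trained visual encoders rather than raw pixel patches.

```python
import random

def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch vectors."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            patches.append([image[top + i][left + j]
                            for i in range(patch) for j in range(patch)])
    return patches

def project(vectors, weights):
    """Linear projection: map each d_in vector to d_out using a d_in x d_out matrix."""
    d_out = len(weights[0])
    return [[sum(v[k] * weights[k][j] for k in range(len(v))) for j in range(d_out)]
            for v in vectors]

def build_input_sequence(image, text_embeddings, weights, patch):
    """Project image patches into the text embedding space, then prepend them to the text."""
    visual_tokens = project(patchify(image, patch), weights)
    return visual_tokens + text_embeddings

# Hypothetical toy sizes: 8x8 image, 4x4 patches, embedding width 6, 3 text tokens.
random.seed(0)
PATCH, EMBED_DIM = 4, 6
image = [[random.random() for _ in range(8)] for _ in range(8)]
weights = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(PATCH * PATCH)]
text_embeddings = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(3)]

sequence = build_input_sequence(image, text_embeddings, weights, PATCH)
print(len(sequence), len(sequence[0]))  # 4 visual tokens + 3 text tokens, each of width 6
```

In this sketch the language model would receive `sequence` as a single list of embeddings, so image content and the text prompt sit in the same input. Higher-resolution images produce more patches and therefore longer sequences, which is one reason high-resolution processing is challenging.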