2023-08-15 13:55:30 · AIbase · 508
ByteDance's Large Model Makes New Progress: Visual Grounding Introduced for the First Time for Fine-Grained Multimodal Joint Understanding, Now Open Source with a Demo Available
ByteDance's large model BuboGPT supports fine-grained joint understanding of text, images, and audio. By introducing a visual grounding method, BuboGPT can accurately locate objects in images. The researchers trained it with a multimodal instruction-tuning scheme and report good results on multimodal tasks.