ByteDance's Large Model Makes New Progress: Visual Localization Introduced for the First Time for Fine-Grained Multimodal Joint Understanding, Now Open Source with a Demo Available

ByteDance's Doubao large model team recently announced a breakthrough in addressing key bottlenecks in Mixture-of-Experts (MoE) architecture, open-sourcing a significant optimization technology called COMET. This technology dramatically improves large model training efficiency, achieving a remarkable 1.7x speedup and a 40% reduction in training costs. COMET has been deployed in ByteDance's multi-thousand-GPU cluster training, resulting in millions of GPU hours saved.
The Modelers community recently launched Step-Video and Step-Audio, two open-source multimodal large models developed by StepFun. The models target video generation and voice interaction, respectively, and are aimed at developers and enterprise users. Step-Video, formally known as Step-Video-T2V, is a 30-billion-parameter model, making it the world's largest open-source video generation model. This model can directly generate 20...
The Beijing Academy of Artificial Intelligence (BAAI) has unveiled BGE-VL, a multimodal vector (embedding) model for information retrieval. The model is reported to improve search accuracy and efficiency significantly.