At the 2024 World Artificial Intelligence Conference, SenseTime released "Ri Ri Xin 5o", China's first "What You See Is What You Get" model. The model offers an interactive experience comparable to GPT-4o, supporting real-time streaming multimodal interaction: by fusing cross-modal information such as audio, text, images, and video, it can understand and respond in real time. In on-site demonstrations it recognized the name tags worn by staff, identified the venue, described the appearance and outfit of a plush toy, and gave instant feedback on sketches that staff drew on the spot.
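The "what you see is what you get" experience described above comes down to streaming inputs and outputs concurrently instead of waiting for a complete request and a complete reply. The sketch below is a hypothetical illustration of that pattern only; the frame types, function names, and timing are assumptions and do not reflect SenseTime's actual API.

```python
# Minimal sketch (hypothetical names) of interleaving audio/image capture
# with incrementally streamed model replies.
import asyncio
from dataclasses import dataclass


@dataclass
class Frame:
    kind: str   # "audio" | "image"
    data: bytes


async def capture_frames():
    """Stand-in for microphone/camera capture; yields a few fake frames."""
    for i in range(3):
        yield Frame("audio", f"audio-chunk-{i}".encode())
        yield Frame("image", f"camera-frame-{i}".encode())
        await asyncio.sleep(0.1)   # simulate real-time pacing


async def stream_reply(frame: Frame):
    """Stand-in for the model's token-by-token streamed reply."""
    for token in ("Observed", frame.kind, "input;", "responding..."):
        await asyncio.sleep(0.05)
        yield token


async def main():
    # Frames are sent as they are captured, and partial replies are printed
    # as they arrive, rather than after a full request/response round trip.
    async for frame in capture_frames():
        async for token in stream_reply(frame):
            print(token, end=" ", flush=True)
        print()


if __name__ == "__main__":
    asyncio.run(main())
```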
The real-time interactive capabilities of "Ri Ri Xin 5o" make it well suited to applications such as real-time conversation and speech recognition: a single model can handle multiple tasks and adapt its behavior and output to different contexts. The model is built on the "Ri Ri Xin 5.5" foundation model, an upgrade of the "Ri Ri Xin 5.0" released in April this year. Compared with 5.0, its overall performance has improved by an average of 30%, with notable gains in mathematical reasoning, English proficiency, and instruction following, among other areas.
"Ri Ri Xin 5.5" adopts a hybrid edge-cloud collaborative expert architecture. It enhances the model's reasoning ability by training with over 10TB of high-quality data, including synthetic reasoning chain data. To lower the entry barrier for corporate users, SenseTime has launched the "Big Model 0 Yuan Go" program, offering multiple free services to new registrants and providing a 50 million Tokens package. Additionally, it offers exclusive migration consultants to assist OpenAI users in the transition with zero service costs.