SenseTime Unveils New Multimodal Large Model, Shaping the Future of Interaction

AIbase基地

Published in AI News · 4 min read · Apr 10, 2025

At SenseTime's technology exchange day on April 10th, the company unveiled its latest multi-modal large model, "SenseNova V6," and the "SenseCore 2.0" system. This new version aims to integrate text, images, and videos, providing users with a more natural and richer interactive experience.

The SenseNova V6 series includes four versions. The most notable is SenseNova V6 Pro, which is built on a 620-billion-parameter mixture-of-experts architecture and offers strong multi-modal fusion capabilities. SenseNova V6 Reasoner Pro strengthens multi-modal reasoning, enabling deeper logical analysis. SenseNova V6 Video focuses on video understanding, summarizing and analyzing video content in depth. SenseNova V6 Omni is a lightweight, full-modal interactive model that combines language, speech, and video for real-time interaction.

Demonstrations showcased SenseNova V6's unique multi-modal capabilities. Users could interact with the model using photos of handwritten math problems; the model not only solved them but also analyzed user answers, guiding users through the solution process via voice, even providing real-time assistance. This makes SenseNova V6 feel like a personal tutor.


SenseTime co-founder Lin Dahua stated that future interactions will inevitably be multi-modal and that SenseTime aims to master the core technologies behind them. He noted that relatively few domestic companies are developing multi-modal reasoning and interaction capabilities, and said SenseTime hopes to leverage its strengths in computer vision to establish an early foothold in the multi-modal large model market.

Furthermore, SenseNova V6 Pro's multi-modal capabilities are comparable to leading international models such as Gemini 2.0 Pro and GPT-4.5. SenseTime emphasizes strong reasoning, strong interaction, and long-term memory as three key technological breakthroughs. These capabilities allow the model to better understand human intent and foster more engaging user interactions.

SenseTime plans to integrate SenseNova V6 into real-world applications across various fields, including education, translation, and tourism. Collaborating with embodied AI company Fourier, SenseTime aims to equip robots with enhanced environmental understanding and human-robot interaction capabilities, truly realizing a more intelligent future.


This article is from AIbase Daily


Created by the AIbase Daily Team

