LEO
All-in-one Agent in the 3D World
Common Product, Image, Artificial Intelligence, 3D World
LEO is a multimodal, multi-task, all-in-one agent built on a large language model. It can perceive, ground, reason, plan, and act in the 3D world. LEO is trained in two stages: (i) 3D vision-language alignment and (ii) 3D vision-language-action instruction tuning. We carefully curated and generated a large-scale dataset of object-level and scene-level multimodal tasks that require deep understanding of, and interaction with, the 3D world. Through rigorous experiments, we demonstrate LEO's strong performance across a wide range of tasks, including 3D captioning, question answering, reasoning, navigation, and robot manipulation.
LEO Visits Over Time
Monthly Visits: 43
Bounce Rate: 50.86%
Pages per Visit: 1.0
Visit Duration: 00:00:00