LEO

All-in-one Agent in the 3D World

LEO is a multimodal, multi-task, all-in-one agent built on a large language model (LLM). It can perceive, localize, reason, plan, and act in the 3D world. LEO is trained in two stages: (i) 3D vision-language alignment and (ii) 3D vision-language-action instruction tuning. We curated and generated a large-scale dataset of object-level and scene-level multimodal tasks that require in-depth understanding of, and interaction with, the 3D world. Extensive experiments show that LEO performs strongly across a wide range of tasks, including 3D captioning, question answering, reasoning, navigation, and robot manipulation.
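The two-stage recipe is sequential: the second stage starts from the weights produced by the first. The sketch below illustrates only that ordering. It is not LEO's training code. `Stage`, `run_pipeline`, `toy_train_stage`, and the per-stage task lists are hypothetical names chosen for illustration, and a dictionary stands in for real model weights.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Toy stand-in for model weights (hypothetical); a real agent would hold tensors.
Params = Dict[str, object]


@dataclass
class Stage:
    name: str
    tasks: List[str]  # illustrative task mix (assumption), not LEO's actual data split


def run_pipeline(
    stages: List[Stage],
    init: Params,
    train_stage: Callable[[Params, Stage], Params],
) -> Params:
    """Train stages in order; each stage starts from the previous stage's output."""
    params = dict(init)
    for stage in stages:
        params = train_stage(params, stage)
    return params


STAGES = [
    Stage("3D vision-language alignment",
          ["object-level captioning", "scene-level captioning"]),
    Stage("3D vision-language-action instruction tuning",
          ["3D captioning", "question answering", "reasoning",
           "navigation", "robot manipulation"]),
]


def toy_train_stage(params: Params, stage: Stage) -> Params:
    # Hypothetical trainer: records which stage ran and how many task types it covered.
    history = list(params.get("history", [])) + [stage.name]
    seen = int(params.get("tasks_seen", 0)) + len(stage.tasks)
    return {**params, "history": history, "tasks_seen": seen}


if __name__ == "__main__":
    final = run_pipeline(STAGES, {}, toy_train_stage)
    print(final["history"])     # alignment stage first, then instruction tuning
    print(final["tasks_seen"])  # 2 + 5 = 7
```

Passing the trainer in as a callable keeps the stage ordering separate from the training details. A real implementation would replace `toy_train_stage` with actual optimization over each stage's data.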

LEO Visit Over Time

Monthly visits: 145
Bounce rate: 43.07%
Pages per visit: 1.0
Average visit duration: 00:00:00
