NVIDIA's research team has announced a new neural network called HOVER (Humanoid Omnipotent Versatile Controller). With only 1.5 million parameters, HOVER is designed specifically to coordinate the movements and operations of humanoid robots.


Jim Fan, Senior Research Manager at NVIDIA, said: "Not all foundation models need to be large. The 1.5M-parameter neural network we trained is designed to control the body of humanoid robots." He explained that HOVER captures the subconscious processing that underlies human movement: "Humans require a lot of subconscious processing when walking, maintaining balance, and dexterously controlling their limbs." Learning this processing lets robots perform complex tasks without tedious hand programming.

HOVER was trained on NVIDIA's Isaac simulation platform, which can run physics simulations 10,000 times faster than real time.

According to Fan, the model accumulated the equivalent of one year of training in simulation, which took only about 50 minutes of real time on a single GPU. He added that the resulting network transfers smoothly to real-world robots without any fine-tuning.
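The two figures quoted above are consistent with each other: dividing one year of simulated experience by the 10,000x speedup gives roughly 52 minutes of wall-clock time. A quick check, assuming a 365-day year and a speedup that stays constant for the whole run (both simplifications):

```python
# Back-of-the-envelope check: wall-clock time = simulated time / speedup.
# Assumes a 365-day year and a constant 10,000x speedup (both simplifications).
SIM_SPEEDUP = 10_000
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 simulated minutes


def wall_clock_minutes(simulated_minutes: float, speedup: float) -> float:
    """Real time needed to simulate `simulated_minutes` at the given speedup."""
    return simulated_minutes / speedup


minutes = wall_clock_minutes(MINUTES_PER_YEAR, SIM_SPEEDUP)
print(f"{minutes:.1f} minutes")  # 52.6 minutes
```

The result, about 52.6 minutes, matches the "about 50 minutes" reported for a single GPU.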

HOVER can respond to a variety of high-level motion commands: head and hand poses from XR devices such as Apple's Vision Pro, full-body poses from motion capture or RGB cameras, joint angles from exoskeletons, and root velocity commands from joysticks. Fan emphasized that HOVER provides a single, unified interface for controlling robots across these input devices, which makes it easier to collect teleoperation data for training.
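One common way to give many input devices a single interface is to map each of them onto one fixed-size command vector plus a binary mask marking which entries that device actually provides; the controller then only has to understand one input format. The sketch below illustrates that idea under stated assumptions: it is not taken from the article, and the slot names, sizes (including the 19-joint count), and device mappings are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical layout of a shared command vector (not HOVER's actual layout).
# Each slot name maps to (start index, length).
LAYOUT = {
    "head_pos": (0, 3),        # head position (x, y, z)
    "left_hand_pos": (3, 3),   # left hand position
    "right_hand_pos": (6, 3),  # right hand position
    "joint_angles": (9, 19),   # per-joint targets; 19 joints is an assumption
    "root_vel": (28, 3),       # root velocity (vx, vy, yaw rate)
}
COMMAND_DIM = sum(length for _, length in LAYOUT.values())  # 31


@dataclass
class Command:
    values: list[float] = field(default_factory=lambda: [0.0] * COMMAND_DIM)
    mask: list[int] = field(default_factory=lambda: [0] * COMMAND_DIM)

    def set(self, name: str, data: list[float]) -> None:
        """Write one slot and mark its entries as active in the mask."""
        start, length = LAYOUT[name]
        if len(data) != length:
            raise ValueError(f"{name} expects {length} values, got {len(data)}")
        self.values[start:start + length] = data
        self.mask[start:start + length] = [1] * length


def from_xr_headset(head, left_hand, right_hand) -> Command:
    """Hypothetical adapter: an XR device supplies only head and hand positions."""
    cmd = Command()
    cmd.set("head_pos", head)
    cmd.set("left_hand_pos", left_hand)
    cmd.set("right_hand_pos", right_hand)
    return cmd


def from_joystick(vx: float, vy: float, yaw_rate: float) -> Command:
    """Hypothetical adapter: a joystick supplies only a root velocity command."""
    cmd = Command()
    cmd.set("root_vel", [vx, vy, yaw_rate])
    return cmd


def policy_input(cmd: Command) -> list[float]:
    """Zero out inactive entries and append the mask, so a single policy sees
    a fixed-size input regardless of which device produced the command."""
    masked = [v * m for v, m in zip(cmd.values, cmd.mask)]
    return masked + [float(m) for m in cmd.mask]


x = policy_input(from_joystick(0.5, 0.0, 0.1))
print(len(x))  # 62
```

Because every adapter emits the same vector shape, adding a new input device only requires a new adapter, not a new controller.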

HOVER also integrates with upstream vision-language-action models, converting their motion commands into high-frequency, low-level motor signals. It is compatible with any humanoid robot that can be simulated in Isaac, so users can easily bring their own robots to life.
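Pairing an upstream model with a low-level controller amounts to a two-rate control hierarchy: the slow model occasionally issues a motion command, while the fast controller runs on every tick and turns the latest command into motor signals. The minimal sketch below assumes hypothetical rates of 10 Hz and 200 Hz, placeholder functions `upstream_model` and `low_level_policy` (HOVER itself is a learned neural network policy, not a pass-through), a PD law with illustrative gains to turn joint targets into torques, and a toy two-joint robot.

```python
# Two-rate control hierarchy sketch. All rates, gains, and names are hypothetical.
UPSTREAM_HZ = 10     # slow planner, e.g. a vision-language-action model
CONTROL_HZ = 200     # fast low-level controller
KP, KD = 50.0, 2.0   # illustrative PD gains


def upstream_model(step: int) -> list[float]:
    """Placeholder for a slow model that emits joint-angle targets."""
    return [0.1 * (step % 3), -0.05 * (step % 3)]


def low_level_policy(command: list[float]) -> list[float]:
    """Placeholder for a learned controller; passes the command through as joint targets."""
    return command


def pd_torque(target, q, qd):
    """Convert joint position targets into torques with a PD law."""
    return [KP * (t - p) - KD * v for t, p, v in zip(target, q, qd)]


def run(duration_s: float) -> tuple[int, int]:
    ticks_per_update = CONTROL_HZ // UPSTREAM_HZ  # 20 control ticks per upstream update
    dt = 1.0 / CONTROL_HZ
    q, qd = [0.0, 0.0], [0.0, 0.0]  # joint positions and velocities of a toy 2-joint robot
    command = None
    upstream_calls = control_calls = 0
    for tick in range(int(duration_s * CONTROL_HZ)):
        if tick % ticks_per_update == 0:  # the slow model runs only occasionally
            command = upstream_model(upstream_calls)
            upstream_calls += 1
        tau = pd_torque(low_level_policy(command), q, qd)  # fast loop runs every tick
        control_calls += 1
        # Toy unit-inertia dynamics (semi-implicit Euler) standing in for a robot or simulator.
        qd = [v + t * dt for v, t in zip(qd, tau)]
        q = [p + v * dt for p, v in zip(q, qd)]
    return upstream_calls, control_calls


print(run(1.0))  # (10, 200)
```

In one simulated second the upstream model is queried 10 times while the controller produces 200 motor commands, which is why the low-level layer must be small and fast.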

Earlier this year, NVIDIA also announced Project GR00T (Generalist Robot 00 Technology), a general-purpose foundation model designed specifically for humanoid robots. Robots powered by GR00T can understand natural language and imitate human movements by observing them, allowing them to quickly learn the coordination, dexterity, and other skills needed to interact effectively with the real world.

Paper URL: https://arxiv.org/pdf/2410.21229

Key Points:

- 🤖 NVIDIA introduces HOVER, a 1.5 million parameter neural network designed to control the movements and operations of humanoid robots.

- ⏳ HOVER accumulated a year's worth of simulated training in only about 50 minutes of real time on a single GPU, and transfers to real robots without fine-tuning.

- 🎮 HOVER supports various high-level motion commands, works with different input devices, and provides a unified interface for robot control.