Eureka

A human-level reward design algorithm implemented by encoding large language models.

CommonProductProgrammingReward DesignReinforcement Learning
Eureka is a human-level reward design algorithm implemented by encoding large language models. It leverages the zero-shot generation, code writing, and context improvement capabilities of state-of-the-art language models (such as GPT-4) to evolve and optimize reward code. The generated rewards can be used to acquire complex skills through reinforcement learning. Eureka-generated reward functions outperform human-expert-designed reward functions in 29 open-source reinforcement learning environments, including 10 diverse robot morphologies. Eureka is also capable of flexibly improving reward functions to enhance the quality and safety of generated rewards. By combining with curriculum learning, using Eureka reward functions, we first demonstrate a simulated Shadow Hand performing a pen-twirling trick, skillfully manipulating the pen at a fast speed within a circle.
Visit

Eureka Visit Over Time

Monthly Visits

2791

Bounce Rate

48.02%

Page per Visit

1.4

Visit Duration

00:02:26

Eureka Visit Trend

Eureka Visit Geography

Eureka Traffic Sources

Eureka Alternatives