Eureka
A human-level reward design algorithm implemented by encoding large language models.
CommonProductProgrammingReward DesignReinforcement Learning
Eureka is a human-level reward design algorithm implemented by encoding large language models. It leverages the zero-shot generation, code writing, and context improvement capabilities of state-of-the-art language models (such as GPT-4) to evolve and optimize reward code. The generated rewards can be used to acquire complex skills through reinforcement learning. Eureka-generated reward functions outperform human-expert-designed reward functions in 29 open-source reinforcement learning environments, including 10 diverse robot morphologies. Eureka is also capable of flexibly improving reward functions to enhance the quality and safety of generated rewards. By combining with curriculum learning, using Eureka reward functions, we first demonstrate a simulated Shadow Hand performing a pen-twirling trick, skillfully manipulating the pen at a fast speed within a circle.
Eureka Visit Over Time
Monthly Visits
5581
Bounce Rate
45.86%
Page per Visit
1.6
Visit Duration
00:01:32