BipedalWalker-RL
This project trains an agent with the Proximal Policy Optimization (PPO) algorithm in the BipedalWalker-v3 environment at two difficulty levels: normal and hardcore. The agent's performance is evaluated from the rewards collected during training.
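For readers new to PPO, the core of the algorithm and the reward-based evaluation can be sketched in a few lines of plain Python. This is an illustrative sketch, not this repository's code: the function names `ppo_clipped_objective` and `rolling_mean_returns`, the clip range `clip_eps=0.2`, and the window size are assumptions chosen for the example.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    clip_eps=0.2 is a commonly used value, assumed here for illustration.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum gives a pessimistic bound on the policy improvement.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


def rolling_mean_returns(episode_returns, window=100):
    """Trailing mean of episode returns, a common way to smooth a training curve."""
    means = []
    for i in range(len(episode_returns)):
        recent = episode_returns[max(0, i - window + 1): i + 1]
        means.append(sum(recent) / len(recent))
    return means
```

The clipping removes the incentive to move the policy far from the one that collected the data, which helps keep updates stable. Plotting `rolling_mean_returns` for the normal and hardcore runs on the same axes is one simple way to compare the two difficulty levels.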