Greedy-Adv-Aware-RLHF
GAA is a modification of the RLHF PPO loop that addresses the problem of negative side effects from misspecified reward functions.