Kunlun Technology Co., Ltd. recently announced that its two new reward models, Skywork-Reward-Gemma-2-27B and Skywork-Reward-Llama-3.1-8B, have performed exceptionally well on RewardBench, an internationally recognized benchmark for evaluating reward models. Notably, Skywork-Reward-Gemma-2-27B has topped the rankings and received high praise from the RewardBench team.

Reward models play a central role in reinforcement learning: they evaluate an agent's behavior in different states and provide the reward signals that guide learning, so that the agent can make optimal choices in a given environment. In the training of large language models, reward models are particularly critical because they help the model understand human preferences more accurately and generate content that aligns with them.

RewardBench is a benchmark designed to evaluate reward models for large language models. It assesses models across multiple task categories, including dialogue, reasoning, and safety. Its test set consists of triplets, each made up of a prompt, a chosen response, and a rejected response, and it checks whether the reward model correctly ranks the chosen response ahead of the rejected response for the given prompt.
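The pairwise check described above can be illustrated with a minimal Python sketch. It is not the RewardBench implementation. The names `PreferenceTriplet`, `pairwise_accuracy`, and `toy_score` are hypothetical, and `toy_score` is a deliberately naive stand-in (it simply prefers longer responses) for a real reward model's scoring call.

```python
from typing import Callable, Iterable, NamedTuple


class PreferenceTriplet(NamedTuple):
    """One evaluation item: a prompt plus a chosen and a rejected response."""
    prompt: str
    chosen: str
    rejected: str


def pairwise_accuracy(
    triplets: Iterable[PreferenceTriplet],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of triplets where the chosen response outscores the rejected one."""
    items = list(triplets)
    if not items:
        raise ValueError("need at least one triplet")
    correct = sum(
        1 for t in items
        if score(t.prompt, t.chosen) > score(t.prompt, t.rejected)
    )
    return correct / len(items)


def toy_score(prompt: str, response: str) -> float:
    # Hypothetical stand-in for a reward model: longer responses score higher.
    # A real reward model would return a learned scalar for (prompt, response).
    return float(len(response))


examples = [
    PreferenceTriplet("What is 2 + 2?", "2 + 2 equals 4.", "5"),
    PreferenceTriplet("Name a primary color.", "Red.", "Purple is one."),
]

accuracy = pairwise_accuracy(examples, toy_score)
print(f"pairwise accuracy: {accuracy:.2f}")
```

Here the toy scorer ranks the first pair correctly and the second incorrectly, which shows why a length-based heuristic is a poor reward model. In practice, `toy_score` would be replaced by a call to the reward model under evaluation.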

Kunlun Technology's Skywork-Reward models were developed using carefully curated preference (partial order) datasets and relatively small base models. Compared with existing reward models, the Skywork-Reward preference data is sourced only from publicly available data and is filtered with specific strategies to produce high-quality preference datasets. These datasets cover a wide range of topics, including safety, mathematics, and coding, and have been manually verified to ensure that the data is objective and that the reward gaps are significant.

In testing, Kunlun Technology's reward models performed strongly in areas such as dialogue and safety. On particularly challenging samples, only the Skywork-Reward-Gemma-2-27B model provided correct predictions. This achievement demonstrates Kunlun Technology's technical strength and capacity for innovation in the global AI field, and it opens up new possibilities for the development and application of AI technology.

27B Model Address:

https://huggingface.co/Skywork/Skywork-Reward-Gemma-2-27B

8B Model Address:

https://huggingface.co/Skywork/Skywork-Reward-Llama-3.1-8B