Skywork-Reward-Llama-3.1-8B is a reward model built on Meta-Llama-3.1-8B-Instruct. It was trained on the Skywork Reward Data Collection, which consists of 80,000 high-quality preference pairs. The model handles preferences in complex scenarios well, including challenging preference pairs across multiple domains such as mathematics, programming, and security. As of September 2024, it ranks third on the RewardBench leaderboard.
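
To make the notion of a preference pair concrete, here is a minimal sketch of how a reward model is typically checked against such pairs: the model assigns a scalar score to each response, and a pair counts as correct when the chosen response scores higher than the rejected one. The names `pairwise_accuracy` and `toy_score` are hypothetical and chosen for illustration; `toy_score` is a stand-in scorer, not the actual model, and the two example pairs are invented.

```python
from typing import Callable, Iterable, Tuple

# A preference pair: (prompt, chosen response, rejected response).
PreferencePair = Tuple[str, str, str]


def pairwise_accuracy(
    score: Callable[[str, str], float],
    pairs: Iterable[PreferencePair],
) -> float:
    """Fraction of pairs where the chosen response outscores the rejected one."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one preference pair is required")
    correct = sum(
        1
        for prompt, chosen, rejected in pairs
        if score(prompt, chosen) > score(prompt, rejected)
    )
    return correct / len(pairs)


def toy_score(prompt: str, response: str) -> float:
    # Hypothetical stand-in for a reward model: longer responses score higher.
    # A real reward model would return a learned scalar for (prompt, response).
    return float(len(response))


# Invented example pairs, for illustration only.
example_pairs = [
    ("What is 2 + 2?", "2 + 2 = 4.", "5"),
    ("Name a prime number.", "7", "9 is prime, I think."),
]

accuracy = pairwise_accuracy(toy_score, example_pairs)
```

With the length-based stand-in, the first pair is ranked correctly and the second is not, which shows why a learned scorer is needed on pairs where the worse answer is longer or more fluent. Swapping `toy_score` for a function that calls the real model would leave the rest of the sketch unchanged.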