Kunlun Wanwei Releases New Large Model Reward Model Skywork-Reward

AIbase基地

Published in AI News · 4 min read · Sep 13, 2024

Kunlun Technology Co., Ltd. (Kunlun Wanwei) recently announced that its two new reward models, Skywork-Reward-Gemma-2-27B and Skywork-Reward-Llama-3.1-8B, performed exceptionally well on RewardBench, a widely recognized benchmark for evaluating reward models. Notably, Skywork-Reward-Gemma-2-27B topped the rankings and was highly praised by the RewardBench team.

Reward models play a central role in reinforcement learning: they evaluate an agent's behavior in different states and provide the reward signal that guides learning, so that the agent learns to make better choices in a given environment. In the training of large language models their role is especially critical, because they help the model understand human preferences and generate content that aligns with them.
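To make the idea of a preference-based reward signal concrete, here is a minimal sketch of a pairwise preference loss in the Bradley-Terry style, which is a common way to train reward models for language models. It is an illustration only: the article does not say which loss the Skywork-Reward models use, and the function name `preference_loss` is hypothetical.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)).

    Assumption: a Bradley-Terry style objective, used here only as an
    example; the article does not confirm the loss behind Skywork-Reward.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss shrinks as the model scores the preferred response further above the rejected one, and grows when the ordering is wrong, which is what pushes the reward model toward human preferences during training.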

RewardBench is a benchmark designed to evaluate reward models for large language models, assessing them across multiple task categories, including dialogue, reasoning, and safety. Its test set consists of triplets, each made up of a prompt, a chosen response, and a rejected response. The benchmark checks whether the reward model ranks the chosen response ahead of the rejected response for the given prompt.
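The triplet-based check described above can be sketched as a pairwise accuracy computation. This is a simplified illustration, not RewardBench's actual evaluation code: the names `PreferenceTriplet`, `pairwise_accuracy`, and `toy_score` are hypothetical, and `toy_score` is a trivial stand-in for a real reward model.

```python
from typing import Callable, Iterable, NamedTuple


class PreferenceTriplet(NamedTuple):
    """One benchmark item: a prompt with a chosen and a rejected response."""
    prompt: str
    chosen: str
    rejected: str


def pairwise_accuracy(
    score: Callable[[str, str], float],
    triplets: Iterable[PreferenceTriplet],
) -> float:
    """Fraction of triplets where the chosen response scores strictly higher."""
    items = list(triplets)
    if not items:
        raise ValueError("need at least one triplet")
    correct = sum(
        1
        for t in items
        if score(t.prompt, t.chosen) > score(t.prompt, t.rejected)
    )
    return correct / len(items)


def toy_score(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a reward model: longer responses score higher."""
    return float(len(response))
```

In practice, `score` would wrap a real reward model that maps a prompt and a response to a scalar; a tie is counted as an error here because the model failed to prefer the chosen response.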

Kunlun Technology's Skywork-Reward models were developed using carefully curated preference datasets (pairs of responses ordered by preference) and relatively small base models. In contrast to existing reward models, their preference data comes only from publicly available internet sources and is passed through specific filtering strategies to obtain a high-quality preference dataset. The data covers a wide range of topics, including safety, mathematics, and coding, and has been manually verified to ensure that it is objective and that the reward gaps are significant.

In testing, Kunlun Technology's reward models performed outstandingly in areas such as dialogue and safety. On some challenging samples, only the Skywork-Reward-Gemma-2-27B model made the correct prediction. This achievement demonstrates Kunlun Technology's technical strength and capacity for innovation in the global AI field, and opens up new possibilities for the development and application of AI technology.

27B Model Address:

https://huggingface.co/Skywork/Skywork-Reward-Gemma-2-27B

8B Model Address:

https://huggingface.co/Skywork/Skywork-Reward-Llama-3.1-8B

This article is from AIbase Daily
