Llama3-70B-SteerLM-RM

A 70-billion parameter multi-faceted reward model

Common Product, Programming, Language Model, Reward Model
Llama3-70B-SteerLM-RM is a 70-billion-parameter language model that serves as a property prediction model, specifically a multi-faceted reward model. Unlike a traditional reward model, which returns a single score, it rates model responses along multiple dimensions. The model was trained on the HelpSteer2 dataset with NVIDIA NeMo-Aligner, a scalable toolkit for efficient and effective model alignment.
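
To make the difference between a multi-faceted score and a single score concrete, here is a minimal Python sketch of how per-dimension ratings might be collapsed into one scalar for ranking responses. It does not call the model. The five attribute names follow the HelpSteer2 annotation scheme; the weights, the helper names `scalar_reward` and `pick_best`, and the example ratings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AttributeScores:
    """Per-dimension ratings for one response (HelpSteer2 attribute names)."""
    helpfulness: float
    correctness: float
    coherence: float
    complexity: float
    verbosity: float


# Hypothetical weights, not values published for this model.
# Attributes missing from the mapping contribute nothing to the scalar.
EXAMPLE_WEIGHTS = {"helpfulness": 1.0, "correctness": 1.0, "verbosity": -0.5}


def scalar_reward(scores: AttributeScores, weights: dict[str, float]) -> float:
    """Collapse the multi-dimensional ratings into one weighted sum."""
    return sum(
        weights.get(f.name, 0.0) * getattr(scores, f.name)
        for f in fields(scores)
    )


def pick_best(candidates: dict[str, AttributeScores],
              weights: dict[str, float]) -> str:
    """Return the key of the candidate with the highest weighted reward."""
    return max(candidates, key=lambda k: scalar_reward(candidates[k], weights))


# Illustrative ratings for two candidate responses (made-up numbers).
candidates = {
    "concise": AttributeScores(4.0, 3.0, 4.0, 2.0, 1.0),
    "rambling": AttributeScores(4.0, 3.0, 3.0, 2.0, 4.0),
}
best = pick_best(candidates, EXAMPLE_WEIGHTS)
```

Keeping the dimensions separate until the last step means the same ratings can be re-weighted for different goals, for example penalizing verbosity more heavily, without re-scoring the responses.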

Llama3-70B-SteerLM-RM Visit Over Time

Monthly Visits: 18,200,568

Bounce Rate: 44.11%

Pages per Visit: 5.8

Visit Duration: 00:05:46

Llama3-70B-SteerLM-RM Alternatives