Llama3-70B-SteerLM-RM
A 70-billion parameter multi-faceted reward model
Llama3-70B-SteerLM-RM is a 70-billion-parameter language model that serves as a property prediction model, specifically a multi-faceted reward model. Unlike traditional reward models that produce a single scalar score, it evaluates model responses across multiple dimensions. The model was trained on the HelpSteer2 dataset using NVIDIA NeMo-Aligner, a scalable toolkit for efficient and effective model alignment.
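As a rough sketch of how such multi-attribute output is typically consumed, the snippet below combines per-dimension ratings into a single scalar reward via a weighted sum. The attribute names follow the HelpSteer2 dataset (helpfulness, correctness, coherence, complexity, verbosity, each rated 0-4); the weights, the example scores, and the aggregate_reward helper are illustrative assumptions, not the model's official API.

# Minimal sketch: collapsing multi-faceted reward scores into one scalar.
# Attribute names come from the HelpSteer2 dataset; the weights and the
# example scores below are illustrative assumptions only.

HELPSTEER2_ATTRIBUTES = [
    "helpfulness", "correctness", "coherence", "complexity", "verbosity",
]

# Hypothetical weighting: favor helpfulness/correctness, penalize verbosity.
ATTRIBUTE_WEIGHTS = {
    "helpfulness": 0.65,
    "correctness": 0.80,
    "coherence": 0.45,
    "complexity": 0.55,
    "verbosity": -0.40,
}


def aggregate_reward(scores: dict[str, float]) -> float:
    """Combine per-attribute scores (each on a 0-4 scale) into a scalar reward."""
    return sum(ATTRIBUTE_WEIGHTS[attr] * scores[attr] for attr in HELPSTEER2_ATTRIBUTES)


if __name__ == "__main__":
    # Example scores such as the reward model might emit for one prompt/response pair.
    example_scores = {
        "helpfulness": 3.6,
        "correctness": 3.8,
        "coherence": 3.9,
        "complexity": 2.1,
        "verbosity": 1.4,
    }
    print(f"Aggregated reward: {aggregate_reward(example_scores):.2f}")

A weighted sum like this is one common way to turn multi-dimensional feedback into the single reward signal that downstream alignment algorithms expect, while still letting practitioners tune how much each attribute matters.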
Llama3-70B-SteerLM-RM Visit Over Time
Monthly Visits: 17,788,201
Bounce Rate: 44.87%
Pages per Visit: 5.4
Visit Duration: 00:05:32