Nemotron-4-340B-Reward
A multi-dimensional reward model to help build custom large language models.
Common Product, Programming, Large Language Model, Synthetic Data Generation
Nemotron-4-340B-Reward, developed by NVIDIA, is a multi-dimensional reward model used in synthetic data generation pipelines to help researchers and developers build their own LLMs. It consists of the Nemotron-4-340B-Base model and a linear layer that converts the final-layer representation of the end-of-response token into five scalar values, one for each HelpSteer2 attribute: helpfulness, correctness, coherence, complexity, and verbosity. It supports a context length of up to 4,096 tokens and, in a multi-turn conversation, scores these five attributes for every assistant turn.
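The reward head described above can be illustrated with a minimal sketch in plain Python. It is not NVIDIA's implementation: the function names `reward_head` and `score_end_of_response`, the toy hidden size of 3, and the weight and bias values are all hypothetical, chosen only to show how a single linear layer maps the last token's hidden vector to five attribute scores.

```python
from typing import Dict, Sequence

# The five HelpSteer2 attributes, in a fixed (assumed) output order.
HELPSTEER2_ATTRIBUTES = [
    "helpfulness", "correctness", "coherence", "complexity", "verbosity",
]


def reward_head(
    hidden: Sequence[float],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> Dict[str, float]:
    """Apply a linear layer (one weight row and one bias per attribute)."""
    n = len(HELPSTEER2_ATTRIBUTES)
    if len(weight) != n or len(bias) != n:
        raise ValueError("expected one weight row and one bias per attribute")
    scores = {}
    for name, row, b in zip(HELPSTEER2_ATTRIBUTES, weight, bias):
        if len(row) != len(hidden):
            raise ValueError("weight row length must match hidden size")
        # Dot product of the weight row with the hidden vector, plus bias.
        scores[name] = sum(w * h for w, h in zip(row, hidden)) + b
    return scores


def score_end_of_response(
    hidden_states: Sequence[Sequence[float]],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> Dict[str, float]:
    """Score a response from the hidden vector of its last token only."""
    if not hidden_states:
        raise ValueError("hidden_states must contain at least one token")
    return reward_head(hidden_states[-1], weight, bias)


# Toy data (hypothetical): two tokens, hidden size 3.
toy_hidden_states = [[0.1, 0.2, 0.3], [1.0, 2.0, 3.0]]
toy_weight = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0],
]
toy_bias = [0.0, 0.0, 0.0, 0.0, 0.5]

toy_scores = score_end_of_response(toy_hidden_states, toy_weight, toy_bias)
print(toy_scores)
```

In the real model the hidden vector would come from the final layer of Nemotron-4-340B-Base at the end-of-response token, and the weights would be learned; only the shape of the computation is shown here.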
Nemotron-4-340B-Reward Visits Over Time
Monthly Visits: 19,075,321
Bounce Rate: 45.07%
Pages per Visit: 5.5
Visit Duration: 00:05:32