Starling-7B

Enhancing the usability and safety of LLMs

Starling-7B is an open-weights large language model (LLM) trained with Reinforcement Learning from AI Feedback (RLAIF). Training leverages Nectar, our new GPT-4-labeled ranking dataset, together with a new reward-model training and policy-optimization pipeline. Starling-7B scores 8.09 on MT-Bench with GPT-4 as the judge, surpassing every model evaluated to date except OpenAI's GPT-4 and GPT-4 Turbo. We have released the ranking dataset Nectar, the reward model Starling-RM-7B-alpha, and the language model Starling-LM-7B-alpha on HuggingFace, along with an online demo on LMSYS Chatbot Arena. Stay tuned for the upcoming release of our code and paper, which will describe the full process in detail.
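For reference, here is a minimal sketch of querying the released checkpoint with Hugging Face transformers. The model ID and the OpenChat-3.5-style prompt format follow the public Starling-LM-7B-alpha model card; the generation settings are illustrative assumptions, not tuned recommendations.

```python
# Minimal sketch: generating a response from Starling-LM-7B-alpha.
# Assumes transformers, torch, and accelerate are installed; the
# prompt format below follows the model card's OpenChat-3.5 template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "berkeley-nest/Starling-LM-7B-alpha"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Single-turn prompt in the model's expected chat format.
prompt = (
    "GPT4 Correct User: How do I reverse a list in Python?"
    "<|end_of_turn|>GPT4 Correct Assistant:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
```

The companion reward model (Starling-RM-7B-alpha) and the Nectar dataset are published under the same berkeley-nest organization on HuggingFace.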