rlhf-nlp

Public

Proof-of-concept (POC) library built on TextRL for easily training and using models fine-tuned with RLHF, a reward model, and PPO

ppo reward-model rlhf textrl

Created: 2024-02-28T18:54:23

Updated: 2025-02-06T02:13:05

Related projects

LLaMA Factory

Hot

agent

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

46364

1 week ago

+96 today

Open Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

37284

1 week ago

-2 today

LLMSurvey

chain-of-thought

The official GitHub page for the survey paper "A Survey of Large Language Models".

11336

2 weeks ago

+5 today

Easy Rl

a3c

A Chinese-language reinforcement learning tutorial (the "Mushroom Book"); read online at https://datawhalechina.github.io/easy-rl/

10883

1 week ago

+12 today

Tianshou

a2c

An elegant PyTorch deep reinforcement learning library.

8366

1 week ago

+4 today

Chinese LLaMA Alpaca 2

64k

Phase 2 of the Chinese LLaMA-2 & Alpaca-2 LLM project, plus 64K long-context models

7158

2 weeks ago

+1 today

InternLM

chatbot

Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).

6851

1 week ago

+2 today

Alignment Handbook

llm

Robust recipes to align language models with human and AI preferences

5115

1 week ago

+4 today

Reinforcement Learning

a2c

Learn Deep Reinforcement Learning in 60 days! Lectures & Code in Python. Reinforcement Learning + Deep Learning

4307

3 years ago

+2 today

Deep Reinforcement Learning With Pytorch

a2c

PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....

4223

2 weeks ago

+1 today
