24-Game-Reasoning

Public

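A super-simple reproduction of Deepseek-R1-Zero and Deepseek-R1, using the 24 game as an example. It applies zero-RL, SFT, and SFT+RL to elicit the LLM's ability to verify and reflect on its own reasoning.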
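Zero-RL setups of this kind are commonly driven by a rule-based check on the final answer. As a minimal sketch of what such a check could look like for the 24 game: the names `check_24` and `_evaluate` are hypothetical and are not taken from this repository, which may verify answers differently.

```python
import ast
from collections import Counter
from fractions import Fraction

# Binary operators allowed in a 24-game answer (assumption: + - * / only).
_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}


def _evaluate(node, used):
    """Evaluate an AST node exactly, recording each integer literal in `used`."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def check_24(numbers, expression, target=24):
    """Return True if `expression` uses each given number exactly once and equals `target`."""
    try:
        tree = ast.parse(expression, mode="eval")
        used = []
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    return Counter(used) == Counter(numbers) and value == target
```

Exact `Fraction` arithmetic avoids floating-point misses on answers such as `8 / (3 - 8 / 3)`, and parsing with `ast` instead of calling `eval` keeps arbitrary model output from being executed.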

24game alignment cot deepseek llm long-cot o1 post-training r1 r1-zero

Created: 2025-02-26T15:46:13

Updated: 2025-03-24T20:54:55


Related projects

Pandas

alignment

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

45229

4 weeks ago

+16 today

Awesome LLM Reasoning

awesome

Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1

3002

4 weeks ago

+4 today

3DDFA_V2

The official PyTorch implementation of Towards Fast, Accurate and Stable 3D Dense Face Alignment, ECCV 2020.

2978

1 month ago

Alpaca CoT

alpaca

We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs, and parameter-efficient methods (e.g., LoRA, P-tuning) for easy use. We welcome open-source enthusiasts to open any meaningful PR on this repo and integrate as many LLM-related technologies as possible. We built a fine-tuning platform that makes it easy for researchers to get started with and use large models.

2737

4 weeks ago

+3 today

Aeneas

alignment

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

2634

4 weeks ago

+1 today

Autoflow

chatbot

pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless Vector Storage. Demo: https://tidb.ai

2526

4 weeks ago

+5 today

MOSS RLHF

ai-safety

Secrets of RLHF in Large Language Models Part I: PPO

1357

1 month ago

Pywinassistant

artificial-general-intelligence

The first open-source Artificial Narrow Intelligence generalist agentic framework Computer-Using-Agent that fully operates graphical-user-interfaces (GUIs) by using only natural language. Uses Visualization-of-Thought and Chain-of-Thought reasoning to elicit spatial reasoning and perception, emulates, plans and simulates synthetic HID interactions.

1296

1 month ago

Gangealing

alignment

Official PyTorch Implementation of "GAN-Supervised Dense Visual Alignment" (CVPR 2022 Oral, Best Paper Finalist)

1013

4 weeks ago

Awesome Knowledge Distillation Of LLMs

alignment

This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.

996

4 weeks ago

+1 today
