Nano-R1

Public

This project demonstrates fine-tuning the Qwen2.5-3B-Instruct model with GRPO (Group Relative Policy Optimization) on the GSM8K dataset.
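As a rough illustration of what GRPO on GSM8K involves, the sketch below scores a group of sampled completions against a GSM8K reference solution and turns the rewards into group-relative advantages. It is not taken from the project. The function names, the "last number in the completion" matching rule, the use of the population standard deviation, and the `eps` value are all assumptions chosen for illustration. The only dataset detail relied on is that GSM8K reference solutions end with a `####` marker followed by the final answer.

```python
import re
import statistics
from typing import List, Optional

# Matches integers and decimals, optionally negative, with thousands separators.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def to_float(text: str) -> float:
    """Convert a matched number such as '1,000' to a float."""
    return float(text.replace(",", ""))


def extract_reference_answer(solution: str) -> Optional[float]:
    """Return the number after the '####' marker of a GSM8K solution, if any."""
    if "####" not in solution:
        return None
    match = NUMBER.search(solution.split("####")[-1])
    return to_float(match.group(0)) if match else None


def correctness_reward(completion: str, reference_solution: str) -> float:
    """Hypothetical reward: 1.0 if the last number in the completion
    equals the reference answer, otherwise 0.0."""
    target = extract_reference_answer(reference_solution)
    numbers = NUMBER.findall(completion)
    if target is None or not numbers:
        return 0.0
    return 1.0 if to_float(numbers[-1]) == target else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize each reward against the mean and standard deviation of its
    own group of samples. eps (an assumed value) avoids division by zero
    when every sample in the group gets the same reward."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    reference = "16 - 3 - 4 = 9 eggs. 9 * 2 = 18 dollars. #### 18"
    completions = [
        "She sells 9 eggs at $2 each, so she makes 18 dollars.",
        "She makes 16 * 2 = 32 dollars.",
        "I am not sure.",
        "9 * 2 = 18",
    ]
    rewards = [correctness_reward(c, reference) for c in completions]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to provide a baseline. A real training run would typically add further reward terms (for example, for output format) and would pass the advantages to a policy-gradient update, which this sketch leaves out.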

Created: 2025-04-04T14:00:58
Updated: 2025-04-04T15:28:25
https://huggingface.co/Akshint47/Nano_R1_Model
Stars: 1
Stars increase: 0