DPO-ST

Public

[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

Created: 2024-06-04T23:37:20
Updated: 2025-02-28T09:50:26
https://arxiv.org/abs/2407.18248
Stars: 41
Stars Increase: 0