
filtered-dpo

Public

Introducing Filtered Direct Preference Optimization (fDPO), which improves language-model alignment with human preferences by discarding preference-dataset samples whose quality is lower than that of responses generated by the learning model itself.
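The filtering step described above can be sketched roughly as follows. This is a minimal illustration assumed from the description, not the repository's actual API: a reward model scores each dataset pair's chosen response, and the pair is kept only if that score is at least as high as the score of a response sampled from the current policy.

```python
def filter_preference_data(dataset, generate, reward):
    """Keep only pairs whose chosen response scores at least as high as
    a response sampled from the learning model (illustrative fDPO filter)."""
    kept = []
    for example in dataset:
        policy_sample = generate(example["prompt"])
        chosen_score = reward(example["prompt"], example["chosen"])
        sample_score = reward(example["prompt"], policy_sample)
        if chosen_score >= sample_score:
            kept.append(example)
    return kept

# Toy stand-ins: the "reward" is response length, and the policy
# always emits the string "okay". Real usage would plug in a trained
# reward model and the current DPO policy.
dataset = [
    {"prompt": "q1", "chosen": "a detailed answer", "rejected": "no"},
    {"prompt": "q2", "chosen": "ok", "rejected": "bad"},
]
filtered = filter_preference_data(
    dataset,
    generate=lambda prompt: "okay",
    reward=lambda prompt, response: len(response),
)
print([ex["prompt"] for ex in filtered])  # → ['q1']
```

The surviving subset is then used for ordinary DPO training; the filter simply prunes pairs the model has already surpassed.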

Created: 2024-04-15T14:03:47
Updated: 2025-01-22T12:57:52
https://arxiv.org/abs/2404.13846
Stars: 11
Stars Increase: 0