OPD Study Notes

📅 2026/7/5 5:12:15
Goal: study OPD and reproduce it.

References:
https://github.com/david-xinyuwei/david-share/blob/master/DL-Algorithm-Insights/Multi-Expert-OPD-Distillation/README-CN.md
https://github.com/david-xinyuwei/david-share/tree/master/DL-Algorithm-Insights

Some takeaways: for the author's discussion of "why on-policy rather than SFT", see https://github.com/david-xinyuwei/david-share/blob/master/DL-Algorithm-Insights/Multi-Expert-OPD-Distillation/README-CN.md (section "vs SFT (Supervised Fine-Tuning): the Exposure Bias problem").