OPID: On-Policy Skill Distillation That Lets Agents Learn Without External Memory
OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning. Authors: Shuo Yang, Jinyang Wu, Zhengxi Lu, Yuhao Shen, Fan Zhang, Lang Feng, Shuai Zhang, Haoran Luo, Zheng Lian, Zhengqi Wen, Jianhua Tao. Main publishing institutions: not explicitly indicated in the paper source…
2026/7/1 5:41:27