Open-Source TTRL Paper Walkthrough: Running GRPO on Unlabeled Test Data with Majority Voting as the Reward Signal, Raising Qwen2.5-Math-7B AIME 2024 pass@1 by 211%
Paper: TTRL: Test-Time Reinforcement Learning (fully open source, NeurIPS 2025, 1087 stars) Link: https://arxiv.org/abs/2504.16084 GitHub: PRIME-RL/TTRL …
2026/6/22 20:15:51
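The idea in the title, using the majority-voted answer as a pseudo-label so that unlabeled test questions can still yield a reward, can be illustrated with a small sketch. This is not the code from the PRIME-RL/TTRL repository: the function names, the example answers, and the use of a population standard deviation for group normalization are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def majority_vote_rewards(answers):
    """Score each sampled answer against the most common answer.

    Hypothetical helper: the majority answer serves as the pseudo-label,
    and an answer earns 1.0 if it matches that label, otherwise 0.0.
    """
    pseudo_label, _ = Counter(answers).most_common(1)[0]
    rewards = [1.0 if a == pseudo_label else 0.0 for a in answers]
    return pseudo_label, rewards


def group_normalized_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group, GRPO-style.

    Assumption: population standard deviation plus a small eps
    to avoid division by zero when all rewards are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Eight final answers sampled for one unlabeled question (made-up values).
sampled_answers = ["42", "42", "17", "42", "9", "42", "17", "42"]
label, rewards = majority_vote_rewards(sampled_answers)
advantages = group_normalized_advantages(rewards)
```

In this sketch, samples that agree with the majority receive a positive advantage and the rest a negative one, so no ground-truth label is needed to produce a training signal.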