Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR • Paper • 2605.10781 • Published 1 day ago • 11
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? • Paper • 2603.24472 • Published Mar 25 • 55
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization • Paper • 2602.23008 • Published Feb 26 • 37