LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling Paper • 2605.08083 • Published 11 days ago • 64
Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration Paper • 2605.05566 • Published 12 days ago • 37
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published Sep 26, 2025 • 137
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27, 2025 • 84