ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both Paper • 2605.15198 • Published 2 days ago • 16
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos Paper • 2601.00393 • Published Jan 1 • 133
Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation Paper • 2511.16671 • Published Nov 20, 2025 • 16
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark Paper • 2510.26802 • Published Oct 30, 2025 • 34
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17, 2025 • 79
Iterative Prompt Relabeling for diffusion model with RLDF Paper • 2312.16204 • Published Dec 23, 2023
EmpathyAgent: Can Embodied Agents Conduct Empathetic Actions? Paper • 2503.16545 • Published Mar 19, 2025 • 1
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning Paper • 2506.05331 • Published Jun 5, 2025 • 13
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT Paper • 2505.00703 • Published May 1, 2025 • 44