EvolveMem:Self-Evolving Memory Architecture via AutoResearch for LLM Agents Paper • 2605.13941 • Published 4 days ago • 21
Covering Human Action Space for Computer Use: Data Synthesis and Benchmark Paper • 2605.12501 • Published 5 days ago • 13
Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria Paper • 2605.08354 • Published 9 days ago • 22
BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning Paper • 2603.04918 • Published Mar 5 • 56
GEditBench v2: A Human-Aligned Benchmark for General Image Editing Paper • 2603.28547 • Published Mar 30 • 32
GEditBench v2: A Human-Aligned Benchmark for General Image Editing Paper • 2603.28547 • Published Mar 30 • 32
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights Paper • 2512.01816 • Published Dec 1, 2025 • 94