VIEW2SPACE: Studying Multi-View Visual Reasoning from Sparse Observations Paper • 2603.16506 • Published Mar 18
Mini-BEHAVIOR-Gran: Revealing U-Shaped Effects of Instruction Granularity on Language-Guided Embodied Agents Paper • 2604.17019 • Published 13 days ago
MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning Paper • 2601.19204 • Published Jan 27
DexAvatar: 3D Sign Language Reconstruction with Hand and Body Pose Priors Paper • 2512.21054 • Published Dec 24, 2025
Do Blind Spots Matter for Word-Referent Mapping? A Computational Study with Infant Egocentric Video Paper • 2511.11725 • Published Nov 13, 2025
Explain Before You Answer: A Survey on Compositional Visual Reasoning Paper • 2508.17298 • Published Aug 24, 2025
JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics Paper • 2508.10287 • Published Aug 14, 2025
AV-Deepfake1M++: A Large-Scale Audio-Visual Deepfake Benchmark with Real-World Perturbations Paper • 2507.20579 • Published Jul 28, 2025
M-MRE: Extending the Mutual Reinforcement Effect to Multimodal Information Extraction Paper • 2504.17353 • Published Apr 24, 2025
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning Paper • 2503.19263 • Published Mar 25, 2025
NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning Paper • 2502.00372 • Published Feb 1, 2025
JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups Paper • 2404.04458 • Published Apr 6, 2024
Hi-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting Paper • 2409.12518 • Published Sep 19, 2024
NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions Paper • 2409.10196 • Published Sep 16, 2024
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning Paper • 2403.12884 • Published Mar 19, 2024
AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset Paper • 2311.15308 • Published Nov 26, 2023
"Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization Paper • 2305.01979 • Published May 3, 2023
Emolysis: A Multimodal Open-Source Group Emotion Analysis and Visualization Toolkit Paper • 2305.05255 • Published May 9, 2023
MARLIN: Masked Autoencoder for facial video Representation LearnINg Paper • 2211.06627 • Published Nov 12, 2022