ControlNet

AI & ML interests

Visual Reasoning, Neuro-Symbolic, LLM/VLM

Organizations

None yet

authored 5 papers 2 days ago

VIEW2SPACE: Studying Multi-View Visual Reasoning from Sparse Observations

Paper • 2603.16506 • Published Mar 18

Mini-BEHAVIOR-Gran: Revealing U-Shaped Effects of Instruction Granularity on Language-Guided Embodied Agents

Paper • 2604.17019 • Published 13 days ago

MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning

Paper • 2601.19204 • Published Jan 27

DexAvatar: 3D Sign Language Reconstruction with Hand and Body Pose Priors

Paper • 2512.21054 • Published Dec 24, 2025

Do Blind Spots Matter for Word-Referent Mapping? A Computational Study with Infant Egocentric Video

Paper • 2511.11725 • Published Nov 13, 2025

authored 2 papers 8 months ago

Explain Before You Answer: A Survey on Compositional Visual Reasoning

Paper • 2508.17298 • Published Aug 24, 2025 • 4

JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics

Paper • 2508.10287 • Published Aug 14, 2025

authored 2 papers 9 months ago

AV-Deepfake1M++: A Large-Scale Audio-Visual Deepfake Benchmark with Real-World Perturbations

Paper • 2507.20579 • Published Jul 28, 2025

M-MRE: Extending the Mutual Reinforcement Effect to Multimodal Information Extraction

Paper • 2504.17353 • Published Apr 24, 2025

authored 2 papers about 1 year ago

DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning

Paper • 2503.19263 • Published Mar 25, 2025 • 2

NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning

Paper • 2502.00372 • Published Feb 1, 2025 • 1

authored 9 papers over 1 year ago

JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups

Paper • 2404.04458 • Published Apr 6, 2024 • 1

Hi-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting

Paper • 2409.12518 • Published Sep 19, 2024 • 1

NEUSIS: A Compositional Neuro-Symbolic Framework for Autonomous Perception, Reasoning, and Planning in Complex UAV Search Missions

Paper • 2409.10196 • Published Sep 16, 2024 • 1

HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

Paper • 2403.12884 • Published Mar 19, 2024 • 1

AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset

Paper • 2311.15308 • Published Nov 26, 2023 • 2

"Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization

Paper • 2305.01979 • Published May 3, 2023 • 2

Emolysis: A Multimodal Open-Source Group Emotion Analysis and Visualization Toolkit

Paper • 2305.05255 • Published May 9, 2023 • 1

MARLIN: Masked Autoencoder for facial video Representation LearnINg

Paper • 2211.06627 • Published Nov 12, 2022 • 1

1M-Deepfakes Detection Challenge

Paper • 2409.06991 • Published Sep 11, 2024 • 1
