arxiv:2605.09169

Prediction Bottlenecks Don't Discover Causal Structure (But Here's What They Actually Do)

Published on May 9 · Submitted by Aman Chadha on May 12

Abstract

The claim that a Mamba state-space model recovers Granger-causal structure through a simple readout was tested across synthetic and real datasets with interventions; the method-level claim does not hold once confounds are controlled for and baselines are matched.

AI-generated summary

A Mamba state-space model trained only for next-step prediction appears to recover Granger-causal structure through a simple readout S = |W_{out} W_{in}|, with early experiments suggesting the phenomenon generalized across architectures and benefited from interventional data at p < 10^{-5}. We package the protocol used to test that claim -- standardized synthetic generators (VAR/Lorenz/CauseMe-style), three intervention semantics (do(X=c), soft-noise, random-forcing), edge-provenance cards on three real datasets, and size-matched control arms -- as a reusable falsification benchmark, and walk the claim through it in five stages. The method-level claim does not survive: (i) a plain linear bottleneck does as well or better; (ii) tuned Lasso beats the bottleneck on synthetic CauseMe-style benchmarks, and on Lorenz-96 (the only real benchmark with unambiguous ground truth) classical PCMCI and Granger lead a tight cluster in which the bottleneck trails; (iii) the headline intervention advantage is roughly 60% a sample-size confound, and the residual disappears under standard do(X=c) interventions, surviving only under a non-standard random-forcing scheme; (iv) even that residual reproduces, with a larger effect, in classical bivariate Granger -- the effect is method-agnostic. What survives is a narrow characterization result; the benchmark is the lasting artifact, and each stage above is one of its control arms.
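
The readout at the center of the claim is concrete enough to sketch. Below is a minimal NumPy illustration (not the authors' code; the VAR(1) generator, the rank-truncation stand-in for "training a bottleneck", and all variable names are assumptions) of scoring candidate edges via S = |W_out W_in| from a rank-r linear next-step predictor:

import numpy as np

# Toy sparse VAR(1) system: x_{t+1} = A x_t + noise, with a few true edges.
rng = np.random.default_rng(0)
d, r, T = 6, 3, 5000
A = 0.5 * np.eye(d)
A[1, 0], A[3, 2], A[5, 4] = 0.4, 0.4, 0.4

X = np.zeros((T, d))
for t in range(T - 1):
    X[t + 1] = A @ X[t] + 0.1 * rng.normal(size=d)

# Next-step OLS fit, then rank-r truncation: a crude linear "bottleneck"
# standing in for a trained model with input/output projections.
B, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)   # x_{t+1} ~ B.T x_t
U, s, Vt = np.linalg.svd(B.T)
W_out = U[:, :r] * s[:r]    # d x r output projection (illustrative)
W_in = Vt[:r]               # r x d input projection (illustrative)

S = np.abs(W_out @ W_in)    # S[i, j] scores the candidate edge j -> i
print(np.round(S, 2))       # compare against the nonzeros of A

Even in this friendly setting the rank truncation smears score mass onto non-edges, which is consistent with the paper's reading of the readout as low-rank regression rather than causal recovery.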

Community

Paper submitter

This paper falsifies the claim that next-step prediction bottlenecks, especially Mamba/SSM weight projections, recover causal structure, showing instead that their apparent gains are mostly low-rank regression, sample-size confounds, intervention-semantics artifacts, and target-corruption robustness, with the main durable contribution being a reusable falsification benchmark.

➡️ Key Highlights of their Prediction-as-Causal-Discovery Falsification Framework:

🧪 Reusable Five-Stage Falsification Benchmark: Introduces a control-heavy benchmark spanning VAR, Lorenz-96, CauseMe-style generators, real datasets with edge-provenance cards, matched-capacity architectures, size-matched observational controls, and multiple intervention semantics to stress-test claims that prediction models implicitly recover causal graphs.
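
The three intervention semantics are worth pinning down, since stages (iii) and (iv) turn on the difference between them. A toy sketch on an assumed VAR(1) simulator (function names and noise scales are illustrative, not the benchmark's API):

import numpy as np

def simulate_var(A, T, rng, intervene=None, target=0, c=2.0):
    """Roll out x_{t+1} = A x_t + noise, optionally intervening on one channel."""
    d = A.shape[0]
    X = np.zeros((T, d))
    for t in range(T - 1):
        X[t + 1] = A @ X[t] + 0.1 * rng.normal(size=d)
        if intervene == "do":                 # hard do(X_target = c): clamp to a constant
            X[t + 1, target] = c
        elif intervene == "soft-noise":       # inject extra exogenous noise, keep parents
            X[t + 1, target] += rng.normal()
        elif intervene == "random-forcing":   # overwrite with fresh noise every step
            X[t + 1, target] = rng.normal()
    return X

rng = np.random.default_rng(1)
A = 0.5 * np.eye(3)
A[1, 0] = 0.4
for mode in (None, "do", "soft-noise", "random-forcing"):
    X = simulate_var(A, 2000, rng, intervene=mode)
    print(mode, np.round(X.std(axis=0), 2))

The distinction matters: do(X = c) clamps the target to a constant, soft-noise perturbs it while keeping its parents' influence, and random-forcing severs the parents while injecting fresh variance each step, the only semantics under which the paper finds the residual advantage survives.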

🧩 Weight-Projection Causality Does Not Survive Controls: Tests the extraction rule (S = |W_{out}W_{in}|) for bottleneck predictors and shows that linear bottlenecks match or beat Mamba SSMs, tuned Lasso dominates on synthetic graph recovery, and classical PCMCI/Granger-style methods outperform the bottleneck on clean Lorenz-96 ground truth.
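
The baseline side of that comparison is nothing exotic: a per-target sparse regression on lagged features. A minimal sketch of a tuned-Lasso edge scorer using scikit-learn's LassoCV (an assumed setup, not the paper's code):

import numpy as np
from sklearn.linear_model import LassoCV

# Same style of toy sparse VAR(1) system as in the sketch above.
rng = np.random.default_rng(0)
d, T = 6, 3000
A = 0.5 * np.eye(d)
A[1, 0], A[3, 2] = 0.4, 0.4
X = np.zeros((T, d))
for t in range(T - 1):
    X[t + 1] = A @ X[t] + 0.1 * rng.normal(size=d)

# One cross-validated Lasso per target channel; |coefficient| is the edge score.
S_lasso = np.zeros((d, d))
for i in range(d):
    fit = LassoCV(cv=5).fit(X[:-1], X[1:, i])   # predict channel i at t+1
    S_lasso[i] = np.abs(fit.coef_)              # S_lasso[i, j] scores j -> i
print(np.round(S_lasso, 2))

Because the Lasso zeroes out irrelevant lags exactly, it tends to produce a much cleaner score matrix on sparse synthetic graphs than a dense low-rank readout, which is the pattern the paper reports.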

🧠 Intervention Gains Are Confounds, Not Causal Extraction: Demonstrates that the reported interventional advantage mostly comes from extra sample size and a non-standard per-step random-forcing intervention; under proper do(X_i = c) interventions the effect nearly vanishes, while the residual appears even more strongly in classical bivariate Granger, indicating method-agnostic target-corruption robustness rather than learned causal discovery.
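
The sample-size confound is mechanical: an arm trained on observational plus interventional data sees more rows than the observational-only arm unless totals are matched. A toy sketch of the size-matched control (the edge_score stand-in, the forcing scheme, and the splits are all illustrative assumptions):

import numpy as np

def edge_score(X):
    # Lag-1 OLS coefficient magnitudes as a generic stand-in edge score.
    B, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
    return np.abs(B.T)

rng = np.random.default_rng(0)
d, N = 4, 2000
A = 0.5 * np.eye(d)
A[1, 0] = 0.4

def simulate(T, forced=False):
    X = np.zeros((T, d))
    for t in range(T - 1):
        X[t + 1] = A @ X[t] + 0.1 * rng.normal(size=d)
        if forced:
            X[t + 1, 0] = rng.normal()   # per-step random forcing of X_0
    return X

obs, intv = simulate(N), simulate(N, forced=True)

# Confounded comparison: N observational rows vs N + N mixed rows; any gain
# may just be the doubled sample size. (The one bogus transition at the
# vstack join is ignored for this sketch.)
score_obs = edge_score(obs)
score_mixed = edge_score(np.vstack([obs, intv]))

# Size-matched control: hold the total at N rows in both arms.
score_matched = edge_score(np.vstack([obs[: N // 2], intv[: N // 2]]))
print(score_obs[1, 0], score_mixed[1, 0], score_matched[1, 0])

Holding row counts fixed like this is what reveals that roughly 60% of the headline intervention advantage was sample size rather than interventional information.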

Get this paper in your agent:

hf papers read 2605.09169
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
