arxiv:2605.09169

Prediction Bottlenecks Don't Discover Causal Structure (But Here's What They Actually Do)

Published on May 9 · Submitted by Aman Chadha on May 12

Abstract

The claim that a Mamba state-space model recovers Granger-causal structure through a simple readout was tested across synthetic and real datasets with interventions; the method-level claim does not hold once confounds are controlled for and baselines are matched.

AI-generated summary

A Mamba state-space model trained only for next-step prediction appears to recover Granger-causal structure through a simple readout S = |W_{out} W_{in}|, with early experiments suggesting the phenomenon generalized across architectures and benefited from interventional data at p < 10^{-5}. We package the protocol used to test that claim -- standardized synthetic generators (VAR/Lorenz/CauseMe-style), three intervention semantics (do(X=c), soft-noise, random-forcing), edge-provenance cards on three real datasets, and size-matched control arms -- as a reusable falsification benchmark, and walk the claim through it in five stages. The method-level claim does not survive: (i) a plain linear bottleneck does as well or better; (ii) tuned Lasso beats the bottleneck on synthetic CauseMe-style benchmarks, and on Lorenz-96 (the only real benchmark with unambiguous ground truth) classical PCMCI and Granger lead a tight cluster in which the bottleneck trails; (iii) the headline intervention advantage is roughly 60% a sample-size confound, and the residual disappears under standard do(X=c) interventions, surviving only under a non-standard random-forcing scheme; (iv) even that residual reproduces, with a larger effect, in classical bivariate Granger -- the effect is method-agnostic. What survives is a narrow characterization result; the benchmark is the lasting artifact, and each stage above is one of its control arms.
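
The readout at the center of the claim is concrete enough to sketch. Below is a minimal NumPy illustration (not the authors' code; the VAR(1) generator, the rank-truncation stand-in for "training a bottleneck", and all variable names are assumptions) of scoring candidate edges via S = |W_out W_in| from a rank-r linear next-step predictor:

import numpy as np

# Toy sparse VAR(1) system: x_{t+1} = A x_t + noise, with a few true edges.
rng = np.random.default_rng(0)
d, r, T = 6, 3, 5000
A = 0.5 * np.eye(d)
A[1, 0], A[3, 2], A[5, 4] = 0.4, 0.4, 0.4

X = np.zeros((T, d))
for t in range(T - 1):
    X[t + 1] = A @ X[t] + 0.1 * rng.normal(size=d)

# Next-step OLS fit, then rank-r truncation: a crude linear "bottleneck"
# standing in for a trained model with input/output projections.
B, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)   # x_{t+1} ~ B.T x_t
U, s, Vt = np.linalg.svd(B.T)
W_out = U[:, :r] * s[:r]    # d x r output projection (illustrative)
W_in = Vt[:r]               # r x d input projection (illustrative)

S = np.abs(W_out @ W_in)    # S[i, j] scores the candidate edge j -> i
print(np.round(S, 2))       # compare against the nonzeros of A

Even in this friendly setting the rank truncation smears score mass onto non-edges, which is consistent with the paper's reading of the readout as low-rank regression rather than causal recovery.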

Community

Paper submitter

This paper falsifies the claim that next-step prediction bottlenecks, especially Mamba/SSM weight projections, recover causal structure, showing instead that their apparent gains are mostly low-rank regression, sample-size confounds, intervention-semantics artifacts, and target-corruption robustness, with the main durable contribution being a reusable falsification benchmark.

➡️ Key Highlights of their Prediction-as-Causal-Discovery Falsification Framework:

🧪 Reusable Five-Stage Falsification Benchmark: Introduces a control-heavy benchmark spanning VAR, Lorenz-96, CauseMe-style generators, real datasets with edge-provenance cards, matched-capacity architectures, size-matched observational controls, and multiple intervention semantics to stress-test claims that prediction models implicitly recover causal graphs.
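
The three intervention semantics are worth pinning down, since stages (iii) and (iv) turn on the difference between them. A toy sketch on an assumed VAR(1) simulator (function names and noise scales are illustrative, not the benchmark's API):

import numpy as np

def simulate_var(A, T, rng, intervene=None, target=0, c=2.0):
    """Roll out x_{t+1} = A x_t + noise, optionally intervening on one channel."""
    d = A.shape[0]
    X = np.zeros((T, d))
    for t in range(T - 1):
        X[t + 1] = A @ X[t] + 0.1 * rng.normal(size=d)
        if intervene == "do":                 # hard do(X_target = c): clamp to a constant
            X[t + 1, target] = c
        elif intervene == "soft-noise":       # inject extra exogenous noise, keep parents
            X[t + 1, target] += rng.normal()
        elif intervene == "random-forcing":   # overwrite with fresh noise every step
            X[t + 1, target] = rng.normal()
    return X

rng = np.random.default_rng(1)
A = 0.5 * np.eye(3)
A[1, 0] = 0.4
for mode in (None, "do", "soft-noise", "random-forcing"):
    X = simulate_var(A, 2000, rng, intervene=mode)
    print(mode, np.round(X.std(axis=0), 2))

The distinction matters: do(X = c) clamps the target to a constant, soft-noise perturbs it while keeping its parents' influence, and random-forcing severs the parents while injecting fresh variance each step, the only semantics under which the paper finds the residual advantage survives.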

🧩 Weight-Projection Causality Does Not Survive Controls: Tests the extraction rule (S = |W_{out}W_{in}|) for bottleneck predictors and shows that linear bottlenecks match or beat Mamba SSMs, tuned Lasso dominates on synthetic graph recovery, and classical PCMCI/Granger-style methods outperform the bottleneck on clean Lorenz-96 ground truth.
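
The baseline side of that comparison is nothing exotic: a per-target sparse regression on lagged features. A minimal sketch of a tuned-Lasso edge scorer using scikit-learn's LassoCV (an assumed setup, not the paper's code):

import numpy as np
from sklearn.linear_model import LassoCV

# Same style of toy sparse VAR(1) system as in the sketch above.
rng = np.random.default_rng(0)
d, T = 6, 3000
A = 0.5 * np.eye(d)
A[1, 0], A[3, 2] = 0.4, 0.4
X = np.zeros((T, d))
for t in range(T - 1):
    X[t + 1] = A @ X[t] + 0.1 * rng.normal(size=d)

# One cross-validated Lasso per target channel; |coefficient| is the edge score.
S_lasso = np.zeros((d, d))
for i in range(d):
    fit = LassoCV(cv=5).fit(X[:-1], X[1:, i])   # predict channel i at t+1
    S_lasso[i] = np.abs(fit.coef_)              # S_lasso[i, j] scores j -> i
print(np.round(S_lasso, 2))

Because the Lasso zeroes out irrelevant lags exactly, it tends to produce a much cleaner score matrix on sparse synthetic graphs than a dense low-rank readout, which is the pattern the paper reports.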

🧠 Intervention Gains Are Confounds, Not Causal Extraction: Demonstrates that the reported interventional advantage mostly comes from extra sample size and a non-standard per-step random-forcing intervention; under proper do(X_i = c) interventions the effect nearly vanishes, while the residual appears even more strongly in classical bivariate Granger, indicating method-agnostic target-corruption robustness rather than learned causal discovery.
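
The sample-size confound is mechanical: an arm trained on observational plus interventional data sees more rows than the observational-only arm unless totals are matched. A toy sketch of the size-matched control (the edge_score stand-in, the forcing scheme, and the splits are all illustrative assumptions):

import numpy as np

def edge_score(X):
    # Lag-1 OLS coefficient magnitudes as a generic stand-in edge score.
    B, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
    return np.abs(B.T)

rng = np.random.default_rng(0)
d, N = 4, 2000
A = 0.5 * np.eye(d)
A[1, 0] = 0.4

def simulate(T, forced=False):
    X = np.zeros((T, d))
    for t in range(T - 1):
        X[t + 1] = A @ X[t] + 0.1 * rng.normal(size=d)
        if forced:
            X[t + 1, 0] = rng.normal()   # per-step random forcing of X_0
    return X

obs, intv = simulate(N), simulate(N, forced=True)

# Confounded comparison: N observational rows vs N + N mixed rows; any gain
# may just be the doubled sample size. (The one bogus transition at the
# vstack join is ignored for this sketch.)
score_obs = edge_score(obs)
score_mixed = edge_score(np.vstack([obs, intv]))

# Size-matched control: hold the total at N rows in both arms.
score_matched = edge_score(np.vstack([obs[: N // 2], intv[: N // 2]]))
print(score_obs[1, 0], score_mixed[1, 0], score_matched[1, 0])

Holding row counts fixed like this is what reveals that roughly 60% of the headline intervention advantage was sample size rather than interventional information.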

Get this paper in your agent:

hf papers read 2605.09169
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
