ICLR 2021

On the Bottleneck of Graph Neural Networks and its Practical Implications

Uri Alon, Eran Yahav

TL;DR — GNNs suffer from "over-squashing" — as messages travel through many layers, information from distant nodes gets exponentially compressed into fixed-size vectors. This creates an information bottleneck, especially in graphs with narrow passages.

The Problem

Graph Neural Networks (GNNs) propagate information through message passing: at each layer, every node aggregates messages from its neighbors. When a node needs to make a prediction that depends on information k hops away, the GNN requires at least k layers of message passing.

But here is the catch. After k layers, a node's receptive field grows exponentially — it receives messages from up to O(b^k) nodes, where b is the branching factor. All of this information must be compressed into a single fixed-size vector. This is the information bottleneck of GNNs, and we call the resulting phenomenon over-squashing.
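To make the growth concrete, here is a tiny illustration (a toy graph of our own, not from the paper): in a complete binary tree (branching factor b = 2), the number of nodes within k hops of the root — and hence the receptive field after k message-passing layers — roughly doubles each layer.

```python
def receptive_field(adj, start, k):
    """Number of nodes whose features can reach `start` within k message-passing layers."""
    seen = {start}
    frontier = {start}
    for _ in range(k):
        frontier = {v for u in frontier for v in adj[u]} - seen
        seen |= frontier
    return len(seen)

# Complete binary tree with 31 nodes: children of node i are 2i+1 and 2i+2.
adj = {i: [] for i in range(31)}
for child in range(1, 31):
    parent = (child - 1) // 2
    adj[parent].append(child)
    adj[child].append(parent)

print([receptive_field(adj, 0, k) for k in range(5)])  # [1, 3, 7, 15, 31]
```

Each extra layer roughly doubles the set of contributing nodes — O(b^k) growth — while the vector that must summarize them stays the same size.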

Over-squashing means that the influence of a distant node on the final representation decays exponentially with distance. A node 6 hops away has negligible effect on the target node's representation, even though that distant information may be critical for the task.

The Key Idea

Over-squashing is not just a depth problem — it is fundamentally tied to the graph topology. Graphs with narrow passages, such as a tree with a single root connecting two large subtrees, force all information through a small set of bottleneck nodes.

We formalize this connection by analyzing the Jacobian of a node's representation with respect to distant input features. We show that the sensitivity decays exponentially with distance, and that the decay rate depends on the graph topology — specifically, on the width of the narrowest cut separating the two nodes.
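For intuition, consider a linear GNN whose layer is plain mean aggregation (a deliberate simplification for illustration; the paper's analysis covers general message passing). There the Jacobian between two nodes is exactly an entry of the k-th power of the row-normalized adjacency matrix, so the exponential decay can be computed directly:

```python
import numpy as np

def sensitivity(A, k):
    """(D^-1 A)^k: the exact Jacobian of a k-layer linear mean-aggregation GNN."""
    P = A / A.sum(axis=1, keepdims=True)  # row-normalize: mean over neighbors
    return np.linalg.matrix_power(P, k)

# Path graph 0-1-2-3-4-5-6: every message from node 0 to node 6
# must traverse all five interior nodes.
n = 7
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

J = sensitivity(A, k=6)
print(J[6, 0])  # 1 * (1/2)^5 = 0.03125: the only length-6 walk from 6 to 0
```

The sensitivity at distance d is a sum over walks whose weights shrink with every hop, which is precisely the exponential decay described above.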

Bottleneck nodes — those with high betweenness centrality — sit on many shortest paths and must carry information for a disproportionate number of node pairs. When the graph has such bottleneck structure, no amount of depth or width can fully compensate.

The practical implication: we can add edges to the graph based on its topology to relieve over-squashing, without requiring any additional learnable parameters. By creating alternative paths for information flow, we directly improve the Jacobian bound.

Interactive Demo: Over-Squashing in Action

This simulation shows how a signal propagates through a graph via message passing. A source node (red) emits a signal of strength 1.0. At each layer, each node averages the signals from its neighbors. Watch how the signal dilutes as it passes through the bottleneck.

[Interactive widget: "Message-Passing Simulation" panel showing the signal strength at the target node, the current layer (stepped with a "Next Layer" button), and a per-node signal legend: source node, strong signal, weak signal, bottleneck node.]
The bottleneck node (dashed red ring) must carry all information between the left and right subtrees. Notice how the signal at the target node (rightmost) is far stronger with shortcut edges that bypass the bottleneck.
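The demo's update rule can be sketched in a few lines (a toy reimplementation, not the actual demo code; here each node averages over itself and its neighbors, which avoids parity oscillations but does not change the picture):

```python
def propagate(adj, signal, layers):
    """Each layer, every node takes the mean of its own and its neighbors' signals."""
    for _ in range(layers):
        signal = [(signal[v] + sum(signal[u] for u in adj[v])) / (1 + len(adj[v]))
                  for v in range(len(adj))]
    return signal

# Two 3-leaf stars joined through a single bottleneck node (node 3),
# mirroring the demo: source = node 0 with signal 1.0, target = node 6.
adj = [[3], [3], [3], [0, 1, 2, 4, 5, 6], [3], [3], [3]]
signal = [0.0] * 7
signal[0] = 1.0
out = propagate(adj, signal, layers=6)
print(f"signal at target after 6 layers: {out[6]:.3f}")
```

All traffic between the two sides is forced through node 3, so the unit signal emitted at the source arrives at the target heavily diluted.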

How It Works: Jacobian Analysis and Graph Topology

We analyze over-squashing through the Jacobian norm ‖∂h_v^(k) / ∂x_u‖, which measures how sensitive node v's representation at layer k is to the input features of node u.

We prove that this sensitivity decays exponentially with the distance between u and v, and that the rate of decay depends on the graph topology. Specifically, when the shortest path between u and v passes through a bottleneck — a node with high betweenness centrality — the gradient is exponentially suppressed.

Why adding edges helps

By adding edges that create alternative paths for information flow, we directly improve the Jacobian bound: more paths mean less squashing. Critically, these are graph-topology modifications — no new parameters are learned. The GNN architecture stays the same; only the input graph changes.

Key insight: Over-squashing is a property of the graph topology, not the GNN architecture. Deeper or wider models cannot overcome a topological bottleneck. The remedy is to modify the graph itself.
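A minimal sketch of that remedy (toy code under the same mean-aggregation assumption as the demo, not the paper's implementation): add a single shortcut edge to a path graph and compare the signal reaching the target.

```python
def propagate(adj, signal, layers):
    # Each node averages the signals of itself and its neighbors.
    for _ in range(layers):
        signal = [(signal[v] + sum(signal[u] for u in adj[v])) / (1 + len(adj[v]))
                  for v in range(len(adj))]
    return signal

def target_signal(adj, source=0, target=6, layers=6):
    signal = [0.0] * len(adj)
    signal[source] = 1.0
    return propagate(adj, signal, layers)[target]

# Path graph 0-1-2-3-4-5-6: every hop dilutes the signal.
path = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]
# Same graph plus one shortcut edge 0-6: a change to the input graph only,
# with no new learnable parameters.
rewired = [row[:] for row in path]
rewired[0].append(6)
rewired[6].append(0)

print(target_signal(path), target_signal(rewired))
```

The single added edge raises the target's signal by more than an order of magnitude, matching the Jacobian argument: more parallel paths, less squashing.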

Results

We evaluate on tasks that explicitly require long-range information propagation. Adding edges based on graph topology (using spectral-gap analysis and multi-scale methods) consistently improves GNN performance on these tasks. On the Tree-NeighborsMatch problem, GNNs with graph rewiring solve instances that are impossible for standard message-passing architectures, demonstrating that the bottleneck is topological rather than architectural.

@inproceedings{alon2021bottleneck,
  title     = {On the Bottleneck of Graph Neural Networks and its Practical Implications},
  author    = {Uri Alon and Eran Yahav},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2021},
  url       = {https://arxiv.org/abs/2006.05205}
}