Bottlenecks, over-squashing, and attention in Graph Neural Networks.
- Uri Alon (CMU)
Abstract
Since the proposal of the graph neural network (GNN) by Gori et al. (2005) and Scarselli et al. (2008), one of the major problems in training GNNs has been their struggle to propagate information between distant nodes in the graph. We propose a new explanation for this problem: GNNs are susceptible to a bottleneck when aggregating messages across a long path. This bottleneck causes the over-squashing of exponentially growing information into fixed-size vectors. As a result, GNNs fail to propagate messages originating from distant nodes and perform poorly when the prediction task depends on long-range interactions.
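To make the counting argument concrete (a rough sketch; the branching factor $d$, number of layers $K$, and hidden dimension $h$ are illustrative notation, not taken verbatim from the papers): after $K$ message-passing layers, a node's representation must summarize its entire $K$-hop neighborhood,

    $|\mathcal{N}_K(v)| = O(d^K)$,

yet that summary remains a single fixed-size vector $h_v \in \mathbb{R}^{h}$. When the task depends on a path of length $K$, an exponentially growing amount of information is therefore squashed into a representation whose size does not grow with $K$.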
Initially, we hypothesized that, thanks to their attention mechanism, Graph Attention Networks (GAT) are more resistant to over-squashing than GNNs that weight all incoming edges equally, such as GCN and GIN. However, we found that GAT computes only a very limited kind of attention: the ranking of the attention scores is independent of the query node. We formally define this restricted kind of attention as static attention and distinguish it from a strictly more expressive dynamic attention. Because GATs use a static attention mechanism, there are simple graph problems that GAT cannot express.
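A sketch of why GAT's attention is static (using the standard GAT scoring function, where $a$ is the learned attention vector and $W$ the shared weight matrix):

    $e(h_i, h_j) = \mathrm{LeakyReLU}\left( a^{\top} [W h_i \,\|\, W h_j] \right)$.

Splitting $a$ into $[a_1 \,\|\, a_2]$, the score decomposes as

    $e(h_i, h_j) = \mathrm{LeakyReLU}\left( a_1^{\top} W h_i + a_2^{\top} W h_j \right)$.

Since LeakyReLU is monotonically increasing, $\arg\max_j e(h_i, h_j) = \arg\max_j a_2^{\top} W h_j$ for every query node $i$: the ranking of the keys is the same no matter which node is attending.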
To remove this limitation, we introduce a simple fix by modifying the order of operations and propose GATv2 (sketched after the paper list below), which is now available as part of the PyTorch Geometric library, the Deep Graph Library, and the TensorFlow GNN library. The talk will focus on the following papers:
On the Bottleneck of Graph Neural Networks and its Practical Implications (ICLR'2021): arxiv.org/pdf/2006.05205
How Attentive are Graph Attention Networks? (ICLR'2022): arxiv.org/pdf/2105.14491
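GATv2's fix, in the same notation, moves the nonlinearity before the projection by $a$, so the score no longer decomposes into a query-only part plus a key-only part:

    $e(h_i, h_j) = a^{\top} \mathrm{LeakyReLU}\left( W [h_i \,\|\, h_j] \right)$.

A minimal usage sketch with PyTorch Geometric (GATConv and GATv2Conv are the library's layer classes; the toy graph, feature sizes, and head count below are made up for illustration):

import torch
from torch_geometric.nn import GATConv, GATv2Conv

# Toy graph: 4 nodes with 8-dimensional features and 4 directed edges.
x = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 0, 3, 2]])

# GAT (static attention):    score = LeakyReLU(a^T [W h_i || W h_j])
gat = GATConv(in_channels=8, out_channels=16, heads=2)

# GATv2 (dynamic attention): score = a^T LeakyReLU(W [h_i || h_j])
gatv2 = GATv2Conv(in_channels=8, out_channels=16, heads=2)

out_gat = gat(x, edge_index)      # shape [4, 2 * 16] (heads concatenated)
out_gatv2 = gatv2(x, edge_index)  # shape [4, 2 * 16]
print(out_gat.shape, out_gatv2.shape)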