Augment every node with its distance to a set of target nodes. This simple addition is provably more expressive than 1-WL, generalizes inductively, and closes the gap between graph neural networks and the Weisfeiler-Lehman hierarchy.
You're building a drug discovery pipeline. Your molecules are graphs — atoms are nodes, bonds are edges. You want to predict which atoms are part of a specific reactive site. The catch: many atoms are structurally identical within the molecule. Carbon atoms bonded to two hydrogens appear in dozens of positions. Standard GNNs assign them the same embedding. Your model has no way to tell them apart.
This isn't just a molecule problem. In social networks, you want to know if two specific users are likely to connect. In knowledge graphs, you need to know how a query node relates to the entities it might link to. In road networks, you need each node to know how far it is from key hubs. In all these cases, the question isn't "what does this node's neighborhood look like?" but "where is this node relative to something that matters?"
The distance encoding paper asks a precise question: what is the minimal structural information you can add to a GNN to make it provably more expressive? The answer is elegant: distance to a set of target nodes. Give every node a feature vector encoding how far it is from each target node in the set S. This one addition breaks all 1-WL symmetries that prevent graph-level and pair-level structural distinction.
Distance is not arbitrary. Shortest-path distance between nodes is a canonical, interpretable, globally consistent structural signal. It answers "where am I relative to what matters?" — the question standard GNNs cannot ask.
Two graphs that look different globally but have identical local structures at every node. A standard GNN assigns the same embeddings — it cannot tell the graphs apart. Click nodes to see their local neighborhoods.
The core idea is simple enough to state in one sentence: for each node v in the graph, compute its shortest-path distance to every node in a designated set S of "target nodes," and append that distance vector to v's feature vector before running the GNN.
Here, d(v, si) is the shortest-path distance from node v to target node si. The augmented feature xvDE is then fed into any standard GNN — GCN, GAT, GIN, whatever you prefer. No architecture change is needed. Distance encoding is a preprocessing step that makes any existing GNN more powerful.
Distance must be encoded, not used raw. A distance of 5 is meaningfully different from a distance of 6 in a sparse graph; but in a dense graph, both might be "far." Two encodings are proposed:
Click a target node (orange star). Each node's color shows its distance to the target. Hover over nodes to see their distance feature vector.
DE-GNN is the complete pipeline: distance encoding preprocessing followed by a standard GNN. For link prediction between nodes u and v, the target set is S = {u, v}. For node classification, S might be a fixed set of "anchor" nodes. For graph classification, S is the entire node set (or a sampled subset).
For predicting whether nodes u and v will link, DE-GNN extracts a subgraph around the pair and augments each node with its distances to u and v:
Now every node "knows" how close it is to both endpoints. The GNN aggregates this information, and the final embeddings of u and v are combined to score the link. The key insight: even if u and v are structurally symmetric (same local neighborhoods), their distance vectors to themselves are always distinct — d(u, u) = 0 ≠ d(v, u) in general.
Select two nodes (source = orange, target = teal). The color intensity of each node shows its distance feature. Adjust GNN layers to see how information propagates. The link score bar updates in real time.
The paper proves that DE-GNN is strictly more powerful than 1-WL GNNs. The proof proceeds by showing two things: (1) if 1-WL assigns identical colors to two graphs, DE-GNN can still distinguish them; (2) DE-GNN can detect certain graph properties — like cycle lengths — that 1-WL cannot.
Let G and G' be two graphs that are 1-WL indistinguishable (i.e., any standard GNN gives them the same representation). Then there exist graphs G'' such that DE-GNN distinguishes G from G'' even though 1-WL cannot.
More formally, the paper shows that DE-GNN can compute any function that depends on the distance distribution between a target set S and the rest of the graph. This includes:
None of these are computable by standard GNNs. The proof is constructive: for each property, a DE-GNN is explicitly constructed that computes it correctly.
The "strictly less than" (⊊) is key — DE-GNN is not just different, it is provably more expressive in the formal sense.
Distance encoding is parameterized by the choice of target set S. Different tasks call for different strategies:
| Task | Target Set S | Rationale |
|---|---|---|
| Link prediction (u,v) | {u, v} | Encode structural position relative to both endpoints |
| Node classification | All labeled nodes | Each node knows its distance to the training signal sources |
| Graph classification | All nodes (or sample) | Global structural fingerprint |
| Community detection | Community seeds | Soft community membership via distance to seeds |
How many targets are needed? Empirically, |S| = 10 to 50 anchor nodes suffices for graphs with thousands of nodes. The key is diversity of anchors — a single cluster of nearby anchors gives redundant information. Well-spread anchors maximize the structural information encoded.
A graph with multiple possible target sets. Select a strategy and see how each node's distance feature vector (shown as colored bars per node) changes. More diverse targets → more discriminative embeddings.
DE-GNN was evaluated on link prediction and node classification benchmarks. The gains are consistent and sometimes dramatic — especially on tasks where global structure matters.
| Task | Dataset | Metric | GIN | DE-GNN | Gain |
|---|---|---|---|---|---|
| Link prediction | Cora | AUC | 88.4% | 96.3% | +7.9% |
| Link prediction | Citeseer | AUC | 89.7% | 97.1% | +7.4% |
| Link prediction | Power (structural) | AUC | 63.1% | 91.2% | +28.1% |
| Node classif. | Cora | Accuracy | 81.5% | 83.9% | +2.4% |
| Subgraph counting | PATTERN | Accuracy | 85.2% | 92.4% | +7.2% |
The biggest gains appear on structural link prediction datasets (Power grid, USAir, etc.) where graph topology — not node features — determines link probability. These are exactly the cases where distance encoding provides new information unavailable to feature-only GNNs.
On node classification with rich node features (Cora, Citeseer), gains are smaller because the node features already partially disambiguate nodes. This is the expected pattern: distance encoding helps most when structural position is the primary signal, less when node features are already highly discriminative.
Distance encoding is often compared to positional encoding in Transformers — the technique of adding position information to token embeddings. The analogy is real but the implementations differ in important ways.
| Property | Transformer Positional Enc. | Distance Encoding (DE) |
|---|---|---|
| Position defined by | Index in sequence (1D, fixed) | Graph distance to target set (graph topology) |
| Inductive | Yes (within max length) | Yes (works on any graph) |
| Expressive | Breaks sequence symmetry | Breaks WL symmetry (provably) |
| Precomputation | O(1) or O(n) | O(N·|S|) BFS |
| Relative or absolute | Both exist (absolute PE, relative PE) | Relative to target set |
| Learned? | Often yes | Fixed (distances) + learned (GNN) |
More recent graph Transformers (like Graphormer, GPS-Transformer) have rediscovered this insight independently: they use shortest-path distances between all pairs of nodes as attention biases. This is essentially DE-GNN applied globally, with the full NxN distance matrix rather than distances to a small target set. The result is more powerful but O(N²) in memory — confirming that the target set abstraction is important for scalability.
Another comparison: Laplacian eigenvectors as positional encoding. Eigenvectors of the graph Laplacian provide smooth, global positional signals. They're coordinate-free (no need to choose target nodes) but not interpretable as distances, and they can be unstable (sign ambiguity). Distance encoding is discrete, interpretable, and task-specific — you choose targets based on the task at hand.
Distance encoding sits at the crossroads of expressiveness theory, structural graph learning, and positional representations. Understanding its connections places it precisely in the research landscape.
| Method | Position Signal | Expressive vs 1-WL | Task |
|---|---|---|---|
| GCN/GIN | None | ≤ 1-WL | General |
| SEAL | DRNL (pair distances) | > 1-WL | Link pred. |
| DE-GNN | Distance to target set | > 1-WL (proven) | General |
| ID-GNN | Binary ego flag | > 1-WL | General |
| Graphormer | Full pairwise distance | > 1-WL | General |
| k-WL GNN | k-tuples | > 1-WL | General |
Li, Wang, Wang, Leskovec. "Distance Encoding: Design Provably More Powerful GNNs." NeurIPS 2020. arXiv:2009.00142
"Distance is the one graph metric that encodes global position locally." — the paper's core intuition