Distance Encoding

Chapter 0: The Problem

You're building a drug discovery pipeline. Your molecules are graphs — atoms are nodes, bonds are edges. You want to predict which atoms are part of a specific reactive site. The catch: many atoms are structurally identical within the molecule. Carbon atoms bonded to two hydrogens appear in dozens of positions. Standard GNNs assign them the same embedding. Your model has no way to tell them apart.

This isn't just a molecule problem. In social networks, you want to know if two specific users are likely to connect. In knowledge graphs, you need to know how a query node relates to the entities it might link to. In road networks, you need each node to know how far it is from key hubs. In all these cases, the question isn't "what does this node's neighborhood look like?" but "where is this node relative to something that matters?"

The root cause: Standard GNNs compute node embeddings by aggregating neighborhood information. Two nodes with identical k-hop neighborhoods get identical embeddings — always. This is the 1-Weisfeiler-Lehman (1-WL) ceiling. Position in the global graph structure is invisible to these models. They can describe the local texture of a neighborhood but cannot answer "where am I?"

The distance encoding paper asks a precise question: what is the minimal structural information you can add to a GNN to make it provably more expressive? The answer is elegant: distance to a set of target nodes. Give every node a feature vector encoding how far it is from each target node in the set S. This one addition breaks all 1-WL symmetries that prevent graph-level and pair-level structural distinction.

Distance is not arbitrary. Shortest-path distance between nodes is a canonical, interpretable, globally consistent structural signal. It answers "where am I relative to what matters?" — the question standard GNNs cannot ask.

The Structural Blind Spot

Two graphs that look different globally but have identical local structures at every node. A standard GNN assigns the same embeddings — it cannot tell the graphs apart. Click nodes to see their local neighborhoods.

What question can distance encoding answer that standard GNNs cannot?

"Where is this node relative to specific target nodes in the graph?" — a global structural position that aggregation-based GNNs cannot compute "What is the degree of each node's neighbors?" — local structural information "What is the most common feature among a node's neighbors?" — aggregation statistics

Chapter 1: Distance as Feature

The core idea is simple enough to state in one sentence: for each node v in the graph, compute its shortest-path distance to every node in a designated set S of "target nodes," and append that distance vector to v's feature vector before running the GNN.

x_v^DE = [x_v ‖ d(v, s₁) ‖ d(v, s₂) ‖ ... ‖ d(v, s_|S|)]

Here, d(v, s_i) is the shortest-path distance from node v to target node s_i. The augmented feature x_v^DE is then fed into any standard GNN — GCN, GAT, GIN, whatever you prefer. No architecture change is needed. Distance encoding is a preprocessing step that makes any existing GNN more powerful.

Why shortest-path distance? Distance is the fundamental metric on graphs — the analogue of Euclidean distance in space. It satisfies the triangle inequality, is symmetric, and uniquely characterizes how "close" two nodes are in the graph topology. It also encodes cycle information: if d(u, v) = k, then u and v participate in a shared cycle of length ≥ 2k. Distance is not just a feature — it is structural position itself.

Distance must be encoded, not used raw. A distance of 5 is meaningfully different from a distance of 6 in a sparse graph; but in a dense graph, both might be "far." Two encodings are proposed:

SPD (Shortest Path Distance): Direct use of integer distance values. Simple, interpretable, but doesn't weight different distances differently.
MSD (Mean of Short random walk Distributions): The expected number of steps to reach the target in a random walk. This captures not just the shortest path but all paths, weighting shorter ones more heavily. More informative but more expensive to compute.

Distance Vectors

Click a target node (orange star). Each node's color shows its distance to the target. Hover over nodes to see their distance feature vector.

What does the distance encoding vector for a node v contain?

The shortest-path distance from v to each node in the target set S, appended to v's existing feature vector A random walk embedding of fixed dimensionality The adjacency row of v padded to a fixed size

Chapter 2: DE-GNN — The Showcase

DE-GNN is the complete pipeline: distance encoding preprocessing followed by a standard GNN. For link prediction between nodes u and v, the target set is S = {u, v}. For node classification, S might be a fixed set of "anchor" nodes. For graph classification, S is the entire node set (or a sampled subset).

Link Prediction: The Key Use Case

For predicting whether nodes u and v will link, DE-GNN extracts a subgraph around the pair and augments each node with its distances to u and v:

x_w^DE = [x_w ‖ d(w, u) ‖ d(w, v)] ∀ w in subgraph

Now every node "knows" how close it is to both endpoints. The GNN aggregates this information, and the final embeddings of u and v are combined to score the link. The key insight: even if u and v are structurally symmetric (same local neighborhoods), their distance vectors to themselves are always distinct — d(u, u) = 0 ≠ d(v, u) in general.

Connection to SEAL: DE-GNN with S = {u, v} and the MSD encoding is provably equivalent to SEAL's Double-Radius Node Labeling (DRNL) in terms of expressiveness. SEAL labels each node with its (d(w,u), d(w,v)) pair; DE-GNN uses these as continuous features. DE-GNN subsumes SEAL's labeling as a special case, and adds more via the MSD encoding.

DE-GNN for Link Prediction — Interactive

Select two nodes (source = orange, target = teal). The color intensity of each node shows its distance feature. Adjust GNN layers to see how information propagates. The link score bar updates in real time.

GNN layers 2

For link prediction between nodes u and v, what is the target set S in DE-GNN?

S = {u, v} — the two endpoint nodes; every other node gets features encoding its distance to both endpoints S = all nodes in the graph; distances form a full distance matrix S = the neighbors of u; giving u's neighborhood context to all nodes

Chapter 3: Expressiveness Proof

The paper proves that DE-GNN is strictly more powerful than 1-WL GNNs. The proof proceeds by showing two things: (1) if 1-WL assigns identical colors to two graphs, DE-GNN can still distinguish them; (2) DE-GNN can detect certain graph properties — like cycle lengths — that 1-WL cannot.

The Key Theorem

Let G and G' be two graphs that are 1-WL indistinguishable (i.e., any standard GNN gives them the same representation). Then there exist graphs G'' such that DE-GNN distinguishes G from G'' even though 1-WL cannot.

Intuition for the proof: 1-WL fails because it cannot detect whether a node participates in a cycle. Consider a 6-cycle vs. two triangles sharing a node. To every node, the k-hop neighborhood looks the same. But if you pick any two nodes s, t as targets, their pairwise distances are different in the two graphs. The distance encoding breaks the symmetry that 1-WL cannot.

More formally, the paper shows that DE-GNN can compute any function that depends on the distance distribution between a target set S and the rest of the graph. This includes:

Whether two target nodes are in the same connected component
The length of the shortest cycle containing both targets
The number of paths of length k between any two targets
The diameter of the graph (with appropriate target selection)

None of these are computable by standard GNNs. The proof is constructive: for each property, a DE-GNN is explicitly constructed that computes it correctly.

Expressiveness: 1-WL-GNN ⊊ DE-GNN

The "strictly less than" (⊊) is key — DE-GNN is not just different, it is provably more expressive in the formal sense.

What kind of structural property can DE-GNN compute that 1-WL GNNs cannot?

The degree of a node's neighbors The sum of all node features in the graph The length of the shortest cycle containing two specific target nodes, or the number of paths of length k between them

Chapter 4: Choosing the Target Set S

Distance encoding is parameterized by the choice of target set S. Different tasks call for different strategies:

Task	Target Set S	Rationale
Link prediction (u,v)	{u, v}	Encode structural position relative to both endpoints
Node classification	All labeled nodes	Each node knows its distance to the training signal sources
Graph classification	All nodes (or sample)	Global structural fingerprint
Community detection	Community seeds	Soft community membership via distance to seeds

The anchor node strategy: For large graphs, using all nodes as targets is O(N²) in memory — infeasible. The solution: sample a fixed set of k "anchor" nodes and compute distances only to those k nodes. Each node gets a k-dimensional distance feature vector. If anchors are chosen to span the graph (e.g., high-betweenness centrality nodes or random samples), they provide good coverage of graph structure. This is the same idea as landmark-based shortest path algorithms in computational geometry.

How many targets are needed? Empirically, |S| = 10 to 50 anchor nodes suffices for graphs with thousands of nodes. The key is diversity of anchors — a single cluster of nearby anchors gives redundant information. Well-spread anchors maximize the structural information encoded.

Target Set Exploration

A graph with multiple possible target sets. Select a strategy and see how each node's distance feature vector (shown as colored bars per node) changes. More diverse targets → more discriminative embeddings.

Targets (k) 2

For large graphs, why can't you use all nodes as target set S?

Computing and storing all-pairs shortest-path distances requires O(N²) space — infeasible for graphs with millions of nodes. A small set of diverse anchor nodes gives nearly as much structural information at O(N·k) cost. GNNs cannot process feature vectors longer than 128 dimensions Distance information degrades when more than 10 targets are used

Chapter 5: Results

DE-GNN was evaluated on link prediction and node classification benchmarks. The gains are consistent and sometimes dramatic — especially on tasks where global structure matters.

Task	Dataset	Metric	GIN	DE-GNN	Gain
Link prediction	Cora	AUC	88.4%	96.3%	+7.9%
Link prediction	Citeseer	AUC	89.7%	97.1%	+7.4%
Link prediction	Power (structural)	AUC	63.1%	91.2%	+28.1%
Node classif.	Cora	Accuracy	81.5%	83.9%	+2.4%
Subgraph counting	PATTERN	Accuracy	85.2%	92.4%	+7.2%

The biggest gains appear on structural link prediction datasets (Power grid, USAir, etc.) where graph topology — not node features — determines link probability. These are exactly the cases where distance encoding provides new information unavailable to feature-only GNNs.

Comparison with SEAL: DE-GNN matches or exceeds SEAL (the prior state-of-the-art for link prediction) on most benchmarks, while being more interpretable and applicable to tasks beyond link prediction. SEAL requires extracting a subgraph for every candidate pair at test time — O(|E|) subgraphs. DE-GNN precomputes distances once and applies to all tasks. The precomputation pays off at scale.

On node classification with rich node features (Cora, Citeseer), gains are smaller because the node features already partially disambiguate nodes. This is the expected pattern: distance encoding helps most when structural position is the primary signal, less when node features are already highly discriminative.

Why does DE-GNN gain the most on structural link prediction datasets like Power grid?

These graphs have no node features — structure is the only signal. Standard GNNs, limited to local aggregation, cannot capture the global structural position that determines link probability. Distance encoding directly provides this. Power grid graphs have very high average node degree, which helps distance encoding DE-GNN uses a larger model than GIN on structural datasets

Chapter 6: vs Positional Encoding

Distance encoding is often compared to positional encoding in Transformers — the technique of adding position information to token embeddings. The analogy is real but the implementations differ in important ways.

Property	Transformer Positional Enc.	Distance Encoding (DE)
Position defined by	Index in sequence (1D, fixed)	Graph distance to target set (graph topology)
Inductive	Yes (within max length)	Yes (works on any graph)
Expressive	Breaks sequence symmetry	Breaks WL symmetry (provably)
Precomputation	O(1) or O(n)	O(N·\|S\|) BFS
Relative or absolute	Both exist (absolute PE, relative PE)	Relative to target set
Learned?	Often yes	Fixed (distances) + learned (GNN)

The deeper analogy: In a Transformer, positional encoding tells each token "you are in position 7 of the sequence." In DE-GNN, distance encoding tells each node "you are 3 hops from target u and 5 hops from target v." Both convert a permutation-invariant architecture (attention / message passing) into one that is aware of structural position. The difference is that graph position is defined by topology, not by a linear ordering — and that's exactly what shortest-path distance captures.

More recent graph Transformers (like Graphormer, GPS-Transformer) have rediscovered this insight independently: they use shortest-path distances between all pairs of nodes as attention biases. This is essentially DE-GNN applied globally, with the full NxN distance matrix rather than distances to a small target set. The result is more powerful but O(N²) in memory — confirming that the target set abstraction is important for scalability.

Another comparison: Laplacian eigenvectors as positional encoding. Eigenvectors of the graph Laplacian provide smooth, global positional signals. They're coordinate-free (no need to choose target nodes) but not interpretable as distances, and they can be unstable (sign ambiguity). Distance encoding is discrete, interpretable, and task-specific — you choose targets based on the task at hand.

How is distance encoding in DE-GNN analogous to positional encoding in Transformers?

Both convert a permutation-invariant architecture into one that is aware of structural position — Transformers use sequence index, DE-GNN uses graph distance to target nodes Both use sinusoidal functions to encode position Both require the full distance matrix to be computed and stored

Chapter 7: Connections

Distance encoding sits at the crossroads of expressiveness theory, structural graph learning, and positional representations. Understanding its connections places it precisely in the research landscape.

Method	Position Signal	Expressive vs 1-WL	Task
GCN/GIN	None	≤ 1-WL	General
SEAL	DRNL (pair distances)	> 1-WL	Link pred.
DE-GNN	Distance to target set	> 1-WL (proven)	General
ID-GNN	Binary ego flag	> 1-WL	General
Graphormer	Full pairwise distance	> 1-WL	General
k-WL GNN	k-tuples	> 1-WL	General

Subsumption relationships: DE-GNN with S = all nodes subsumes SEAL's DRNL labeling. ID-GNN's binary flag is equivalent to DE-GNN with |S| = 1 and a threshold encoding (distance 0 or not). Graphormer's spatial encoding is DE-GNN with S = all nodes and direct distance biases in attention. The key contribution of the distance encoding paper is proving the expressiveness separation formally and providing a principled framework for choosing S based on the task.

Limitations: BFS distance computation is O(N+E) per target node — O(N·|S|) total. For dense graphs (E = O(N²)) or large target sets, this is expensive. The MSD encoding (random walk distances) is even more costly. For very large graphs (social networks with 10M+ nodes), approximate distances (landmark algorithms, Bourgain embedding) are necessary. Also, DE-GNN assumes distances are informative — in some synthetic tasks, the graph distance is not the relevant structural property, and other encodings might work better.

Key Paper

Li, Wang, Wang, Leskovec. "Distance Encoding: Design Provably More Powerful GNNs." NeurIPS 2020. arXiv:2009.00142

"Distance is the one graph metric that encodes global position locally." — the paper's core intuition

Distance Encoding: More Powerful GNNs