GNNs can't tell two nodes apart if they look the same. ID-GNN fixes this by giving every node a unique identity signal — no extra parameters, just a provably more expressive message-passing scheme.
Imagine a graph shaped like a perfect 6-cycle: nodes A–B–C–D–E–F connected in a ring. Ask a standard GNN to compute an embedding for node A. It collects messages from B and F (its two neighbors), averages them, passes them up. Now do the same for node D. D collects messages from C and E, averages them, passes them up.
The result? A and D get identical embeddings. Of course they do — they have the same number of neighbors, the same local structure, the same everything. The GNN has no way to know it is operating on A versus D. As far as it is concerned, they are the same node.
Why does this matter? Many graph tasks depend on node identity — who you are, not just what your neighborhood looks like. Consider:
The naive fix is to give every node a random unique ID as an extra feature. That works — but it breaks inductive generalization. If you train on graph G and test on graph G', the IDs from G mean nothing on G'. You need identity signals that are meaningful without being arbitrary.
A 6-cycle. Click any node to highlight it and see its 1-hop neighborhood. Notice that all nodes have identical neighborhoods — the GNN cannot distinguish them.
Graph theorists have studied the problem of distinguishing nodes for a century. The classic solution is graph coloring: assign a label (color) to each node such that different nodes get different labels if they play structurally different roles. The Weisfeiler-Lehman algorithm does exactly this — iteratively refining color assignments based on neighborhood colors until stable.
The 1-WL algorithm works like this: start with all nodes having the same color (say, "gray"). Each round, every node collects the colors of its neighbors, hashes them together with its own color, and gets a new color. Repeat until no colors change. Two nodes that end up with the same color are structurally equivalent from 1-WL's perspective.
What breaks 1-WL? The 6-cycle is a perfect example. After any number of rounds, all 6 nodes remain the same color because every node has exactly two neighbors with the same color as each other. 1-WL cannot distinguish nodes that are in globally symmetric positions.
The fix ID-GNN proposes: heterogeneous coloring. Instead of starting with the same color for all nodes, give the "target node" (the node we're computing an embedding for) a special color — call it red. All other nodes stay gray. Now rerun the propagation. The red color propagates outward, creating a gradient of distinctiveness that tells each node exactly how far it is from the target.
Click any node to make it the "target." Watch how the coloring propagates outward, making every node's position unique relative to the target.
ID-GNN turns heterogeneous coloring into a concrete, differentiable GNN algorithm. The key idea: to compute node v's embedding, run a standard GNN on the entire graph — but mark v with an extra binary feature (identity flag = 1; all others = 0). This forces the GNN to produce a distinct output for v even if v is structurally symmetric with other nodes.
Formally, for target node v, augment each node u's feature vector:
Then run the standard GNN on these augmented features. The resulting embedding hv(L) is the identity-aware embedding of v. The binary flag is the minimal identity signal — just one extra bit per node, but it breaks all symmetries in the graph.
Select a target node. Adjust GNN depth to see how the identity signal propagates. The embedding bars show how each node's representation changes — the target's embedding is always unique.
Running the full graph GNN N times is expensive. But here's the key observation: when computing the embedding of node v using an L-layer GNN, only nodes within L hops of v matter. Everything further away is outside v's receptive field and contributes nothing to hv(L).
This motivates ego-network extraction: for target node v, extract the subgraph induced by all nodes within L hops of v. This is the L-hop ego-network of v, centered on v. Now run the augmented GNN on this much smaller subgraph.
The savings are dramatic in sparse graphs. If the average degree is d and you use L=3 layers, the ego-network has at most 1 + d + d² + d³ nodes. For a social network with d=10 and L=3, that is at most 1111 nodes — far smaller than a million-node graph.
For link prediction, you need embeddings for pairs (u, v). The ego-network approach means: extract the ego-network centered on u (to compute hu) and the ego-network centered on v (to compute hv), then score the pair. The computation is embarrassingly parallel — all ego-networks can be processed simultaneously in a batch.
A larger graph. The highlighted region is the L-hop ego-network for the selected target node. Adjust L to see how many nodes are included.
The central challenge for any graph identity scheme: it must work inductively. Training on one graph, testing on another. If you assign each node a fixed ID (node 1, node 2, ...), that ID is meaningless on a new graph. Node 47 in the training graph has no relationship to node 47 in the test graph.
ID-GNN's approach is inherently inductive because the identity signal is relative, not absolute. The binary flag "am I the target?" is defined by the computation task, not by a global node ordering. When you run ID-GNN on a new graph, you simply mark whichever node you're computing for — the flag is always binary and always locally meaningful.
This also means ID-GNN learns how to USE identity signals, not what the signals mean globally. The GNN learns: "if the identity flag is 1 nearby, here's how to weight that information." This learned strategy transfers perfectly to new graphs.
Compare to random node features (RNF) — another approach where each node gets a random d-dimensional vector as extra features. RNF also breaks symmetry, but the features are random at test time and random at training time — the model cannot learn stable patterns. ID-GNN's binary flag is deterministic and semantically clear: it encodes distance to the target. RNF is noise; ID-GNN is information.
python # ID-GNN: inductive node embedding for target node v def id_gnn_embed(graph, node_features, target_v, gnn_model): # Step 1: Extract ego-network (L-hop subgraph around target_v) ego_graph, ego_mask = extract_ego(graph, target_v, L=gnn_model.num_layers) # Step 2: Create identity flag — 1 for target, 0 for all others id_flag = (ego_graph.nodes == target_v).float().unsqueeze(-1) # [N_ego, 1] # Step 3: Augment node features with identity flag aug_features = torch.cat([node_features[ego_mask], id_flag], dim=-1) # [N_ego, d+1] # Step 4: Run standard GNN on augmented ego-graph embeddings = gnn_model(ego_graph, aug_features) # [N_ego, h] # Step 5: Return embedding of the target node target_idx = (ego_graph.nodes == target_v).nonzero()[0] return embeddings[target_idx] # [h] — unique to v, inductive
ID-GNN was evaluated on node classification, link prediction, and graph classification tasks. The key claim: adding identity information strictly improves GNN expressiveness, and this shows in practice.
| Task | Dataset | Metric | GIN | ID-GNN | Gain |
|---|---|---|---|---|---|
| Link prediction | ogbl-collab | Hits@50 | 51.4% | 62.6% | +11.2% |
| Link prediction | ogbl-ddi | Hits@20 | 37.1% | 55.9% | +18.8% |
| Graph classification | EXP (WL-hard) | Accuracy | 50.0% | 100.0% | +50.0% |
| Graph classification | CEXP | Accuracy | 50.0% | 100.0% | +50.0% |
| Node classification | ogbn-arxiv | Accuracy | 71.7% | 72.4% | +0.7% |
The most dramatic improvements are on the EXP and CEXP datasets — synthetic graphs specifically designed to be hard for 1-WL tests. Standard GNNs score exactly 50% (random chance) because they cannot distinguish any nodes. ID-GNN scores 100%: the identity signal is exactly what's needed to break the 1-WL limitation on these graphs.
Where does ID-GNN NOT help much? Node classification tasks where nodes are already distinguishable by their features (text, chemistry). If node A has a unique feature vector, the standard GNN can already tell it apart from node D without any identity signal. The gains are largest where graph structure is the primary signal and features are homogeneous.
Graph Isomorphism Network (GIN) is the most expressive 1-WL GNN — it achieves the theoretical maximum expressiveness within the 1-WL class by using injective (sum) aggregation with MLP transformation. GIN is the strongest baseline ID-GNN needs to beat.
The GIN update rule:
The sum aggregation (not mean, not max) is key — it preserves multiset information. Two neighborhoods with different numbers of copies of the same node feature produce different sums. GIN is as expressive as 1-WL can be.
| Property | GIN (1-WL) | ID-GNN |
|---|---|---|
| Node degree | Yes | Yes |
| Triangle membership | No | Yes |
| Cycle length | Partially | Yes |
| Shortest path between pair | No | Yes |
| Graph diameter | No | Yes |
| Inductive to new graphs | Yes | Yes |
| Extra parameters needed | No | No |
ID-GNN matches GIN on everything GIN can do, and adds strictly more. The cost: O(N) ego-network computations instead of O(1). ID-GNN-Fast recovers much of the efficiency by processing all ego-networks in a single batched forward pass, achieving roughly 2-4x runtime overhead over standard GNN — acceptable for the expressiveness gain.
ID-GNN sits at the intersection of several active research threads. Understanding these connections helps place it in the broader landscape of expressive graph learning.
| Method | Identity Signal | Inductive? | Extra Cost |
|---|---|---|---|
| Standard GNN (GCN, GIN) | None | Yes | None |
| Random Node Features | Random d-dim vector | No | +d params |
| Distance Encoding (DE) | Distance to target set | Yes | Medium |
| ID-GNN | Binary ego flag | Yes | O(N) passes |
| SEAL (subgraph) | DRNL node labeling | Yes | O(N) subgraphs |
| k-WL GNNs | k-tuple node features | Yes | O(Nk) |
On the theoretical side, ID-GNN is related to the folklore Weisfeiler-Lehman (FWL) and k-WL hierarchy. While higher-order WL tests achieve greater expressiveness, they require processing k-tuples of nodes — exponential in k. ID-GNN achieves a sweet spot: provably more expressive than 1-WL without exponential cost.
You, Gomes-Selman, Ying, Leskovec. "Identity-aware Graph Neural Networks." AAAI 2021. arXiv:2101.10320
"What a neural network cannot distinguish, it cannot reason about." — paraphrase of the WL theorem's lesson