DistMult is beautiful but symmetric. One elegant fix: move embeddings into complex number space. Conjugation breaks the symmetry — and suddenly the model can handle both "friend-of" and "parent-of" with the same architecture.
DistMult scores a triple (h, r, t) as 〈h, r, t〉 = ∑i hi ri ti. Elegant. But there's a fatal flaw — swap h and t, and the score doesn't change. Mathematically:
This means DistMult predicts (France, capital-of, Paris) and (Paris, capital-of, France) with identical confidence. It cannot favor one direction over the other — ever, for any relation.
Yet the vast majority of real knowledge graph relations are antisymmetric: if (A, r, B) is true, then (B, r, A) is false. Parent-of, born-in, capital-of, employs, is-part-of. The list goes on. DistMult treats all of them as symmetric and thus fails systematically.
DistMult gives the same score to both orderings of every triple. ComplEx breaks this by using conjugation.
Mode: DistMult (symmetric)The ComplEx fix is to let all embeddings live in complex vector space ℂd instead of ℝd. Each embedding vector is now a vector of complex numbers — numbers of the form a + bi, where a is the real part and b is the imaginary part.
In practice, we store complex embeddings as two real-valued vectors: one for the real parts and one for the imaginary parts. No special complex-number library is needed. A d-dimensional complex embedding uses the same memory as a 2d-dimensional real embedding.
The complex conjugate of z = a + bi is z̄ = a − bi. It flips the sign of the imaginary part. Geometrically, conjugation reflects the number across the real axis. This seemingly minor operation is what gives ComplEx the ability to break symmetry.
A complex number lives in 2D space. Its conjugate is its reflection. Drag the angle slider to rotate.
In real-valued DistMult, the score is 〈h, r, t〉 = ∑i hi ri ti. ComplEx extends this to complex space by using the Hermitian inner product — the complex analog of a dot product.
The ComplEx score for a triple (h, r, t) is:
Notice that t is conjugated (t̄) but h and r are not. We then take the real part Re(·) of the complex sum to get a real-valued score. This is crucial — we want a scalar score for ranking, not a complex number.
Set complex 2D embeddings for h, r, t. Watch how Re(〈h, r, t̄〉) changes — and how it differs from 〈h, r, t〉 when we use the conjugate.
Let's prove this carefully. With DistMult: fr(h, t) = 〈h, r, t〉 = 〈t, r, h〉 — always symmetric. With ComplEx, swapping h and t gives:
Compare to the original:
These are not equal in general. One can be shown to be the complex conjugate of the other: Re(〈h, r, t̄〉) = Re(〈t, r, h̄〉) only when all imaginary parts are zero (i.e., real-valued embeddings). Once embeddings have nonzero imaginary parts, fr(h, t) ≠ fr(t, h).
What makes ComplEx truly expressive is that it can model both:
Training ComplEx follows the same recipe as DistMult — negative sampling with a log-likelihood loss — but working over complex embeddings. The key insight is that we never need to do any special complex-number operations at training time: we just store Re and Im parts as separate real-valued tensors.
python import torch import torch.nn as nn class ComplEx(nn.Module): def __init__(self, n_entities, n_relations, dim): super().__init__() # Store real and imaginary parts separately self.e_re = nn.Embedding(n_entities, dim) self.e_im = nn.Embedding(n_entities, dim) self.r_re = nn.Embedding(n_relations, dim) self.r_im = nn.Embedding(n_relations, dim) def score(self, h_idx, r_idx, t_idx): # Fetch embeddings (real and imaginary parts) h_re, h_im = self.e_re(h_idx), self.e_im(h_idx) r_re, r_im = self.r_re(r_idx), self.r_im(r_idx) t_re, t_im = self.e_re(t_idx), self.e_im(t_idx) # Re(⟨h, r, t̄⟩) expanded component-wise # h*r*conj(t) = (h_re+i*h_im)(r_re+i*r_im)(t_re-i*t_im) score = (h_re * r_re * t_re + h_im * r_im * t_re + h_re * r_im * t_im - h_im * r_re * t_im) return score.sum(dim=-1) # sum over embedding dim def forward(self, pos_h, pos_r, pos_t, neg_h, neg_r, neg_t): pos = self.score(pos_h, pos_r, pos_t) neg = self.score(neg_h, neg_r, neg_t) loss = -torch.log(torch.sigmoid(pos)).mean() loss += -torch.log(torch.sigmoid(-neg)).mean() return loss
The authors use N3 regularization (nuclear 3-norm), which penalizes the cube of the magnitude of each embedding:
This keeps embedding magnitudes controlled and prevents the model from memorizing the training set through large-magnitude embeddings.
Trouillon et al. evaluated ComplEx on WN18 and FB15k (same benchmarks as DistMult) plus WN18RR and FB15k-237 — modified versions that remove inverse relations to prevent models from "cheating" by memorizing relation pairs.
| Model | WN18 H@10 | FB15k H@10 | MRR |
|---|---|---|---|
| TransE | 89.2% | 47.1% | — |
| DistMult | 94.2% | 57.7% | 0.830 |
| ComplEx | 94.7% | 84.0% | 0.941 |
MRR (Mean Reciprocal Rank) is the average of 1/rank across all test queries. If the true answer is ranked first, it contributes 1.0. Second place: 0.5. Third: 0.33. ComplEx's MRR of 0.941 on WN18 means the true answer is nearly always ranked first or second.
Now we have three models in hand. Understanding their differences helps you choose the right tool for any knowledge graph problem.
| Property | TransE | DistMult | ComplEx |
|---|---|---|---|
| Score function | −||h+r−t|| | 〈h,r,t〉 | Re(〈h,r,t̄〉) |
| Embedding space | ℝd | ℝd | ℂd |
| Symmetric relations | Fails | Natural | Handles (Im(r)=0) |
| Antisymmetric | Handles | Cannot | Handles (Im(r)≠0) |
| Inversion (r⁻¹) | Handles | Cannot | Handles |
| Composition | Handles | Cannot | Limited |
| Params per relation | d | d | 2d |
The 2x parameter cost (storing Re and Im) is justified when the knowledge graph has substantial antisymmetric structure. If your graph is mostly symmetric (social networks with mutual connections, synonym graphs), DistMult is probably sufficient. If your graph mixes both — like Freebase — ComplEx is the right default starting point.
ComplEx established complex-valued embeddings as a core tool in knowledge graph representation learning. It directly inspired a line of work exploring richer algebraic structures.
| Model | Key Extension | Relationship to ComplEx |
|---|---|---|
| RotatE | Relations as complex rotations (unit modulus) | Restricts r to |r_i|=1, adds composition |
| QuatE | Quaternion embeddings (4D complex) | Extends to Hamilton product over ℍ |
| Analogy | Unifies ComplEx + DistMult + analogy completion | Theoretical unification |
| Tucker/TuckER | Tucker tensor decomposition | More general bilinear form |
| DURA | Dual relation regularization | Better regularization for ComplEx |
From least to most expressive for relation patterns: