Trouillon, Welbl, Riedel, Gaussier, Bouchard — ICML 2016 · arXiv 1606.06357

ComplEx: Breaking Symmetry with Complex Numbers

DistMult is beautiful but symmetric. One elegant fix: move embeddings into complex number space. Conjugation breaks the symmetry — and suddenly the model can handle both "friend-of" and "parent-of" with the same architecture.

Prerequisites: DistMult + complex numbers (a + bi). That's it.
8
Chapters
4+
Simulations
0
Assumed Knowledge

Chapter 0: DistMult's Blind Spot

DistMult scores a triple (h, r, t) as ⟨h, r, t⟩ = ∑i hi ri ti. Elegant. But there's a fatal flaw — swap h and t, and the score doesn't change. Mathematically:

⟨h, r, t⟩ = ⟨t, r, h⟩

This means DistMult predicts (France, capital-of, Paris) and (Paris, capital-of, France) with identical confidence. It cannot favor one direction over the other — ever, for any relation.

Yet the vast majority of real knowledge graph relations are antisymmetric: if (A, r, B) is true, then (B, r, A) is false. Parent-of, born-in, capital-of, employs, is-part-of. The list goes on. DistMult treats all of them as symmetric and thus fails systematically.

The Symmetry Problem

DistMult gives the same score to both orderings of every triple. ComplEx breaks this by using conjugation.

Mode: DistMult (symmetric)
How bad is this? FB15k has 1,345 relation types. A thorough analysis of real KG relations finds roughly 50% are antisymmetric (parent-of, born-in, etc.), 20% are symmetric (spouse-of, similar-to), and 30% are neither (neither direction is systematically true). DistMult can only model 20% of relations correctly in principle. The other 80% get learned with incorrect inductive bias.
Which of these relations CANNOT be correctly modeled by DistMult?

Chapter 1: Complex Embeddings

The ComplEx fix is to let all embeddings live in complex vector space ℂd instead of ℝd. Each embedding vector is now a vector of complex numbers — numbers of the form a + bi, where a is the real part and b is the imaginary part.

e ∈ ℂd, where each component ei = Re(ei) + i · Im(ei)

In practice, we store complex embeddings as two real-valued vectors: one for the real parts and one for the imaginary parts. No special complex-number library is needed. A d-dimensional complex embedding uses the same memory as a 2d-dimensional real embedding.

Why complex numbers? A complex number a + bi encodes both a magnitude and a phase angle (orientation). When you multiply two complex numbers, their magnitudes multiply and their angles add. This phase arithmetic is what allows ComplEx to distinguish (h, r, t) from (t, r, h) — the angles don't commute the same way.

The complex conjugate of z = a + bi is z̄ = a − bi. It flips the sign of the imaginary part. Geometrically, conjugation reflects the number across the real axis. This seemingly minor operation is what gives ComplEx the ability to break symmetry.

Complex Number Geometry

A complex number lives in 2D space. Its conjugate is its reflection. Drag the angle slider to rotate.

Angle 50°
What is the complex conjugate of z = 3 + 4i?

Chapter 2: The Hermitian Product

In real-valued DistMult, the score is ⟨h, r, t⟩ = ∑i hi ri ti. ComplEx extends this to complex space by using the Hermitian inner product — the complex analog of a dot product.

The ComplEx score for a triple (h, r, t) is:

fr(h, t) = Re(⟨h, r, t̄⟩) = Re( ∑i=1d hi ri ti )

Notice that t is conjugated (t̄) but h and r are not. We then take the real part Re(·) of the complex sum to get a real-valued score. This is crucial — we want a scalar score for ranking, not a complex number.

Interactive Hermitian Product

Set complex 2D embeddings for h, r, t. Watch how Re(⟨h, r, t̄⟩) changes — and how it differs from ⟨h, r, t⟩ when we use the conjugate.

h angle (°) 30°
r angle (°) 45°
t angle (°) 60°
Writing it out component-by-component: If hi = a + bi, ri = c + di, ti = e + fi, then t̄i = e − fi. The product hi rii is a complex number. Taking Re(·) of the sum gives a real scalar. In implementation: Re(⟨h, r, t̄⟩) = ⟨Re(h), Re(r), Re(t)⟩ + ⟨Im(h), Im(r), Re(t)⟩ + ⟨Re(h), Im(r), Im(t)⟩ − ⟨Im(h), Re(r), Im(t)⟩.
In the ComplEx scoring function Re(⟨h, r, t̄⟩), why do we conjugate t but not h or r?

Chapter 3: Why Conjugation Breaks Symmetry

Let's prove this carefully. With DistMult: fr(h, t) = ⟨h, r, t⟩ = ⟨t, r, h⟩ — always symmetric. With ComplEx, swapping h and t gives:

fr(t, h) = Re(⟨t, r, h̄⟩) = Re( ∑i ti ri hi )

Compare to the original:

fr(h, t) = Re( ∑i hi ri ti )

These are not equal in general. One can be shown to be the complex conjugate of the other: Re(⟨h, r, t̄⟩) = Re(⟨t, r, h̄⟩) only when all imaginary parts are zero (i.e., real-valued embeddings). Once embeddings have nonzero imaginary parts, fr(h, t) ≠ fr(t, h).

The magic of phase: Think of each dimension of an embedding as a point on the unit circle in the complex plane, with angle θ. The product hi · ri · t̄i has angle θ_h + θ_r − θ_t. When we swap h and t, the angle becomes θ_t + θ_r − θ_h. These are equal only when θ_h = θ_t. The phase difference θ_h − θ_t is what captures directional asymmetry.

Modeling symmetric vs antisymmetric

What makes ComplEx truly expressive is that it can model both:

Symmetric relations
Set all imaginary parts of r to zero: r ∈ ℝd. Then Re(⟨h, r, t̄⟩) = ⟨Re(h), r, Re(t)⟩ + ⟨Im(h), r, Im(t)⟩. Swapping h and t leaves both terms unchanged. Symmetric!
Antisymmetric relations
Use purely imaginary r: Im(r) ≠ 0. The phase rotation makes fr(h, t) = −fr(t, h). If (h, r, t) scores high, (t, r, h) scores equally low. Perfectly antisymmetric!
How does ComplEx model a symmetric relation (where f_r(h,t) = f_r(t,h) always)?

Chapter 4: Training

Training ComplEx follows the same recipe as DistMult — negative sampling with a log-likelihood loss — but working over complex embeddings. The key insight is that we never need to do any special complex-number operations at training time: we just store Re and Im parts as separate real-valued tensors.

python
import torch
import torch.nn as nn

class ComplEx(nn.Module):
    def __init__(self, n_entities, n_relations, dim):
        super().__init__()
        # Store real and imaginary parts separately
        self.e_re = nn.Embedding(n_entities, dim)
        self.e_im = nn.Embedding(n_entities, dim)
        self.r_re = nn.Embedding(n_relations, dim)
        self.r_im = nn.Embedding(n_relations, dim)

    def score(self, h_idx, r_idx, t_idx):
        # Fetch embeddings (real and imaginary parts)
        h_re, h_im = self.e_re(h_idx), self.e_im(h_idx)
        r_re, r_im = self.r_re(r_idx), self.r_im(r_idx)
        t_re, t_im = self.e_re(t_idx), self.e_im(t_idx)

        # Re(⟨h, r, t̄⟩) expanded component-wise
        # h*r*conj(t) = (h_re+i*h_im)(r_re+i*r_im)(t_re-i*t_im)
        score = (h_re * r_re * t_re
               + h_im * r_im * t_re
               + h_re * r_im * t_im
               - h_im * r_re * t_im)
        return score.sum(dim=-1)  # sum over embedding dim

    def forward(self, pos_h, pos_r, pos_t, neg_h, neg_r, neg_t):
        pos = self.score(pos_h, pos_r, pos_t)
        neg = self.score(neg_h, neg_r, neg_t)
        loss = -torch.log(torch.sigmoid(pos)).mean()
        loss += -torch.log(torch.sigmoid(-neg)).mean()
        return loss
Key implementation detail: The expansion Re(h·r·t̄) = h_re·r_re·t_re + h_im·r_im·t_re + h_re·r_im·t_im − h_im·r_re·t_im is exactly 4 Hadamard products and 1 sign flip. No complex arithmetic library needed — it's just 4 element-wise multiplications on real tensors. This makes ComplEx nearly as fast as DistMult to train.

Regularization

The authors use N3 regularization (nuclear 3-norm), which penalizes the cube of the magnitude of each embedding:

||e||N3 = ∑i (Re(ei)2 + Im(ei)2)3/2

This keeps embedding magnitudes controlled and prevents the model from memorizing the training set through large-magnitude embeddings.

How many real-valued Hadamard (element-wise) products does computing one ComplEx score require?

Chapter 5: Results

Trouillon et al. evaluated ComplEx on WN18 and FB15k (same benchmarks as DistMult) plus WN18RR and FB15k-237 — modified versions that remove inverse relations to prevent models from "cheating" by memorizing relation pairs.

ModelWN18 H@10FB15k H@10MRR
TransE89.2%47.1%
DistMult94.2%57.7%0.830
ComplEx94.7%84.0%0.941
The FB15k jump is striking: DistMult gets 57.7% on FB15k. ComplEx gets 84.0% — a 26-point absolute improvement. The difference is almost entirely from FB15k's many antisymmetric relations (national-team-of, employer-of, country-capital, etc.) that DistMult systematically fails on. WN18 doesn't improve as much because WN18's relations are already more symmetric (hypernym/hyponym are technically antisymmetric but WN18 includes both directions explicitly).

MRR (Mean Reciprocal Rank) is the average of 1/rank across all test queries. If the true answer is ranked first, it contributes 1.0. Second place: 0.5. Third: 0.33. ComplEx's MRR of 0.941 on WN18 means the true answer is nearly always ranked first or second.

Why does ComplEx improve so much more on FB15k (26 points) than on WN18 (only 0.5 points) compared to DistMult?

Chapter 6: ComplEx vs DistMult vs TransE

Now we have three models in hand. Understanding their differences helps you choose the right tool for any knowledge graph problem.

PropertyTransEDistMultComplEx
Score function−||h+r−t||⟨h,r,t⟩Re(⟨h,r,t̄⟩)
Embedding spaceddd
Symmetric relationsFailsNaturalHandles (Im(r)=0)
AntisymmetricHandlesCannotHandles (Im(r)≠0)
Inversion (r⁻¹)HandlesCannotHandles
CompositionHandlesCannotLimited
Params per relationdd2d
ComplEx strictly generalizes DistMult: Set all imaginary parts of entity and relation embeddings to zero. ComplEx's score becomes Re(⟨h, r, t̄⟩) = ⟨Re(h), Re(r), Re(t)⟩ — exactly DistMult. ComplEx can represent any distribution that DistMult can, plus distributions that require asymmetry. It's a strict superset.

When does the extra complexity pay off?

The 2x parameter cost (storing Re and Im) is justified when the knowledge graph has substantial antisymmetric structure. If your graph is mostly symmetric (social networks with mutual connections, synonym graphs), DistMult is probably sufficient. If your graph mixes both — like Freebase — ComplEx is the right default starting point.

ComplEx is said to "strictly generalize" DistMult. What does this mean?

Chapter 7: Connections

ComplEx established complex-valued embeddings as a core tool in knowledge graph representation learning. It directly inspired a line of work exploring richer algebraic structures.

ModelKey ExtensionRelationship to ComplEx
RotatERelations as complex rotations (unit modulus)Restricts r to |r_i|=1, adds composition
QuatEQuaternion embeddings (4D complex)Extends to Hamilton product over ℍ
AnalogyUnifies ComplEx + DistMult + analogy completionTheoretical unification
Tucker/TuckERTucker tensor decompositionMore general bilinear form
DURADual relation regularizationBetter regularization for ComplEx
Tensor decomposition perspective: DistMult is a CP decomposition of the KG tensor X[h,r,t]. ComplEx is a CP decomposition of X in complex space — equivalent to a real-valued decomposition with doubled dimensions but with specific structured constraints. The theory of tensor decompositions predicts exactly when each factorization can represent a given relation pattern.

The expressive hierarchy

From least to most expressive for relation patterns:

TransE
Only antisymmetric, 1-to-1 relations
↓ more expressive
DistMult
Only symmetric relations
↓ strictly more expressive
ComplEx
Symmetric + antisymmetric + inversion
↓ strictly more expressive
RotatE
+ composition patterns
Closing thought: ComplEx's key insight — that going to complex space is free in terms of time complexity but gains you the ability to handle asymmetry — is an instance of a broader principle: choosing the right algebraic structure for your data's geometry. Knowledge graphs have asymmetric structure (directionality); complex numbers provide asymmetric algebra (conjugation). Match the math to the geometry of the problem.