Trouillon, Welbl, Riedel, Gaussier, Bouchard — ICML 2016 · arXiv 1606.06357

ComplEx: Breaking Symmetry with Complex Numbers

DistMult is beautiful but symmetric. One elegant fix: move embeddings into complex number space. Conjugation breaks the symmetry — and suddenly the model can handle both "friend-of" and "parent-of" with the same architecture.

Prerequisites: DistMult + complex numbers (a + bi). That's it.

Chapters

Simulations

Assumed Knowledge

Chapter 0: DistMult's Blind Spot

DistMult scores a triple (h, r, t) as ⟨h, r, t⟩ = ∑_i h_i r_i t_i. Elegant. But there's a fatal flaw — swap h and t, and the score doesn't change. Mathematically:

⟨h, r, t⟩ = ⟨t, r, h⟩

This means DistMult predicts (France, capital-of, Paris) and (Paris, capital-of, France) with identical confidence. It cannot favor one direction over the other — ever, for any relation.

Yet the vast majority of real knowledge graph relations are antisymmetric: if (A, r, B) is true, then (B, r, A) is false. Parent-of, born-in, capital-of, employs, is-part-of. The list goes on. DistMult treats all of them as symmetric and thus fails systematically.

The Symmetry Problem

DistMult gives the same score to both orderings of every triple. ComplEx breaks this by using conjugation.

Mode: DistMult (symmetric)

How bad is this? FB15k has 1,345 relation types. A thorough analysis of real KG relations finds roughly 50% are antisymmetric (parent-of, born-in, etc.), 20% are symmetric (spouse-of, similar-to), and 30% are neither (neither direction is systematically true). DistMult can only model 20% of relations correctly in principle. The other 80% get learned with incorrect inductive bias.

Which of these relations CANNOT be correctly modeled by DistMult?

synonym-of (A is-synonym-of B ↔ B is-synonym-of A) married-to (in datasets where it's bidirectional) parent-of (A parent-of B does NOT mean B parent-of A)

Chapter 1: Complex Embeddings

The ComplEx fix is to let all embeddings live in complex vector space ℂ^d instead of ℝ^d. Each embedding vector is now a vector of complex numbers — numbers of the form a + bi, where a is the real part and b is the imaginary part.

e ∈ ℂ^d, where each component e_i = Re(e_i) + i · Im(e_i)

In practice, we store complex embeddings as two real-valued vectors: one for the real parts and one for the imaginary parts. No special complex-number library is needed. A d-dimensional complex embedding uses the same memory as a 2d-dimensional real embedding.

Why complex numbers? A complex number a + bi encodes both a magnitude and a phase angle (orientation). When you multiply two complex numbers, their magnitudes multiply and their angles add. This phase arithmetic is what allows ComplEx to distinguish (h, r, t) from (t, r, h) — the angles don't commute the same way.

The complex conjugate of z = a + bi is z̄ = a − bi. It flips the sign of the imaginary part. Geometrically, conjugation reflects the number across the real axis. This seemingly minor operation is what gives ComplEx the ability to break symmetry.

Complex Number Geometry

A complex number lives in 2D space. Its conjugate is its reflection. Drag the angle slider to rotate.

Angle 50°

What is the complex conjugate of z = 3 + 4i?

3 − 4 (just remove i) 3 − 4i (negate the imaginary part) −3 + 4i (negate the real part)

Chapter 2: The Hermitian Product

In real-valued DistMult, the score is ⟨h, r, t⟩ = ∑_i h_i r_i t_i. ComplEx extends this to complex space by using the Hermitian inner product — the complex analog of a dot product.

The ComplEx score for a triple (h, r, t) is:

f_r(h, t) = Re(⟨h, r, t̄⟩) = Re( ∑_i=1^d h_i r_i t_i )

Notice that t is conjugated (t̄) but h and r are not. We then take the real part Re(·) of the complex sum to get a real-valued score. This is crucial — we want a scalar score for ranking, not a complex number.

Interactive Hermitian Product

Set complex 2D embeddings for h, r, t. Watch how Re(⟨h, r, t̄⟩) changes — and how it differs from ⟨h, r, t⟩ when we use the conjugate.

h angle (°) 30°

r angle (°) 45°

t angle (°) 60°

Writing it out component-by-component: If h_i = a + bi, r_i = c + di, t_i = e + fi, then t̄_i = e − fi. The product h_i r_i t̄_i is a complex number. Taking Re(·) of the sum gives a real scalar. In implementation: Re(⟨h, r, t̄⟩) = ⟨Re(h), Re(r), Re(t)⟩ + ⟨Im(h), Im(r), Re(t)⟩ + ⟨Re(h), Im(r), Im(t)⟩ − ⟨Im(h), Re(r), Im(t)⟩.

In the ComplEx scoring function Re(⟨h, r, t̄⟩), why do we conjugate t but not h or r?

Conjugating t (but not h) creates an asymmetry between the head and tail positions — this is what allows f_r(h, t) ≠ f_r(t, h) We conjugate t to make the computation real-valued The conjugate of t is always equal to t in practice

Chapter 3: Why Conjugation Breaks Symmetry

Let's prove this carefully. With DistMult: f_r(h, t) = ⟨h, r, t⟩ = ⟨t, r, h⟩ — always symmetric. With ComplEx, swapping h and t gives:

f_r(t, h) = Re(⟨t, r, h̄⟩) = Re( ∑_i t_i r_i h_i )

Compare to the original:

f_r(h, t) = Re( ∑_i h_i r_i t_i )

These are not equal in general. One can be shown to be the complex conjugate of the other: Re(⟨h, r, t̄⟩) = Re(⟨t, r, h̄⟩) only when all imaginary parts are zero (i.e., real-valued embeddings). Once embeddings have nonzero imaginary parts, f_r(h, t) ≠ f_r(t, h).

The magic of phase: Think of each dimension of an embedding as a point on the unit circle in the complex plane, with angle θ. The product h_i · r_i · t̄_i has angle θ_h + θ_r − θ_t. When we swap h and t, the angle becomes θ_t + θ_r − θ_h. These are equal only when θ_h = θ_t. The phase difference θ_h − θ_t is what captures directional asymmetry.

Modeling symmetric vs antisymmetric

What makes ComplEx truly expressive is that it can model both:

Symmetric relations
Set all imaginary parts of r to zero: r ∈ ℝ^d. Then Re(⟨h, r, t̄⟩) = ⟨Re(h), r, Re(t)⟩ + ⟨Im(h), r, Im(t)⟩. Swapping h and t leaves both terms unchanged. Symmetric!

Antisymmetric relations
Use purely imaginary r: Im(r) ≠ 0. The phase rotation makes f_r(h, t) = −f_r(t, h). If (h, r, t) scores high, (t, r, h) scores equally low. Perfectly antisymmetric!

How does ComplEx model a symmetric relation (where f_r(h,t) = f_r(t,h) always)?

By learning relation vectors with zero imaginary parts — when Im(r) = 0, the ComplEx score reduces to a symmetric real-valued dot product By using separate embedding tables for symmetric relations By using a different scoring function for each relation type

Chapter 4: Training

Training ComplEx follows the same recipe as DistMult — negative sampling with a log-likelihood loss — but working over complex embeddings. The key insight is that we never need to do any special complex-number operations at training time: we just store Re and Im parts as separate real-valued tensors.

python
import torch
import torch.nn as nn

class ComplEx(nn.Module):
    def __init__(self, n_entities, n_relations, dim):
        super().__init__()
        # Store real and imaginary parts separately
        self.e_re = nn.Embedding(n_entities, dim)
        self.e_im = nn.Embedding(n_entities, dim)
        self.r_re = nn.Embedding(n_relations, dim)
        self.r_im = nn.Embedding(n_relations, dim)

    def score(self, h_idx, r_idx, t_idx):
        # Fetch embeddings (real and imaginary parts)
        h_re, h_im = self.e_re(h_idx), self.e_im(h_idx)
        r_re, r_im = self.r_re(r_idx), self.r_im(r_idx)
        t_re, t_im = self.e_re(t_idx), self.e_im(t_idx)

        # Re(⟨h, r, t̄⟩) expanded component-wise
        # h*r*conj(t) = (h_re+i*h_im)(r_re+i*r_im)(t_re-i*t_im)
        score = (h_re * r_re * t_re
               + h_im * r_im * t_re
               + h_re * r_im * t_im
               - h_im * r_re * t_im)
        return score.sum(dim=-1)  # sum over embedding dim

    def forward(self, pos_h, pos_r, pos_t, neg_h, neg_r, neg_t):
        pos = self.score(pos_h, pos_r, pos_t)
        neg = self.score(neg_h, neg_r, neg_t)
        loss = -torch.log(torch.sigmoid(pos)).mean()
        loss += -torch.log(torch.sigmoid(-neg)).mean()
        return loss

Key implementation detail: The expansion Re(h·r·t̄) = h_re·r_re·t_re + h_im·r_im·t_re + h_re·r_im·t_im − h_im·r_re·t_im is exactly 4 Hadamard products and 1 sign flip. No complex arithmetic library needed — it's just 4 element-wise multiplications on real tensors. This makes ComplEx nearly as fast as DistMult to train.

Regularization

The authors use N3 regularization (nuclear 3-norm), which penalizes the cube of the magnitude of each embedding:

||e||_N3 = ∑_i (Re(e_i)² + Im(e_i)²)^3/2

This keeps embedding magnitudes controlled and prevents the model from memorizing the training set through large-magnitude embeddings.

How many real-valued Hadamard (element-wise) products does computing one ComplEx score require?

1 (same as DistMult) 4 (h_re·r_re·t_re, h_im·r_im·t_re, h_re·r_im·t_im, h_im·r_re·t_im) 8 (one for each combination of real/imaginary parts)

Chapter 5: Results

Trouillon et al. evaluated ComplEx on WN18 and FB15k (same benchmarks as DistMult) plus WN18RR and FB15k-237 — modified versions that remove inverse relations to prevent models from "cheating" by memorizing relation pairs.

Model	WN18 H@10	FB15k H@10	MRR
TransE	89.2%	47.1%	—
DistMult	94.2%	57.7%	0.830
ComplEx	94.7%	84.0%	0.941

The FB15k jump is striking: DistMult gets 57.7% on FB15k. ComplEx gets 84.0% — a 26-point absolute improvement. The difference is almost entirely from FB15k's many antisymmetric relations (national-team-of, employer-of, country-capital, etc.) that DistMult systematically fails on. WN18 doesn't improve as much because WN18's relations are already more symmetric (hypernym/hyponym are technically antisymmetric but WN18 includes both directions explicitly).

MRR (Mean Reciprocal Rank) is the average of 1/rank across all test queries. If the true answer is ranked first, it contributes 1.0. Second place: 0.5. Third: 0.33. ComplEx's MRR of 0.941 on WN18 means the true answer is nearly always ranked first or second.

Why does ComplEx improve so much more on FB15k (26 points) than on WN18 (only 0.5 points) compared to DistMult?

FB15k has many more antisymmetric relations that DistMult cannot model; WN18's relations are more symmetric, so DistMult already handles them well FB15k has more entities, giving ComplEx more training data ComplEx uses a larger embedding dimension on FB15k

Chapter 6: ComplEx vs DistMult vs TransE

Now we have three models in hand. Understanding their differences helps you choose the right tool for any knowledge graph problem.

Property	TransE	DistMult	ComplEx
Score function	−\|\|h+r−t\|\|	⟨h,r,t⟩	Re(⟨h,r,t̄⟩)
Embedding space	ℝ^d	ℝ^d	ℂ^d
Symmetric relations	Fails	Natural	Handles (Im(r)=0)
Antisymmetric	Handles	Cannot	Handles (Im(r)≠0)
Inversion (r⁻¹)	Handles	Cannot	Handles
Composition	Handles	Cannot	Limited
Params per relation	d	d	2d

ComplEx strictly generalizes DistMult: Set all imaginary parts of entity and relation embeddings to zero. ComplEx's score becomes Re(⟨h, r, t̄⟩) = ⟨Re(h), Re(r), Re(t)⟩ — exactly DistMult. ComplEx can represent any distribution that DistMult can, plus distributions that require asymmetry. It's a strict superset.

When does the extra complexity pay off?

The 2x parameter cost (storing Re and Im) is justified when the knowledge graph has substantial antisymmetric structure. If your graph is mostly symmetric (social networks with mutual connections, synonym graphs), DistMult is probably sufficient. If your graph mixes both — like Freebase — ComplEx is the right default starting point.

ComplEx is said to "strictly generalize" DistMult. What does this mean?

DistMult is a special case of ComplEx where all imaginary parts are zero — every distribution DistMult can model, ComplEx can too, plus more ComplEx always achieves higher Hits@10 than DistMult on every dataset ComplEx uses twice as many embedding dimensions as DistMult

Chapter 7: Connections

ComplEx established complex-valued embeddings as a core tool in knowledge graph representation learning. It directly inspired a line of work exploring richer algebraic structures.

Model	Key Extension	Relationship to ComplEx
RotatE	Relations as complex rotations (unit modulus)	Restricts r to \|r_i\|=1, adds composition
QuatE	Quaternion embeddings (4D complex)	Extends to Hamilton product over ℍ
Analogy	Unifies ComplEx + DistMult + analogy completion	Theoretical unification
Tucker/TuckER	Tucker tensor decomposition	More general bilinear form
DURA	Dual relation regularization	Better regularization for ComplEx

Tensor decomposition perspective: DistMult is a CP decomposition of the KG tensor X[h,r,t]. ComplEx is a CP decomposition of X in complex space — equivalent to a real-valued decomposition with doubled dimensions but with specific structured constraints. The theory of tensor decompositions predicts exactly when each factorization can represent a given relation pattern.

The expressive hierarchy

From least to most expressive for relation patterns:

TransE

Only antisymmetric, 1-to-1 relations

↓ more expressive

DistMult

Only symmetric relations

↓ strictly more expressive

ComplEx

Symmetric + antisymmetric + inversion

↓ strictly more expressive

RotatE

+ composition patterns

Closing thought: ComplEx's key insight — that going to complex space is free in terms of time complexity but gains you the ability to handle asymmetry — is an instance of a broader principle: choosing the right algebraic structure for your data's geometry. Knowledge graphs have asymmetric structure (directionality); complex numbers provide asymmetric algebra (conjugation). Match the math to the geometry of the problem.