Sun, Deng, Nie, Tang (Tsinghua) — ICLR 2019

RotatE: Relations as Rotations

TransE treats relations as translations, but a translation cannot represent a symmetric relation without collapsing to the zero vector. RotatE fixes this by working in complex space and treating each relation as an element-wise rotation on the unit circle. One small geometric insight covers all four fundamental relation patterns at once: symmetry, antisymmetry, inversion, and composition.

Prerequisites: TransE + complex numbers

Chapter 0: Where TransE Fails

TransE embeds knowledge graph facts as translations: h + r ≈ t. Introduced in 2013, it remained a standard baseline for years. Checking it against three relation patterns shows exactly where it breaks:

Symmetric: "is married to": if A is married to B, then B is married to A. This requires h + r = t and t + r = h, so r + r = 0 and r = 0. A zero relation vector forces h = t, so every pair linked by a symmetric relation collapses to the same point.
Inverse: "hypernym" and "hyponym" are inverses: "dog is-a animal" and "animal has-hyponym dog." This requires r₁ = −r₂, which translations can satisfy.
Composition: "born in city" + "city in country" = "born in country." This requires r₁ + r₂ = r₃, which translations can also satisfy.

The failure on symmetry has a precise algebraic root: under vector addition, the only element that is its own inverse is the zero vector. A symmetric relation needs an operation that is not the identity yet undoes itself when applied twice. Rotations in complex space provide one, while keeping the inverse and composition behaviour that translations already have.
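The symmetric case can be checked numerically. The sketch below is illustrative only: the helper names and toy vectors are assumptions, not taken from the paper. It relies on the identity (h + r − t) + (t + r − h) = 2r, so by the triangle inequality the two TransE errors can never sum to less than 2‖r‖.

```python
import math
import random


def norm(v):
    """Euclidean norm of a real-valued list."""
    return math.sqrt(sum(x * x for x in v))


def transe_distance(h, r, t):
    """TransE error ||h + r - t|| for real-valued lists of equal length."""
    return norm([h_i + r_i - t_i for h_i, r_i, t_i in zip(h, r, t)])


random.seed(0)
dim = 4
r = [0.5, -0.3, 0.2, 0.1]  # a non-zero relation vector (toy values)

# For many random entity pairs, measure how far the combined error of
# (h, r, t) and (t, r, h) sits above the lower bound 2 * ||r||.
smallest_gap = float("inf")
for _ in range(1000):
    h = [random.uniform(-1, 1) for _ in range(dim)]
    t = [random.uniform(-1, 1) for _ in range(dim)]
    both_errors = transe_distance(h, r, t) + transe_distance(t, r, h)
    smallest_gap = min(smallest_gap, both_errors - 2 * norm(r))

# The gap never goes negative: both errors vanish only when r = 0.
print(smallest_gap >= -1e-9)
```

However the entities are placed, a non-zero r leaves a combined error of at least 2‖r‖ across the two directions of a symmetric fact.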

The insight: Rotations form a group under composition. The rotation by angle α followed by rotation by β equals rotation by α+β. This means you can naturally represent symmetric (rotate by π), inverse (rotate by −θ), and composition (rotate by θ₁+θ₂) relations — all with the same mathematical object: a unit complex number e^(iθ).

What is the mathematical reason translations fail to represent symmetric relations in TransE?

A symmetric relation requires h+r=t AND t+r=h simultaneously, forcing r=0, a zero vector that encodes no information
Translations cannot be applied twice in the same direction, making bidirectionality impossible
The L1 distance function used in TransE is not symmetric, so symmetric relations get wrong gradients

Chapter 1: Complex Space Intuition

A complex number z = a + bi lives in a 2D plane: a is the "real" coordinate, b is the "imaginary" coordinate. You can think of it as a 2D vector (a, b). But complex numbers have one property vectors don't: multiplication rotates and scales simultaneously.

Multiplying z₁ = ρ₁e^(iθ₁) by z₂ = ρ₂e^(iθ₂) gives ρ₁ρ₂e^(i(θ₁+θ₂)), where ρ₁ and ρ₂ are the magnitudes (written ρ here to avoid confusion with relation vectors r). The magnitudes multiply; the angles add. If we restrict to unit complex numbers (|z|=1), so z = e^(iθ), then multiplication becomes pure rotation: angles add, magnitudes stay 1.

e^(iθ) · e^(iφ) = e^(i(θ+φ))

|e^(iθ)| = 1 for all θ ∈ ℝ
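Both identities are easy to confirm with Python's standard `cmath` module. This is a minimal check with arbitrarily chosen angles and magnitudes; the variable names are illustrative.

```python
import cmath
import math

theta, phi = 0.3 * math.pi, 0.5 * math.pi

# Two unit complex numbers e^(i*theta) and e^(i*phi).
z1 = cmath.exp(1j * theta)
z2 = cmath.exp(1j * phi)
product = z1 * z2

# Angles add (0.8*pi stays inside the principal range of cmath.phase).
angles_add = math.isclose(cmath.phase(product), theta + phi)
# The product of two unit numbers is still on the unit circle.
stays_unit = math.isclose(abs(product), 1.0)

# Without the unit constraint, magnitudes multiply: 2 * 3 = 6.
scaled_product = (2 * z1) * (3 * z2)
magnitudes_multiply = math.isclose(abs(scaled_product), 6.0)

print(angles_add, stays_unit, magnitudes_multiply)
```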

RotatE embeds each entity as a complex vector h ∈ ℂ^d (d complex numbers = 2d real numbers). Each relation is also a complex vector r ∈ ℂ^d, but with the constraint that |r_i| = 1 for all components i. This means each relation component is e^(iθ_i) — a rotation by angle θ_i in the i-th complex plane.

d independent rotation planes: With d-dimensional complex embeddings, you get d independent unit circles, each with its own rotation angle θ_i. The relation rotates each complex plane independently. This gives d degrees of freedom to the relation — much more expressive than a single scalar angle, while still constraining each component to the unit circle.
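One way to picture this parametrization is to store a relation as d real phase angles and build the unit complex numbers from them, so the constraint |r_i| = 1 holds by construction. The sketch below uses toy sizes and random values, and storing phases rather than complex numbers is an assumption made for illustration.

```python
import cmath
import math
import random

random.seed(1)
d = 3  # number of complex dimensions (toy size)

# Entity: d unconstrained complex numbers, i.e. 2d real parameters.
h = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(d)]

# Relation: d real phases; each component e^(i*theta_i) has modulus 1.
phases = [random.uniform(-math.pi, math.pi) for _ in range(d)]
r = [cmath.exp(1j * theta) for theta in phases]

# Element-wise product: each plane is rotated by its own angle.
rotated = [h_i * r_i for h_i, r_i in zip(h, r)]

relation_moduli = [abs(r_i) for r_i in r]
modulus_change = [abs(rot_i) - abs(h_i) for rot_i, h_i in zip(rotated, h)]
print(relation_moduli, modulus_change)
```

The rotation leaves every |h_i| unchanged, so a relation built this way can only move an entity around its d circles, never towards or away from the origin.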

[Interactive figure: Complex Number Multiplication = Rotation. All unit complex numbers sit on the unit circle; multiplying two of them adds their angles. Slider: angle θ in radians.]

What mathematical property makes complex multiplication suitable for representing relations in RotatE?

Multiplication of unit complex numbers adds their angles, so composing two relations corresponds to adding their rotation angles, giving a natural composition operation
Complex numbers have imaginary components that independently encode directed and undirected edges
Complex multiplication is not commutative, which models antisymmetric relations automatically

Chapter 2: Rotation as Relation — Showcase

RotatE's core equation: for a true triple (h, r, t), each component of h rotated by r should land at the corresponding component of t:

h ∘ r = t where |r_i| = 1

Here ∘ is element-wise complex multiplication. Since |r_i|=1, each r_i = e^(iθᵢ). So the equation becomes: for each dimension i, h_i · e^(iθᵢ) = t_i. This is a rotation of h_i by angle θ_i.

The scoring function measures how far h∘r is from t, using the distance in the complex plane:

f_r(h, t) = || h ∘ r − t ||
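A minimal sketch of this distance follows. The text above does not say which norm is used, so the code assumes the sum of per-dimension complex moduli; the function name and toy vectors are also illustrative.

```python
import cmath
import math


def rotate_distance(h, phases, t):
    """Sum over dimensions of |h_i * e^(i*theta_i) - t_i| (assumed norm)."""
    return sum(
        abs(h_i * cmath.exp(1j * theta) - t_i)
        for h_i, theta, t_i in zip(h, phases, t)
    )


h = [1 + 0j, 0 + 2j]
phases = [math.pi / 2, math.pi]  # quarter turn, then half turn
t_true = [0 + 1j, 0 - 2j]        # exactly h rotated by the phases above
t_false = [1 + 0j, 0 + 2j]       # tail equal to h, as if no rotation happened

d_true = rotate_distance(h, phases, t_true)
d_false = rotate_distance(h, phases, t_false)
print(d_true, d_false)
```

A tail that matches the rotated head gives a distance of zero up to floating-point error, while the mismatched tail is penalised in each dimension where the rotation disagrees.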

[Interactive figure: Rotation Showcase on the Unit Circle. The entity h lives on the unit circle (teal point). The relation r rotates it by angle θ (orange arrow). The rotated point h ∘ r should land at the tail entity t (purple). Slider: relation angle θ.]

In RotatE's h ∘ r = t equation, what does the constraint |r_i| = 1 enforce?

Each relation component is a pure rotation (no scaling): the entity embedding magnitudes are preserved, only their angles change
The relation vector lives on the unit hypersphere in real space, matching the entity embedding constraint
It forces all entities to have the same norm as the relation, creating a metric space

Chapter 3: Relation Patterns

Here is where RotatE's geometric insight pays off. Every fundamental relation pattern corresponds to a simple constraint on the rotation angle θ:

Symmetry (θ = π): If r_i = e^(iπ) = -1, then h ∘ r = -h and t ∘ r = -t. For a symmetric relation, we need h ∘ r = t AND t ∘ r = h. With θ = π: h · (−1) = t means t = −h, and t · (−1) = h means h = −t. Both hold simultaneously! Rotation by π maps every point to its antipodal — and rotating that point by π again returns to the original.

Symmetric: θ = π → r = e^(iπ) = −1 → h ∘ r = t ∧ t ∘ r = h ✓

Antisymmetry (θ ≠ 0, π): For most relations, rotating h by θ lands at t but rotating t by θ does NOT land at h (it lands somewhere else). This is automatic for any θ ∉ {0, π}.

Inverse (θ₂ = −θ₁): If relation r₁ has angle θ and its inverse r₂ has angle −θ, then h ∘ r₁ = t implies t ∘ r₂ = t · e^(−iθ) = h · e^(iθ) · e^(−iθ) = h. The inverse relation rotates back. Perfect.

Composition (θ₃ = θ₁ + θ₂): Applying rotation θ₁ then θ₂ is equivalent to rotation θ₁+θ₂. This is exactly the angle-addition property of complex multiplication. Composition relations naturally decompose into angle sums.
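The four angle conditions above can be verified on a single complex dimension. This is a toy check, with an arbitrary entity value and arbitrary angles; the helper names are illustrative.

```python
import cmath
import math


def rotate(z, theta):
    """Rotate the complex number z by angle theta."""
    return z * cmath.exp(1j * theta)


def close(a, b):
    """Equality of complex numbers up to floating-point error."""
    return abs(a - b) < 1e-12


h = 0.6 + 0.8j  # toy one-dimensional entity

# Symmetry: rotating by pi twice returns to the start.
t = rotate(h, math.pi)
symmetric_ok = close(rotate(t, math.pi), h)

# Antisymmetry: for theta outside {0, pi}, rotating t again does not give h.
theta = math.pi / 3
t = rotate(h, theta)
antisymmetric_ok = not close(rotate(t, theta), h)

# Inverse: rotating by theta and then by -theta gives h back.
inverse_ok = close(rotate(rotate(h, theta), -theta), h)

# Composition: theta1 followed by theta2 equals a single rotation by the sum.
theta1, theta2 = 0.4, 1.1
composition_ok = close(rotate(rotate(h, theta1), theta2),
                       rotate(h, theta1 + theta2))

print(symmetric_ok, antisymmetric_ok, inverse_ok, composition_ok)
```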

[Interactive figure: All Four Patterns on the Unit Circle]

What rotation angle θ allows RotatE to represent symmetric relations?

θ = π: because rotation by π maps every point to its antipodal, and rotating the antipodal by π again returns to the original point
θ = 0: a zero rotation means the entity equals its own symmetric counterpart
θ = π/2: a quarter rotation is self-inverse when applied twice

Chapter 4: Self-Adversarial Negative Sampling

Like TransE, RotatE needs negative samples — corrupted triples to distinguish true facts from false ones. The loss function is:

$$L = -\log \sigma\big(\gamma - f_r(h, t)\big) - \sum_{i=1}^{n} \frac{1}{n} \log \sigma\big(f_r(h_i', t_i') - \gamma\big)$$

The first term pushes the distance f_r of the true triple low (well below the margin γ). The second term pushes the distance of each of the n negative triples high (well above γ). γ is a fixed margin (the paper uses γ=9.0 for FB15k-237).
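The loss can be sketched in a few lines of plain Python. Here f_r is taken to be the distance from Chapter 2, the negatives are weighted uniformly by 1/n as in the formula above, and the function names and toy distances are assumptions made for illustration.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def rotate_loss(pos_distance, neg_distances, gamma):
    """Margin-based loss for one true triple and its n negatives."""
    pos_term = -log_sigmoid(gamma - pos_distance)
    neg_term = -sum(log_sigmoid(d - gamma) for d in neg_distances) / len(neg_distances)
    return pos_term + neg_term


gamma = 9.0

# A model that separates well: true triple close, negatives far away.
good_loss = rotate_loss(1.0, [20.0, 25.0], gamma)
# A model that has it backwards: true triple far, negatives close.
bad_loss = rotate_loss(20.0, [1.0, 2.0], gamma)

print(good_loss, bad_loss)
```

The loss is near zero only when the true triple sits well inside the margin and every negative sits well outside it.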

Standard negative sampling picks corrupted triples uniformly at random. But random negatives are often too easy — obvious nonsense like (Einstein, capital-of, Germany). The model learns quickly that these are wrong and stops getting useful gradient signal.

Self-adversarial sampling samples negative triples non-uniformly, weighting by the current model's own score:

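$$p(h_j', r, t_j') = \frac{\exp\big(-\alpha\, f_r(h_j', t_j')\big)}{\sum_i \exp\big(-\alpha\, f_r(h_i', t_i')\big)}$$

Because f_r is the distance from Chapter 2, the minus sign gives more weight to negatives with a small distance, that is, the ones the model currently rates as plausible. These are the "hard negatives": triples the model almost believes are true, so they provide the most informative gradient signal. α is a temperature that controls the sharpness of the weighting; α = 0 recovers the uniform case. In the paper these probabilities are used as weights on the negative terms of the loss, replacing the uniform 1/n, rather than for drawing samples.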
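A small sketch of the weighting follows. It assumes f_r is the distance from Chapter 2, so plausibility is taken as the negated distance; that sign convention, the function name, and the toy distances are all assumptions made for illustration.

```python
import math


def self_adversarial_weights(neg_distances, alpha):
    """Softmax over -alpha * distance: closer (harder) negatives weigh more."""
    logits = [-alpha * d for d in neg_distances]
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy distances for three negatives: hard, medium, easy.
neg_distances = [0.5, 3.0, 8.0]

sharp = self_adversarial_weights(neg_distances, alpha=1.0)
uniform = self_adversarial_weights(neg_distances, alpha=0.0)

print(sharp)
print(uniform)
```

With α = 0 every negative gets the same weight, and raising α shifts the mass towards the negative with the smallest distance.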

Why "self-adversarial"? The model generates its own hard negatives — it's adversarial to itself. This is similar in spirit to curriculum learning or GAN training (where the discriminator learns from the generator's current outputs). The key difference from standard curriculum: no external teacher needed. The current model IS the curriculum.

[Interactive figure: Hard vs. Easy Negatives. The sampling distribution shifts as α increases; higher α concentrates the weight on the hardest negatives. Slider: temperature α.]

What makes a negative triple "hard" in self-adversarial sampling?

The current model assigns it a high plausibility score: the model almost believes it's true, so correcting this belief provides the most informative gradient
It involves high-degree entities that appear in many true triples
It is a triple where head and tail are the same entity type according to the ontology

Chapter 5: Results

RotatE is evaluated on four standard benchmarks: FB15k and WN18, plus FB15k-237 and WN18RR, the harder variants that remove inverse relations to prevent test leakage. The table below reports the two harder variants.

Model	FB15k-237 MRR	FB15k-237 H@10	WN18RR MRR	WN18RR H@10
TransE	0.279	44.1%	0.226	50.1%
DistMult	0.241	41.9%	0.430	49.0%
ComplEx	0.247	42.8%	0.440	51.0%
ConvE	0.325	50.1%	0.430	52.0%
RotatE	0.338	53.3%	0.476	57.1%

RotatE has the best number in all four columns. The improvement is largest on WN18RR: +0.036 MRR (0.476 vs. 0.440) and +6.1 points H@10 (57.1% vs. 51.0%) over ComplEx. WN18RR contains many composition relations (hypernym chains), exactly the pattern where RotatE's angle-composition property shines.

Why FB15k-237 and WN18RR instead of FB15k and WN18? The original FB15k and WN18 benchmarks contain test triples whose answers can be derived by simply inverting training triples. A model that memorizes "if (h, r, t) exists then (t, r⁻¹, h) is the answer" achieves artificially high scores without learning anything meaningful. The "237" and "RR" variants remove these inversions to force genuine generalization.
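The kind of leakage described here can be illustrated with a toy check that flags test triples whose entity pair already appears reversed in training. The triples and the helper name below are hypothetical, and this is not the procedure that was used to build FB15k-237 or WN18RR.

```python
def inverse_leaks(test_triples, train_triples):
    """Return test triples (h, r, t) for which training holds some (t, r2, h)."""
    reversed_pairs = {(t, h) for h, _, t in train_triples}
    return [(h, r, t) for h, r, t in test_triples if (h, t) in reversed_pairs]


# Hypothetical toy data.
train = [
    ("dog", "hypernym", "animal"),
    ("paris", "capital_of", "france"),
]
test = [
    ("animal", "hyponym", "dog"),   # answerable by flipping a training triple
    ("cat", "hypernym", "animal"),  # needs genuine generalization
]

leaked = inverse_leaks(test, train)
print(leaked)
```

A model can answer the first test triple by memorizing that "hyponym" reverses "hypernym", which is the shortcut the cleaned benchmarks are designed to remove.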

Why does RotatE perform especially well on WN18RR compared to other benchmarks?

WN18RR contains many composition and hierarchy relations (hypernym chains), exactly the patterns RotatE models via angle addition, which is a natural algebraic structure for composition
WN18RR has fewer entities than FB15k-237, making it easier to memorize
WordNet relations are all symmetric, which RotatE handles with θ=π embeddings

Chapter 6: vs TransE / DistMult / ComplEx

RotatE covers every relation pattern that TransE, DistMult, and ComplEx can each represent, including the ones that each of them individually misses. Understanding how illuminates the precise role of complex space in KG embedding.

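RotatE vs TransE: TransE: h + r = t in ℝ^d. RotatE: h ∘ r = t in ℂ^(d/2), which uses the same number of real parameters per entity. Both use one vector per entity and per relation. The difference is the operation: the unit circle under multiplication contains −1, a non-identity element that is its own inverse, whereas under vector addition only the zero vector undoes itself. That extra element is what lets RotatE represent symmetric relations alongside the other patterns.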

RotatE vs ComplEx: ComplEx uses: score = Re(Σ h_i · r_i · t̄_i), where t̄ is the complex conjugate of t. This is a bilinear form in complex space. RotatE constrains |r_i|=1, which ComplEx doesn't. The constraint is what forces relations to be pure rotations — ComplEx can scale (grow/shrink embeddings) while RotatE cannot. The constraint costs some expressiveness but gains the geometric interpretation and handles composition cleanly.
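The contrast between the two scoring rules can be made concrete with a toy example. The function names and values below are illustrative, and the RotatE distance assumes a sum of per-dimension moduli.

```python
import cmath
import math


def complex_score(h, r, t):
    """ComplEx score: Re(sum_i h_i * r_i * conj(t_i)); higher is more plausible."""
    return sum(h_i * r_i * t_i.conjugate() for h_i, r_i, t_i in zip(h, r, t)).real


def rotate_distance(h, r, t):
    """RotatE distance: sum_i |h_i * r_i - t_i|; lower is more plausible."""
    return sum(abs(h_i * r_i - t_i) for h_i, r_i, t_i in zip(h, r, t))


h = [1 + 1j, 0.5 - 2j]
r_unit = [cmath.exp(1j * 0.7), cmath.exp(-1j * 1.9)]  # |r_i| = 1
t = [h_i * r_i for h_i, r_i in zip(h, r_unit)]        # tail = rotated head

# ComplEx places no constraint on |r_i|, so a relation may also scale.
r_scaled = [2 * r_i for r_i in r_unit]

score_unit = complex_score(h, r_unit, t)
score_scaled = complex_score(h, r_scaled, t)
dist_unit = rotate_distance(h, r_unit, t)

print(score_unit, score_scaled, dist_unit)
```

Doubling the relation's modulus doubles the ComplEx score, a degree of freedom that RotatE removes by fixing |r_i| = 1.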

Property	TransE	DistMult	ComplEx	RotatE
Symmetric	No (r=0)	Yes	Yes	Yes (θ=π)
Antisymmetric	Yes	No	Yes	Yes
Inverse	Yes (r₂=−r₁)	No	Yes	Yes (θ₂=−θ₁)
Composition	Yes (r₃=r₁+r₂)	No	No	Yes (θ₃=θ₁+θ₂)
Relation as	Translation	Scaling	Complex scaling	Complex rotation

RotatE is the first model to handle ALL four patterns simultaneously. Composition is the critical gap: ComplEx can handle symmetric/antisymmetric/inverse but cannot naturally compose relations. This is where RotatE's angle-addition property is uniquely powerful.

Which relation pattern can RotatE represent that ComplEx cannot?

Composition: ComplEx's bilinear score has no way to express "applying relation r₁ then r₂ equals relation r₃," while RotatE's angle addition encodes it directly
Symmetry: TransE can handle symmetric relations with r=0, but DistMult and ComplEx cannot
Antisymmetry: only RotatE uses complex numbers, which are inherently asymmetric

Chapter 7: Connections

RotatE belongs to a geometric tradition in KG embedding: representing relational semantics as geometric operations. Rotations were the key insight; subsequent work extended to hyperbolic spaces, higher-dimensional rotations, and combinations with attention.

Method	Geometric operation	Space	New capability
TransE (2013)	Translation	ℝ^d	—
RotatE (2019)	Rotation (|r_i| = 1)	ℂ^(d/2)	All 4 patterns
QuatE (2019)	Quaternion rotation	ℍ^(d/4)	3D rotations
HAKE (2020)	Modulus scaling + phase rotation	ℝ×ℂ	Hierarchies
PairRE (2021)	Paired relation vectors (element-wise scaling of head and tail)	ℝ^d × ℝ^d	Complex relations (1-to-N, N-to-N)

The broader lesson: RotatE demonstrates a powerful methodology — identify the algebraic properties your task requires (symmetry, composition) and then choose or design a mathematical space whose operations naturally encode those properties. The complex unit circle was chosen because multiplication on it is exactly the operation needed. The lesson generalizes: match your mathematical structure to your data's semantics.

Related lessons

TransE — The predecessor
GraphSAGE — GNN-based embeddings
LightGCN — GCN for recommendation

Key papers

Sun et al., ICLR 2019 (RotatE)
Bordes et al., NeurIPS 2013 (TransE)
Trouillon et al., ICML 2016 (ComplEx)
Zhang et al., NeurIPS 2019 (QuatE)

"We model each relation as a rotation from the source entity to the target entity in the complex vector space."
— Sun et al. (2019)
