Sun, Deng, Nie, Tang (Peking University, Université de Montréal, Mila), ICLR 2019

RotatE: Relations as Rotations

TransE treats relations as translations, but translations cannot represent symmetric relations and handle inverse and composition patterns only partially. RotatE fixes this by working in complex space and treating each relation as a rotation on the unit circle. One small geometric insight covers all four fundamental relation patterns at once.

Prerequisites: TransE + complex numbers
8 chapters · 4+ simulations

Chapter 0: Where TransE Fails

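TransE embeds knowledge graph facts as translations: h + r ≈ t. Introduced in 2013, it remained a standard baseline for years. But three categories of relation patterns strain it: symmetric relations (which force r ≈ 0), inverse relations, and composition relations.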

All three failures have the same root: translations in Euclidean space don't form a rich enough group. You need an algebraic structure that supports more operations. The answer: rotations in complex space.
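The symmetric case can be checked numerically. The sketch below is illustrative only: the vectors are random and the variable names are made up for this example. It shows that the residuals of (h, r, t) and (t, r, h) under translation always sum to 2r, so both can vanish only when r = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
h = rng.normal(size=d)  # head entity (random, for illustration)
t = rng.normal(size=d)  # a different tail entity
r = rng.normal(size=d)  # candidate relation vector

forward = h + r - t     # residual of the triple (h, r, t)
backward = t + r - h    # residual of the reversed triple (t, r, h)

# The entity terms cancel, so the two residuals always sum to 2r.
assert np.allclose(forward + backward, 2 * r)

# Make the forward triple exact (r = t - h): the reversed triple is then
# off by 2(t - h), which is nonzero whenever h and t differ.
r_exact = t - h
backward_exact = t + r_exact - h
assert np.allclose(backward_exact, 2 * (t - h))
print(np.linalg.norm(backward_exact))
```

If both residuals are zero, their sum 2r is zero, so a symmetric relation between distinct entities leaves TransE no nonzero translation to use.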

The insight: Rotations form a group under composition. The rotation by angle α followed by rotation by β equals rotation by α+β. This means you can naturally represent symmetric (rotate by π), inverse (rotate by −θ), and composition (rotate by θ₁+θ₂) relations — all with the same mathematical object: a unit complex number e^(iθ).
What is the mathematical reason translations fail to represent symmetric relations in TransE?

Chapter 1: Complex Space Intuition

A complex number z = a + bi lives in a 2D plane: a is the "real" coordinate, b is the "imaginary" coordinate. You can think of it as a 2D vector (a, b). But complex numbers have one property vectors don't: multiplication rotates and scales simultaneously.

Multiplying z₁ = r₁e^(iθ₁) by z₂ = r₂e^(iθ₂) gives r₁r₂e^(i(θ₁+θ₂)). The magnitudes multiply; the angles add. If we restrict to unit complex numbers (|z|=1), so z = e^(iθ), then multiplication becomes pure rotation: angles add, magnitudes stay 1.

e^(iθ) × e^(iφ) = e^(i(θ+φ))
|e^(iθ)| = 1   ∀θ ∈ ℝ
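Both identities are easy to confirm with Python's standard `cmath` module. This is a small sketch with arbitrarily chosen angles, not part of the original lesson.

```python
import cmath
import math

theta, phi = 0.3 * math.pi, 0.5 * math.pi  # arbitrary example angles

z1 = cmath.exp(1j * theta)  # unit complex number at angle theta
z2 = cmath.exp(1j * phi)    # unit complex number at angle phi
product = z1 * z2

# Magnitudes multiply (1 * 1 = 1) and angles add (0.3π + 0.5π = 0.8π).
assert math.isclose(abs(product), 1.0)
assert math.isclose(cmath.phase(product), theta + phi)

# A non-unit factor scales as well as rotates: the magnitude becomes 2.
w = 2.0 * cmath.exp(1j * theta)
assert math.isclose(abs(w * z2), 2.0)
assert math.isclose(cmath.phase(w * z2), theta + phi)
```

Note that `cmath.phase` returns angles in [−π, π], so sums outside that range come back wrapped by a multiple of 2π.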

RotatE embeds each entity as a complex vector h ∈ ℂ^d (d complex numbers = 2d real numbers). Each relation is also a complex vector r ∈ ℂ^d, but with the constraint that |r_i| = 1 for all components i. This means each relation component is e^(iθ_i) — a rotation by angle θ_i in the i-th complex plane.

d independent rotation planes: With d-dimensional complex embeddings, you get d independent unit circles, each with its own rotation angle θ_i. The relation rotates each complex plane independently. This gives d degrees of freedom to the relation — much more expressive than a single scalar angle, while still constraining each component to the unit circle.
[Interactive figure: Complex Number Multiplication = Rotation. All unit complex numbers sit on the unit circle; multiplying two of them adds their angles. Control: angle θ in radians.]
What mathematical property makes complex multiplication suitable for representing relations in RotatE?

Chapter 2: Rotation as Relation — Showcase

RotatE's core equation: for a true triple (h, r, t), each component of h rotated by r should land at the corresponding component of t:

h ∘ r = t    where    |r_i| = 1

Here ∘ is element-wise complex multiplication. Since |r_i|=1, each r_i = e^(iθᵢ). So the equation becomes: for each dimension i, h_i · e^(iθᵢ) = t_i. This is a rotation of h_i by angle θ_i.

The scoring function measures how far h∘r is from t, using the distance in the complex plane:

f_r(h, t) = ||h ∘ r − t||
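The scoring function can be sketched in NumPy as follows. Two things here are assumptions rather than statements from the text: the relation is stored as a vector of phase angles `theta`, which guarantees |r_i| = 1, and the norm is taken as the sum of complex moduli over dimensions, since the text does not say which norm is meant. The name `rotate_distance` is made up for this example.

```python
import numpy as np

def rotate_distance(h, theta, t):
    """Distance between the rotated head h ∘ r and the tail t.

    h, t  : complex arrays of shape (d,)
    theta : real array of shape (d,), one rotation angle per dimension
    """
    r = np.exp(1j * theta)          # unit modulus by construction
    # Assumed norm: sum of per-dimension complex moduli.
    return np.abs(h * r - t).sum()

rng = np.random.default_rng(0)
d = 4
h = rng.normal(size=d) + 1j * rng.normal(size=d)
theta = rng.uniform(-np.pi, np.pi, size=d)

t_true = h * np.exp(1j * theta)                         # tail that completes the triple
t_wrong = rng.normal(size=d) + 1j * rng.normal(size=d)  # unrelated entity

print(rotate_distance(h, theta, t_true))   # zero: the rotation lands on t
print(rotate_distance(h, theta, t_wrong))  # positive: the rotation misses t
```

Parameterizing the relation by angles rather than by free complex numbers is one simple way to enforce the unit-modulus constraint without a projection step.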
[Interactive figure: Rotation Showcase on the Unit Circle. The entity h lives on the unit circle (teal point). The relation r rotates it by angle θ (orange arrow). The rotated point h∘r should land at the tail entity t (purple). Control: relation angle θ.]
In RotatE's h ∘ r = t equation, what does the constraint |r_i| = 1 enforce?

Chapter 3: Relation Patterns

Here is where RotatE's geometric insight pays off. Every fundamental relation pattern corresponds to a simple constraint on the rotation angle θ:

Symmetry (θ = π): If r_i = e^(iπ) = -1, then h ∘ r = -h and t ∘ r = -t. For a symmetric relation, we need h ∘ r = t AND t ∘ r = h. With θ = π: h · (−1) = t means t = −h, and t · (−1) = h means h = −t. Both hold simultaneously! Rotation by π maps every point to its antipodal — and rotating that point by π again returns to the original.

Symmetric: θ = π  →  r = e^(iπ) = −1  →  h ∘ r = t ∧ t ∘ r = h   ✓

Antisymmetry (θ ≠ 0, π): For most relations, rotating h by θ lands at t but rotating t by θ does NOT land at h (it lands somewhere else). This is automatic for any θ ∉ {0, π}.

Inverse (θ₂ = −θ₁): If relation r₁ has angle θ and its inverse r₂ has angle −θ, then h ∘ r₁ = t implies t ∘ r₂ = t · e^(−iθ) = h · e^(iθ) · e^(−iθ) = h. The inverse relation rotates back. Perfect.

Composition (θ₃ = θ₁ + θ₂): Applying rotation θ₁ then θ₂ is equivalent to rotation θ₁+θ₂. This is exactly the angle-addition property of complex multiplication. Composition relations naturally decompose into angle sums.
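The four angle constraints above can be verified numerically. This is a minimal sketch with random embeddings; the helper `rel` and all variable names are hypothetical and chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
h = rng.normal(size=d) + 1j * rng.normal(size=d)  # random head embedding

def rel(theta):
    """Build a unit-modulus relation vector from rotation angles."""
    return np.exp(1j * np.asarray(theta))

# Symmetry: theta = pi in every dimension, so rotating twice returns to h.
r_sym = rel(np.full(d, np.pi))
t_sym = h * r_sym
assert np.allclose(t_sym * r_sym, h)

# Antisymmetry: angles away from 0 and pi do not return to h.
theta1 = rng.uniform(0.1, np.pi - 0.1, size=d)
r1 = rel(theta1)
t1 = h * r1
assert not np.allclose(t1 * r1, h)

# Inverse: the relation with negated angles rotates t back to h.
r1_inv = rel(-theta1)
assert np.allclose(t1 * r1_inv, h)

# Composition: rotating by theta1 then theta2 equals rotating by the sum.
theta2 = rng.uniform(0.1, np.pi - 0.1, size=d)
r2 = rel(theta2)
r3 = rel(theta1 + theta2)
assert np.allclose((h * r1) * r2, h * r3)
```

Each check is a one-line consequence of angle addition, which is why a single parameterization covers all four patterns.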

[Figure: All Four Patterns on the Unit Circle]
What rotation angle θ allows RotatE to represent symmetric relations?

Chapter 4: Self-Adversarial Negative Sampling

Like TransE, RotatE needs negative samples — corrupted triples to distinguish true facts from false ones. The loss function is:

L = −log σ(γ − f_r(h, t)) − Σ_{i=1}^{n} (1/n) log σ(f_r(h_i', t_i') − γ)

The first term: make the true triple score low (well below margin γ). The second term: make each negative triple score high (well above γ). γ is the margin (paper uses γ=9.0 for FB15k-237).
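As a rough illustration of how the two terms behave, the sketch below evaluates the loss from precomputed distances. It assumes uniform 1/n weighting over the negatives, and the function names and example distances are invented for this illustration.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log σ(x) = −log(1 + exp(−x)).
    return -np.logaddexp(0.0, -x)

def rotate_loss(pos_dist, neg_dists, gamma):
    """Loss for one positive triple and n uniformly weighted negatives.

    pos_dist  : distance f_r(h, t) of the true triple
    neg_dists : array of distances f_r(h_i', t_i') for the negatives
    gamma     : margin
    """
    pos_term = -log_sigmoid(gamma - pos_dist)                     # wants pos_dist << gamma
    neg_term = -np.mean(log_sigmoid(np.asarray(neg_dists) - gamma))  # wants neg_dists >> gamma
    return pos_term + neg_term

gamma = 9.0  # the FB15k-237 margin quoted in the text

# A well-trained model: true triple close, negatives far away.
good = rotate_loss(1.0, [15.0, 20.0, 18.0], gamma)
# A poorly trained model: true triple far, negatives close.
bad = rotate_loss(15.0, [1.0, 2.0, 3.0], gamma)

print(good, bad)
assert good < bad
```

Using `np.logaddexp` avoids overflow when the argument of the sigmoid is large in magnitude, which is common early in training.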

Standard negative sampling picks corrupted triples uniformly at random. But random negatives are often too easy — obvious nonsense like (Einstein, capital-of, Germany). The model learns quickly that these are wrong and stops getting useful gradient signal.

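Self-adversarial sampling weights negative triples non-uniformly, according to the current model's own judgment of how plausible they are. Because f_r is a distance, a plausible negative is one with a small f_r:

p(h_j', r, t_j') = exp(−α · f_r(h_j', t_j')) / Σ_i exp(−α · f_r(h_i', t_i'))

Negatives the model currently finds plausible (small distance) receive more weight. These are the "hard negatives": the triples that the model almost believes are true, so they provide the most informative gradient signal. α controls the sharpness of the weighting. In the paper these probabilities are applied as weights on the negative terms of the loss, in place of the uniform 1/n, rather than being used to draw samples.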
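A minimal sketch of the weighting follows. It assumes each negative is summarized by its distance and that its plausibility score is the negated distance, so a small distance marks a hard negative. The function name and the example distances are hypothetical.

```python
import numpy as np

def self_adversarial_weights(neg_dists, alpha):
    """Softmax weights over negatives; harder negatives get more weight.

    neg_dists : distances of the negative triples (smaller = more plausible)
    alpha     : temperature; 0 gives uniform weights, larger is sharper
    """
    scores = -np.asarray(neg_dists, dtype=float)  # assumed: score = −distance
    logits = alpha * scores
    logits = logits - logits.max()                # stabilize the exponentials
    weights = np.exp(logits)
    return weights / weights.sum()

neg_dists = [2.0, 8.0, 12.0, 20.0]  # the first negative is the hardest

uniform = self_adversarial_weights(neg_dists, alpha=0.0)
mild = self_adversarial_weights(neg_dists, alpha=0.5)
sharp = self_adversarial_weights(neg_dists, alpha=2.0)

print(uniform)
print(mild)
print(sharp)
```

With α = 0 every negative is weighted equally. As α grows, the weight concentrates on the negative with the smallest distance.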

Why "self-adversarial"? The model generates its own hard negatives — it's adversarial to itself. This is similar in spirit to curriculum learning or GAN training (where the discriminator learns from the generator's current outputs). The key difference from standard curriculum: no external teacher needed. The current model IS the curriculum.
[Interactive figure: Hard vs. Easy Negatives. The weight distribution shifts as α increases; higher α concentrates weight on the hardest negatives. Control: temperature α.]
What makes a negative triple "hard" in self-adversarial sampling?

Chapter 5: Results

RotatE is evaluated on four standard benchmarks: FB15k and WN18, plus their harder variants FB15k-237 and WN18RR, which remove inverse relations to prevent test leakage (FB15k and WN18 leak; FB15k-237 and WN18RR are clean).

Model      FB15k-237 MRR   FB15k-237 H@10   WN18RR MRR   WN18RR H@10
TransE     0.279           44.1%            0.226        50.1%
DistMult   0.241           41.9%            0.430        49.0%
ComplEx    0.247           42.8%            0.440        51.0%
ConvE      0.325           50.1%            0.430        52.0%
RotatE     0.338           53.3%            0.476        57.1%

RotatE sets a new state of the art on all four metrics. The improvement is largest on WN18RR: +0.036 MRR (0.476 vs. 0.440) and +6.1 points H@10 (57.1% vs. 51.0%) over ComplEx. WN18RR contains many composition relations (hypernym chains), exactly the pattern where RotatE's angle-composition property shines.

Why FB15k-237 and WN18RR instead of FB15k and WN18? The original FB15k and WN18 benchmarks contain test triples whose answers can be derived by simply inverting training triples. A model that memorizes "if (h, r, t) exists then (t, r⁻¹, h) is the answer" achieves artificially high scores without learning anything meaningful. The "237" and "RR" variants remove these inversions to force genuine generalization.
Why does RotatE perform especially well on WN18RR compared to other benchmarks?

Chapter 6: vs TransE / DistMult / ComplEx

In terms of the four relation patterns, RotatE covers everything that TransE, DistMult, and ComplEx cover, and adds the combinations they miss (see the table below). Understanding how illuminates the precise role of complex space in KG embedding.

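RotatE vs TransE: TransE: h + r = t in ℝ^d. RotatE: h ∘ r = t in ℂ^(d/2), that is, d/2 complex numbers for the same budget of d real parameters per entity. Both use one vector per entity and per relation. But multiplication by unit complex numbers is a richer operation than addition of real vectors: angles add under composition, negate under inversion, and a half-turn is its own inverse, which is what lets one operation cover all four relation patterns.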

RotatE vs ComplEx: ComplEx uses: score = Re(Σ h_i · r_i · t̄_i), where t̄ is the complex conjugate of t. This is a bilinear form in complex space. RotatE constrains |r_i|=1, which ComplEx doesn't. The constraint is what forces relations to be pure rotations — ComplEx can scale (grow/shrink embeddings) while RotatE cannot. The constraint costs some expressiveness but gains the geometric interpretation and handles composition cleanly.
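The difference can be made concrete in a few lines of NumPy. This is a sketch under stated assumptions: the function names are invented for this example, and the RotatE distance uses the sum of complex moduli, which the text does not specify.

```python
import numpy as np

def complex_score(h, r, t):
    """ComplEx bilinear score Re(Σ h_i · r_i · conj(t_i)); higher = more plausible."""
    return np.real(np.sum(h * r * np.conj(t)))

def rotate_distance(h, theta, t):
    """RotatE distance with r_i = exp(i·theta_i); lower = more plausible."""
    return np.abs(h * np.exp(1j * theta) - t).sum()  # assumed norm: sum of moduli

rng = np.random.default_rng(0)
d = 4
h = rng.normal(size=d) + 1j * rng.normal(size=d)
theta = rng.uniform(-np.pi, np.pi, size=d)

r_rot = np.exp(1j * theta)        # RotatE-style relation: unit modulus
r_cx = 2.0 * np.exp(1j * theta)   # ComplEx allows any modulus, here 2

# A pure rotation preserves each component's modulus ...
assert np.allclose(np.abs(h * r_rot), np.abs(h))
# ... while an unconstrained ComplEx relation can also stretch it.
assert np.allclose(np.abs(h * r_cx), 2.0 * np.abs(h))

# When t is exactly the rotated head, the RotatE distance is zero and the
# ComplEx score reduces to the squared norm of h.
t = h * r_rot
assert np.isclose(rotate_distance(h, theta, t), 0.0)
assert np.isclose(complex_score(h, r_rot, t), np.sum(np.abs(h) ** 2))
```

The two models also differ in kind: ComplEx scores a triple with a similarity (higher is better), while RotatE scores it with a distance (lower is better).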

Property        TransE        DistMult   ComplEx           RotatE
Symmetric       No (r=0)      Yes        Yes               Yes (θ=π)
Antisymmetric   Yes           No         Yes               Yes
Inverse         Partially     No         Yes               Yes (θ₂=−θ₁)
Composition     Partially     No         No                Yes (θ₃=θ₁+θ₂)
Relation as     Translation   Scaling    Complex scaling   Complex rotation

RotatE is the first model to handle ALL four patterns simultaneously. Composition is the critical gap: ComplEx can handle symmetric/antisymmetric/inverse but cannot naturally compose relations. This is where RotatE's angle-addition property is uniquely powerful.

Which of the four fundamental relation patterns did NO method before RotatE handle cleanly?

Chapter 7: Connections

RotatE belongs to a geometric tradition in KG embedding: representing relational semantics as geometric operations. Rotations were the key insight; subsequent work extended to hyperbolic spaces, higher-dimensional rotations, and combinations with attention.

Method          Geometric operation      Space        New capability
TransE (2013)   Translation              ℝ^d          -
RotatE (2019)   Rotation (unit |r|)      ℂ^(d/2)      All 4 patterns
QuatE (2019)    Quaternion rotation      ℍ^(d/4)      3D rotations
HAKE (2020)     Translation + rotation   ℝ×ℂ          Hierarchies
PairRE (2021)   Paired rotation          ℝ^d × ℝ^d    Complex patterns

The broader lesson: RotatE demonstrates a powerful methodology — identify the algebraic properties your task requires (symmetry, composition) and then choose or design a mathematical space whose operations naturally encode those properties. The complex unit circle was chosen because multiplication on it is exactly the operation needed. The lesson generalizes: match your mathematical structure to your data's semantics.

Key papers

  • Sun et al., ICLR 2019 (RotatE)
  • Bordes et al., NeurIPS 2013 (TransE)
  • Trouillon et al., ICML 2016 (ComplEx)
  • Zhang et al., NeurIPS 2019 (QuatE)
"We model each relation as a rotation from the source entity to the target entity in the complex vector space."
— Sun et al. (2019)