CS224W Lecture 10 — Knowledge Graphs

Chapter 0: What Are Knowledge Graphs?

J.K. Rowling wrote Harry Potter. Harry Potter belongs to the Fantasy genre. Fantasy is a subtype of Fiction. These are facts. And facts have a shape: a thing, connected to another thing, by a relationship.

A knowledge graph (KG) is a database that stores facts exactly this way — as triples: (head entity, relation, tail entity), or (h, r, t) for short. Every fact in the world can be squeezed into this form.

(J.K. Rowling, genre, Fantasy)

(Harry Potter, author, J.K. Rowling)

(Penicillin, treats, Pneumonia)

The entities — people, places, diseases, proteins, concepts — are the nodes. The relations — genre, author, treats, birthplace, subclass_of — are the edge types. Together they form a directed, multi-relational graph.

Real Knowledge Graphs Are Enormous

This isn't a toy idea. Real KGs are used at Google, Microsoft, and Amazon to power search, recommendations, and question-answering:

KG	Entities	Relations	Triples	Used For
Freebase	80M	6,000+	3B	Google Search (deprecated)
Wikidata	100M+	10,000+	1.3B	Wikipedia, general knowledge
Google KG	500M+	—	3.5B	Search, Assistant
YAGO	10M	100+	120M	Research, NLP
Hetionet	47K	24	2.25M	Drug discovery, biology

The Big Problem: Massive Incompleteness

Here's the uncomfortable truth about every real KG: they are riddled with holes. Freebase, one of the largest KGs ever built, was studied carefully and the findings were stark:

93.8% of persons in Freebase have no recorded birthplace. 78.5% have no recorded nationality. 75% have no known parents. The KG is not a complete encyclopedia — it's a sparse sample of facts that happened to get entered.

This incompleteness isn't a bug in the data entry process. It's structural. The world has more facts than any team of humans can annotate. New entities are created every day. Relations are asymmetric — if A treats B is known, B is treatable by A might not be entered separately.

The Task: KG Completion

Knowledge graph completion (also called link prediction) is the problem of predicting missing triples from the ones you have. Given a partial KG, can a model score unseen (h, r, t) triples as "likely true" or "likely false"?

Knowledge Graph Explorer

A small KG about authors and books. Dashed edges are missing — facts that aren't in the graph yet. Can you guess what they should be? Click a node to highlight its known facts.

Click any node to see its known relations.

Freebase has 80 million entities but 93.8% of persons lack a recorded birthplace. What does this tell us about knowledge graphs?

The data was entered carelessly and should be corrected manually Birthplace is not a useful relation to store in a KG KGs are fundamentally incomplete — they sample real-world facts, so predicting missing triples is a critical task

Chapter 1: The KG Embedding Framework

How do you teach a computer to predict whether (Einstein, birthplace, Ulm) is a true fact? You can't do fuzzy string matching. You need to capture the meaning of entities and relations in a form a machine can reason about.

The answer: embeddings. Assign every entity and every relation a vector in some k-dimensional space. Then define a scoring function f_r(h, t) that takes the head and tail entity vectors plus the relation vector, and outputs a scalar score — high if the triple is likely true, low otherwise.

f_r(h, t) ∈ ℝ → higher = more likely true

Different embedding models differ in exactly one thing: the form of f_r. Everything else — the training procedure, the evaluation metrics, the intuition — is shared.

The key insight: We're not learning rules like "X birthplace Y if Y is a city and X was born there." We're learning geometry. Entities and relations are points and transformations in vector space, and truth is proximity or alignment in that space.

What Goes In and What Comes Out

Entities: h, t ∈ ℝ^k

One vector per entity, shared across all relations. Learned during training.

↓

Relation: r ∈ ℝ^k (or ℂ^k)

One vector (or matrix, or rotation) per relation type. Learned during training.

↓

Scoring Function f_r(h, t)

Combines h, r, t into a scalar. Different models = different f_r.

↓

Prediction

High score = likely true triple. Low score = likely false triple.

The Embedding Dimension k Is a Hyperparameter

Typical values: k = 50 to k = 1000. Larger k → more expressive, more parameters, more compute. For Freebase with 80M entities, even k = 200 gives 80M × 200 = 16 billion parameters for entity embeddings alone. This is why KG completion models are "shallow" — they don't stack layers.

Training Signal: Contrastive Learning

True triples from the KG should score high. But there are infinitely many false triples — we can't enumerate them all. Instead, we corrupt a true triple by replacing the head or tail with a random entity to create a negative sample:

(h, r, t) → true (h', r, t) or (h, r, t') → corrupted (false)

The loss pushes true scores above corrupted scores. The margin, which model, and how many corruptions per true triple are all design choices that affect final performance significantly.

All KG embedding models share the same training procedure but differ in one thing. What is it?

The dimensionality k of the embedding space Whether to use GPU or CPU for training The scoring function f_r(h, t) — how h, r, t are combined to produce a score

Chapter 2: TransE — Translation in Embedding Space

TransE (Bordes et al., NeurIPS 2013) is the simplest and most elegant KG embedding model. It has one geometric intuition so clean you can draw it on a napkin: a relation is a translation vector.

If (h, r, t) is a true triple, then the tail entity should be located at the head entity plus the relation vector. In embedding space:

h + r ≈ t

Think of it like directions on a map. "Paris" + "is_capital_of" ≈ "France". "Einstein" + "birthplace" ≈ "Ulm". "Penicillin" + "treats" ≈ "Pneumonia". The relation arrow, when added to the head, points at the tail.

Scoring function: f_r(h, t) = −||h + r − t||₂. High (less negative) = the triple is close to true. Low (very negative) = far apart.

Why the Minus Sign?

Distance is always non-negative, so ||h + r − t|| = 0 means perfect alignment. We negate it so that the scoring function is higher for truer triples (||distance|| → 0 → score → 0), consistent with every other model where higher = more likely true.

What TransE Handles Well

Antisymmetric relations: If h + r ≈ t, then t + r ≠ h (unless r = 0). So "is_parent_of" learned as a vector naturally captures that parenthood is one-directional. Good.

Composition: If h + r₁ ≈ m and m + r₂ ≈ t, then h + (r₁ + r₂) ≈ t. Multi-hop facts compose naturally by adding relation vectors. Good.

What TransE Cannot Handle

Symmetric relations: "is_married_to" should satisfy f_r(h, t) = f_r(t, h). But if h + r ≈ t, then t + r ≈ h would require r = 0. So TransE would learn near-zero vectors for symmetric relations, which makes them uninformative. Bad.

1-to-N relations: "has_nationality" maps one country to many people. TransE would try to put all those people at the same point (France + has_nationality ≈ everyone born in France), forcing them to cluster together even if they're very different entities. Bad.

TransE: Watch the Translation

Entities are points in 2D space. Relations are arrows (translation vectors). True triple: the orange arrow from h points at t. Drag the head entity or adjust the relation vector to see how the score changes.

r_x 80

r_y -60

Score: f_r(h,t) = -||h + r - t||

For a symmetric relation like "is_married_to", TransE requires f_r(h,t) = f_r(t,h). Why can't TransE satisfy this?

TransE only works in 2D, but symmetric relations require higher dimensions h + r ≈ t and t + r ≈ h simultaneously requires r = 0, making the relation vector uninformative TransE uses L2 distance which is always asymmetric

Chapter 3: TransR — Relation-Specific Projection

TransE's limitation with 1-to-N relations comes from a single cause: it forces every entity to live in the same space with the same metric. "France" must be simultaneously near "Paris" (via is_capital_of), near "Camus" (via nationality_of), and near "Brie" (via produced_in). It can't serve all three roles in one point.

TransR (Lin et al., AAAI 2015) fixes this with one idea: project entities into a relation-specific subspace before measuring distance. Each relation r gets not just a translation vector, but also a projection matrix M_r.

M_r h + r ≈ M_r t where M_r ∈ ℝ^k×d

Here d is the entity embedding dimension and k is the relation space dimension. The matrix M_r projects the entity from d-dimensional entity space into k-dimensional relation space. Then the translation h + r ≈ t happens in that relation-specific k-dimensional space.

Analogy: Different relations see entities from different "perspectives." The relation "nationality_of" cares about where an entity was created or born — it projects France's embedding onto a geography-relevant subspace. The relation "authored_by" cares about the creative lineage — it projects onto a different subspace. France and Paris might be nearby in the nationality subspace but far apart in the authorship subspace.

How This Fixes 1-to-N

For a 1-to-N relation like "capital_of" (many capitals, one country each), entities that all map to France don't need to cluster in entity space. They just need to land near France after projection through M_r. M_r can map many different entity vectors to the same region in relation space.

f_r(h, t) = −||M_rh + r − M_rt||₂

The Cost: More Parameters

Each relation now has a d × k projection matrix plus a k-dimensional translation vector. For a KG with 6,000 relation types (like Freebase), that's 6,000 × (d × k + k) parameters just for relations. If d = k = 200, that's 6,000 × 40,200 = ~241M parameters for relations alone. Compute-intensive.

TransE

Params per relation: k (translation vector)
Symmetric: ✗
1-to-N: ✗
Fast, simple

TransR

Params per relation: k×d + k (matrix + vector)
Symmetric: ✗ (still translation-based)
1-to-N: ✓
Expensive for many relation types

Still can't do symmetric relations. TransR inherits TransE's core mechanism — translation in relation space — so it faces the same r = 0 problem for symmetric relations. The projection helps with 1-to-N but not with symmetry.

TransR uses a projection matrix M_r per relation. What specific limitation of TransE does this fix?

It fixes symmetric relations by making the score symmetric It reduces the total number of parameters needed for training It fixes 1-to-N relations by projecting entities into relation-specific subspaces, so many entities can map to one region without clustering in entity space

Chapter 4: DistMult — Bilinear Scoring

The TransE family uses geometric translation. Let's try something completely different: a multiplicative scoring function. Instead of asking "is h + r close to t?", ask "how well do h, r, and t align when you multiply their components?

DistMult (Yang et al., ICLR 2015) defines the scoring function as an element-wise trilinear product:

f_r(h, t) = ⟨h, r, t⟩ = ∑_i h_i · r_i · t_i

Each dimension i contributes h_i × r_i × t_i to the score. The relation vector r_i acts as a per-dimension weight that says how much dimension i matters for this relation. If r_i is large and positive, the model wants h_i and t_i to have the same sign. If r_i is large and negative, it wants them to have opposite signs.

Analogy: a weighting mixer. Imagine h and t are mixing board settings — one vector per channel. The relation r tells you which channels matter. The score is the sum of what's playing in each active channel, weighted by both entities' settings in that channel.

The Symmetry Problem

Notice: ∑ h_i · r_i · t_i = ∑ t_i · r_i · h_i. Multiplication is commutative. So f_r(h, t) = f_r(t, h) always. DistMult is symmetric by construction.

This is a hard constraint, not a soft bias. No matter how you train DistMult, it will score (h, r, t) the same as (t, r, h) for every triple. For a relation like "is_married_to" — where the score should be symmetric — this is exactly right. For "is_parent_of" — which is antisymmetric — DistMult literally cannot represent it correctly.

What DistMult Can and Cannot Do

DistMult strengths

Symmetric relations: ✓ (natural)
1-to-N relations: ✓ (bilinear is expressive enough)
Fast: no matrix multiplication, just elementwise
Scales to large KGs easily

DistMult limitations

Antisymmetric relations: ✗ (structurally impossible)
Inverse relations: ✗ (same issue as antisymmetry)
Composition: ✗ (bilinear doesn't compose)
Only captures diagonal of full bilinear form

The "only diagonal" point is worth unpacking. A full bilinear model would use a matrix W_r per relation and score f_r(h, t) = h^T W_r t. DistMult restricts W_r to be diagonal — this is the "Dist" (diagonalized) in "DistMult." It saves parameters (k instead of k²) but loses the ability to model interactions between different dimensions of h and t.

DistMult scores f_r(h,t) = Σ h_i r_i t_i. Why can it NOT model the relation "is_parent_of"?

The summation is too slow to compute for large KGs Multiplication is commutative, so f_r(h,t) = f_r(t,h) always — it can't distinguish who is the parent and who is the child "is_parent_of" requires a 3D embedding space to represent

Chapter 5: ComplEx — Complex-Valued Bilinear

DistMult is symmetric because real multiplication commutes. But what if we embedded in complex space? Complex multiplication doesn't commute — and that's exactly what we need to handle both symmetric and antisymmetric relations.

ComplEx (Trouillon et al., ICML 2016) embeds entities and relations in ℂ^k — vectors of complex numbers. Each component h_i = a_i + b_ii, where a is the real part and b is the imaginary part.

The scoring function uses the Hermitian inner product:

f_r(h, t) = Re(⟨h, r, t̄⟩) = Re(∑_i h_i · r_i · t̄_i)

That overbar on t̄ denotes the complex conjugate: if t_i = a + bi, then t̄_i = a − bi. We take the real part Re(·) of the sum because scores should be real numbers.

Why Conjugation Breaks Symmetry

Let's compare f_r(h, t) and f_r(t, h) in ComplEx:

f_r(h, t) = Re(∑_i h_i r_i t̄_i)

f_r(t, h) = Re(∑_i t_i r_i h̄_i)

These are NOT the same in general. The conjugate is applied to t in the first case, and to h in the second. Only when all imaginary parts of h and t are zero do they coincide — that's real DistMult as a special case.

The geometric picture: Complex multiplication rotates and scales. When you conjugate t, you flip its angle in the complex plane. So measuring alignment between h, r, and t̄ is different from measuring alignment between t, r, and h̄. The asymmetry is baked into the conjugation.

Handling Both Symmetric and Antisymmetric Relations

For symmetric relations (e.g., "is_married_to"): The model learns relation vectors r with zero imaginary part. Then conjugation has no effect, and f_r(h, t) = f_r(t, h). ComplEx reduces to DistMult for these.

For antisymmetric relations (e.g., "is_parent_of"): The model learns relation vectors r with nonzero imaginary parts. The conjugation now makes f_r(h, t) ≠ f_r(t, h). The model can score (parent, r, child) high and (child, r, parent) low simultaneously.

ComplEx: Complex Space Showcase

Entities h (orange) and t (teal) as complex numbers. The relation r rotates/scales. Conjugation flips t across the real axis. Watch how f_r(h,t) ≠ f_r(t,h) for asymmetric r. Toggle the relation type to see the difference.

h angle (θ_h) 30°

t angle (θ_t) 120°

Scores: f(h,t) = — f(t,h) = —

ComplEx scores f_r(h,t) = Re(Σ h_i r_i t̄_i). What makes this NOT always equal to f_r(t,h)?

The Re(·) operation makes the scores asymmetric ComplEx uses matrix multiplication instead of elementwise products The complex conjugate t̄ is applied to t in f(h,t) but to h in f(t,h) — they differ by their imaginary parts, breaking symmetry

Chapter 6: RotatE — Rotation as Relation

ComplEx handles symmetry and antisymmetry. But can one model handle everything? RotatE (Sun et al., ICLR 2019) comes close by making relations even more geometric: each relation is a rotation in complex space.

Embed every entity in ℂ^k. Represent each relation r as a vector of unit complex numbers — complex numbers on the unit circle, each parameterized by an angle θ_i:

r_i = e^iθ_i = cos(θ_i) + i sin(θ_i), |r_i| = 1

A true triple (h, r, t) should satisfy:

h ∘ r = t

Where ∘ is element-wise complex multiplication. Since |r_i| = 1, multiplying by r_i rotates h_i by angle θ_i without changing its magnitude. Each dimension gets independently rotated by its own angle.

f_r(h, t) = −||h ∘ r − t||

Why Rotation Covers All Relation Patterns

Each pattern corresponds to a specific constraint on the rotation angles θ:

Symmetric relations: r ∘ r = 1, which means θ_i = π (rotating twice brings you back). If h ∘ r = t then t ∘ r = h ∘ r ∘ r = h. ✓
Antisymmetric relations: θ_i ≠ 0 and θ_i ≠ π. Rotating by a non-trivial angle gives a non-trivial different position for (t, r) than (h, r). ✓
Inverse relations: If r' = r⁻¹ (negate the angles, i.e., θ'_i = −θ_i), then h ∘ r = t implies t ∘ r' = h. Inverses come for free. ✓
Composition: If r₁ rotates by θ₁ and r₂ by θ₂, then r₁ ∘ r₂ rotates by θ₁ + θ₂. So multi-hop relations compose by adding angles. ✓

RotatE is the most expressive of the four models. It handles symmetric, antisymmetric, inverse, and composition — four distinct relation patterns. The unit-circle constraint on r is not a limitation but the source of its power: rotations compose perfectly.

RotatE: Rotation in Complex Space

Each dot is a complex number (angle = position on circle, radius = magnitude). The relation rotates h by angle θ. True triple: h ∘ r should land on t. Adjust θ to see how rotation predicts the tail.

θ (rotation) 90°

h angle 30°

Score: f_r(h,t) = -||h∘r - t||

In RotatE, a symmetric relation like "is_married_to" should satisfy h∘r = t AND t∘r = h. What rotation angle θ achieves this?

θ = 0 (no rotation) θ = 90° (quarter turn) θ = 180° (half turn), since rotating twice by 180° returns to the start

Chapter 7: Relation Patterns — The Full Comparison

We've seen four models. Each has a different geometric intuition and a different set of relation patterns it can express. A relation pattern is a structural property that holds across the triples of a relation — regardless of which specific entities are involved.

The Five Patterns

Symmetric: r(h,t) ⟹ r(t,h)
Examples: is_married_to, is_similar_to, co-occurs_with

Antisymmetric: r(h,t) ⟹ ¬r(t,h)
Examples: is_parent_of, is_larger_than, hypernym_of

Inverse: r₁(h,t) ⟺ r₂(t,h)
Examples: parent_of ↔ child_of, treats ↔ treated_by

Composition: r₁(h,m) ∧ r₂(m,t) ⟹ r₃(h,t)
Examples: parent_of ∘ parent_of = grandparent_of

1-to-N: r(h,t₁) ∧ r(h,t₂) with t₁ ≠ t₂
Examples: has_nationality (one person, one country), has_genre (movie, many genres)

Which Model Handles Which Pattern?

Pattern	TransE	TransR	DistMult	ComplEx	RotatE
Symmetric	✗	✗	✓	✓	✓
Antisymmetric	✓	✓	✗	✓	✓
Inverse	✓	✓	✗	✓	✓
Composition	✓	✗	✗	✗	✓
1-to-N	✗	✓	✓	✓	✓

RotatE dominates: it's the only model that handles all five patterns. But this isn't pure win — it comes with the overhead of complex arithmetic and careful initialization of angles.

Why DistMult and ComplEx fail on composition: Adding bilinear scores doesn't compose in the way rotation angles do. If f_r₁(h,m) = Σ h_i r₁_i m̄_i is high and f_r₂(m,t) is high, there's no algebraic way to derive that f_r₃(h,t) should be high for r₃ = r₁ ∘ r₂. The bilinear form doesn't chain.

Practical Guidance

In practice, performance depends on your KG's dominant relation types. If most relations are symmetric (social networks, similarity graphs), DistMult or ComplEx win on speed. If composition is critical (hierarchical taxonomies, multi-hop reasoning), RotatE or TransE are better choices.

The relation "grandparent_of" can be expressed as "parent_of ∘ parent_of" (composition). Which models can represent this pattern?

DistMult and ComplEx, because they use bilinear scoring TransR, because its projection matrices can capture composition TransE and RotatE — TransE because translation vectors add, RotatE because rotation angles add

Chapter 8: Training KG Embeddings

We know what to optimize (the scoring function), and we know the signal (true triples should score higher). But KGs only contain true facts — there are no labeled false triples. We have to create them.

Negative Sampling

For every true triple (h, r, t) in the training set, we generate negative triples by corrupting either the head or the tail:

(h', r, t) where h' ∼ Uniform(E) or (h, r, t') where t' ∼ Uniform(E)

This assumes the KG is a closed world for training — any triple not in the KG is treated as false. This is an approximation: some corrupted triples might actually be true but just not recorded. In practice, this rarely causes major problems because the KG is so sparse.

Loss Functions

Margin loss (used by TransE):

L = ∑_(h,r,t)∈KG ∑_(h',r,t') max(0, γ + f_r(h',t') − f_r(h,t))

Here γ is the margin — we want true triples to score at least γ higher than corrupted ones. If they already do, the loss is zero. If not, we push in the right direction.

Cross-entropy loss (used by DistMult, ComplEx, RotatE): Apply softmax over all possible tails for a given (h, r), then take cross-entropy with the true tail. This is more expensive (sum over all entities) but often trains faster to convergence.

Self-adversarial negative sampling (RotatE's trick): Sample negative triples with probability proportional to the current model's score. High-scoring negatives are the "hard negatives" — the ones the model almost thinks are true. Training on hard negatives gives stronger gradient signal.

p(h'_j, r, t'_j) = softmax(α · f_r(h'_j, t'_j))

The temperature α controls how aggressive the hard-negative mining is. High α → very focused on the hardest negatives. Low α → nearly uniform sampling like the basic approach.

Evaluation: Ranking Metrics

Given a test triple (h, r, t), replace t with every entity in E and score all of them. Rank t among all these candidates. Metrics:

Metric	Formula	Meaning
MRR	Mean(1/rank)	Mean reciprocal rank — 1.0 is perfect, higher is better
Hits@1	% rank=1	Fraction of test triples ranked #1
Hits@10	% rank≤10	Fraction of test triples in top 10

Typical results on FB15k-237 (a Freebase subset): RotatE achieves MRR ~0.338, Hits@10 ~0.533. TransE achieves MRR ~0.279, Hits@10 ~0.465. The gap is real but not enormous — dataset and hyperparameter choice matter a lot.

Training Dynamics Showcase

Watch a mini KG get trained with margin loss. Each step: pick a true triple, generate a negative, compute the loss, update embeddings. The score gap between true and false triples should grow over time.

Learning Rate 0.05

Press Start to begin training.

Why do KG embedding models use "negative sampling" during training?

To reduce memory usage by only loading part of the KG at once To speed up the softmax computation over all entities KGs only contain true facts — we must synthetically create false triples by corrupting true ones to give the model a contrastive training signal

Chapter 9: Connections and What's Next

We've built the full picture of shallow KG embeddings. Before moving on, let's situate this in the wider landscape — what these models can't do, and what comes next.

The Shallow Ceiling

Every model in this lecture — TransE, TransR, DistMult, ComplEx, RotatE — is a shallow embedding model. Entities have one embedding vector, and that vector doesn't change based on context. The embedding of "Paris" is the same whether you're asking about its geography, its cultural institutions, or its historical events.

The fundamental limitation: Shallow embeddings cannot perform multi-hop reasoning that requires integrating evidence across the graph. "Who are the siblings of Einstein's students?" requires traversing: Einstein → students → each student's parents → each parent's other children. A lookup of a fixed vector can't chain these steps.

Combining GNNs with KG Embeddings

The next step — covered in later CS224W lectures — is to replace the static entity embeddings with GNN-computed representations. Instead of a fixed vector for "Paris," a GNN aggregates information from Paris's neighbors (Eiffel Tower, Seine River, France, ...) and computes a context-aware embedding. Then you apply the same scoring functions (TransE, RotatE) on top.

This combination gives you the expressiveness of relational geometry AND the contextual flexibility of graph neural networks.

Applications Where KGs Matter

Search and QA

Google's Knowledge Panel ("Albert Einstein — German-born physicist, 1879-1955") is powered by a KG. KG embeddings enable fuzzy lookups: "Who are physicists born in Germany?" becomes a nearest-neighbor query in embedding space.

Drug Discovery

Hetionet KG connects Genes, Diseases, Compounds, Pathways. KG completion predicts: "Compound X treats Disease Y" — novel drug-disease links. Several COVID-19 drug repurposing studies used exactly this approach.

Recommendation

E-commerce KGs: (User, purchases, Product), (Product, belongs_to, Category), (User, searches, Query). KG completion predicts (User, will_buy, Product). Richer than pure collaborative filtering because it explains WHY (via intermediate nodes).

Entity Alignment

Two KGs about the same world — e.g., Wikidata and Freebase — both have "Paris" but with different IDs. KG embeddings in a shared space can align entities across KGs by embedding proximity.

How This Lecture Connects to Others

Topic	Connection
Lec 9: Heterogeneous Graphs	KGs ARE heterogeneous graphs — entities are node types, relations are edge types. RGCN can be applied to KGs as a deep alternative to shallow embeddings.
Lec 11: Reasoning over KGs	Multi-hop reasoning — answering "Who are the siblings of Einstein's collaborators?" — extends KG embedding into path-based logic.
Lec 3-4: GNNs	GNN-based KG completion replaces static entity embeddings with neighborhood-aggregated representations for context-aware scoring.

Key takeaway from this lecture: The choice of scoring function encodes a geometric hypothesis about how relations work. TransE says "relations are translations." RotatE says "relations are rotations." Neither is universally correct — the right model depends on which relation patterns dominate your specific KG.

"The language of mathematics is not discovered — it is invented. But how well it fits the world is a miracle nonetheless."
— adapted from Wigner, on the unreasonable effectiveness of mathematics

← Lec 9: Heterogeneous Graphs