Zitnik, Agrawal, Leskovec — Bioinformatics 2018 · arXiv 1802.00543

Decagon: Predicting Drug Combination Side Effects

Taking two drugs can cause side effects that neither drug alone causes. With 1,000+ approved drugs, there are 500,000+ possible pairs — far too many to test in clinical trials. Decagon predicts them from a graph.

Prerequisites: GCN basics + knowledge graphs. That's it.
8 Chapters · 4+ Simulations · 964 Side Effect Types Modeled

Chapter 0: The Polypharmacy Problem

An elderly patient takes 8 drugs daily: for heart disease, diabetes, depression, and arthritis. This is polypharmacy — the simultaneous use of multiple medications. In the US, 40% of adults over 65 take 5+ drugs. For patients over 80, that number is over 50%.

The problem: drug combinations can cause polypharmacy side effects — adverse reactions that occur when drugs A and B are taken together, even when each is safe alone. Drug A might inhibit the enzyme that metabolizes Drug B, causing B to accumulate to toxic levels. Or both drugs might amplify each other's cardiac effects.

Scale of the Problem

With n drugs, there are n(n-1)/2 possible pairs.
The testing gap: Clinical trials for drug combinations would require n(n-1)/2 pair trials, each taking years and costing millions. For 1,000 approved drugs, that's ~500,000 pairs. For 10,000 drugs (including investigational compounds), it's 50 million. We will never test even 1% of these. Yet patients are taking untested combinations every day. Adverse drug-drug interactions are estimated to cause ~125,000 deaths annually in the US.
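The pair counts quoted above follow directly from n(n-1)/2. A minimal sketch of the arithmetic (the function name `num_pairs` is only illustrative):

```python
def num_pairs(n_drugs: int) -> int:
    """Number of unordered drug pairs among n_drugs drugs: n(n-1)/2."""
    return n_drugs * (n_drugs - 1) // 2

for n in (100, 1_000, 10_000):
    print(f"{n:>6} drugs -> {num_pairs(n):>12,} pairs")
```

Multiplying the number of drugs by 10 multiplies the number of pairs by roughly 100, which is why exhaustive testing falls behind so quickly.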

Existing databases (DrugBank, TWOSIDES) contain known interactions, but are vastly incomplete. The question is: can we predict which drug combinations are dangerous — and what side effects they cause — from the molecular mechanism of the drugs?

Why is it impossible to test all drug combination side effects in clinical trials?

Chapter 1: The Multirelational Graph

Decagon builds a multimodal, multirelational graph combining three types of information: how drugs work (molecular targets), how proteins interact (protein network), and what combinations cause what effects (known side effects).

Node types

Drugs (645 nodes)
Each node is an FDA-approved drug in the dataset, represented either by its chemical structure (Morgan fingerprint features) or by the graph structure alone.
Proteins (19,085 nodes)
Human proteins in the interaction network, including the proteins that drugs target. Each is represented by sequence-based features (GO terms, sequence embeddings).

Edge types

Edge Type | Meaning | Count
Drug–Protein | Drug targets this protein (binds, inhibits, activates) | 18,596 edges
Protein–Protein | Physical protein-protein interaction (PPI) | 719,402 edges
Drug–Drug (×964 types) | This pair causes this specific polypharmacy side effect | 4,649,441 edges
Why 964 relation types? Each distinct polypharmacy side effect is a separate relation type. "Muscle weakness" is relation type 1; "bradycardia" is relation type 2; and so on. The TWOSIDES database contains 964 such side effect types with at least 500 drug pairs each. So the drug-drug layer is a multigraph with 964 different edge flavors — each edge (drug A, side effect r, drug B) means "taking drugs A and B together causes side effect r."
Heterogeneous Graph Structure

The Decagon graph has two node types (drugs and proteins) and hundreds of edge types.
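One way to picture this structure in code is a container that groups edges by (source type, relation, target type). The sketch below is an illustration only: the class name, the relation labels, and the node names `drug_A`, `prot_X`, and so on are hypothetical placeholders, not entries from the Decagon dataset.

```python
from collections import defaultdict

class MultiRelationalGraph:
    """Toy container: edges are grouped by (source type, relation, target type)."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_edge(self, src, relation, dst, src_type, dst_type, symmetric=True):
        self.edges[(src_type, relation, dst_type)].add((src, dst))
        if symmetric:
            # Store the reverse direction too, so both endpoints see the edge.
            self.edges[(dst_type, relation, src_type)].add((dst, src))

    def neighbors(self, node, node_type):
        """Yield (relation, neighbor_type, neighbor) for every edge leaving `node`."""
        for (s_type, rel, d_type), pairs in self.edges.items():
            if s_type != node_type:
                continue
            for s, d in pairs:
                if s == node:
                    yield rel, d_type, d

g = MultiRelationalGraph()
g.add_edge("drug_A", "targets", "prot_X", "drug", "protein")
g.add_edge("drug_B", "targets", "prot_Y", "drug", "protein")
g.add_edge("prot_X", "interacts", "prot_Y", "protein", "protein")
# Two side effects for the same drug pair: two edges of different types.
g.add_edge("drug_A", "bradycardia", "drug_B", "drug", "drug")
g.add_edge("drug_A", "muscle_weakness", "drug_B", "drug", "drug")

print(sorted(g.neighbors("drug_A", "drug")))
```

The same drug pair appears under two different relation keys, which is exactly what makes the drug-drug layer a multigraph.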

Why does Decagon's drug-drug layer have 964 different edge types?

Chapter 2: The Decagon Model (Showcase)

Decagon is a graph autoencoder built on R-GCN (Relational Graph Convolution). The encoder learns node embeddings from the heterogeneous graph. The decoder predicts new drug-drug interaction edges.

Encoder: Relational Graph Convolution

The encoder processes the drug-protein-protein graph using a multi-relational GCN. For each node v, at each layer, it aggregates messages from neighbors, but uses different weight matrices for different edge types:

$$h_v^{(k+1)} = \sigma\left( W_0^{(k)} h_v^{(k)} + \sum_{r \in \mathcal{R}} \sum_{u \in N_r(v)} \frac{1}{c_{r,v}} W_r^{(k)} h_u^{(k)} \right)$$

Where the outer sum runs over the set of relation types, N_r(v) is the set of neighbors of v under relation r, and c_{r,v} is a normalization constant. For Decagon, the relations are drug-targets-protein (with messages passed in both directions, so proteins also aggregate from the drugs that target them), protein-interacts-protein, and one drug-drug relation per side effect type. Each relation gets its own weight matrix W_r.
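The update rule above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: all nodes share one index space, the nonlinearity is ReLU, and the normalization constant is taken to be the neighbor count |N_r(v)|. The function name and argument layout are hypothetical.

```python
import numpy as np

def rgcn_layer(h, adj_by_relation, W_self, W_by_relation):
    """One relational graph-convolution layer for a single node set.

    h               : (n_nodes, d_in) current node states
    adj_by_relation : dict relation -> (n_nodes, n_nodes) 0/1 adjacency matrix
    W_self          : (d_in, d_out) self-connection weights (W_0 in the text)
    W_by_relation   : dict relation -> (d_in, d_out) weights (W_r in the text)
    """
    out = h @ W_self
    for rel, adj in adj_by_relation.items():
        # Assumed normalization: c_{r,v} = |N_r(v)|, clamped to 1 for isolated nodes.
        c = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
        out = out + ((adj @ h) / c) @ W_by_relation[rel]
    return np.maximum(out, 0.0)  # ReLU plays the role of sigma

# Three nodes, two relation types, identity weights so the sums are easy to read.
h = np.eye(3)
adj = {
    "rel_a": np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float),
    "rel_b": np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]], dtype=float),
}
W = {rel: np.eye(3) for rel in adj}
print(rgcn_layer(h, adj, np.eye(3), W))
```

With identity weights, each output row is simply the node's own state plus the averaged states of its neighbors under each relation. Swapping in a different matrix per relation is what lets the layer treat edge types differently.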


Decoder: Side-effect-specific scoring

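Given drug embeddings z_i, z_j from the encoder, the probability that drug pair (i, j) causes polypharmacy side effect r is:

$$p_{ij}^{r} = \sigma\left( z_i^{\top} D_r \, R \, D_r \, z_j \right)$$

Where D_r is a diagonal matrix specific to side effect r, whose entries weight how much each embedding dimension matters for that side effect, and R is a global matrix shared across all side effects that models how embedding dimensions interact between drugs. The paper relates this form to a DEDICOM-style tensor factorization; it also resembles DistMult's diagonal relation weighting, with the shared dense matrix R placed between the two diagonal factors. Only D_r is learned separately for each of the 964 side effects.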
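A minimal NumPy sketch of this scoring function, assuming the form used in the Decagon paper in which the dense matrix R is shared across side effects and only the diagonal D_r is specific to side effect r. The function names and the random inputs are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def side_effect_probability(z_i, z_j, d_r, R):
    """sigma(z_i^T D_r R D_r z_j), with D_r = diag(d_r) stored as a vector."""
    # Multiplying by a diagonal matrix is elementwise scaling, so no (d, d)
    # matrix needs to be built for D_r.
    return sigmoid((z_i * d_r) @ R @ (d_r * z_j))

rng = np.random.default_rng(0)
d = 4
z_i, z_j = rng.normal(size=d), rng.normal(size=d)
d_r = rng.normal(size=d)      # diagonal of D_r for one side effect
R = rng.normal(size=(d, d))   # shared across side effects
p = side_effect_probability(z_i, z_j, d_r, R)
print(p)
```

Storing D_r as a vector keeps the per-side-effect cost at d parameters, while the d × d matrix R is paid for only once. Note that the score is symmetric in (i, j) only if R is symmetric.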

What makes the Decagon encoder "relational" (R-GCN) rather than standard GCN?

Chapter 3: Side Effect Prediction

At test time, Decagon answers questions like: "What is the probability that taking Warfarin (a blood thinner) together with Aspirin causes thrombocytopenia (low platelet count)?"

This is framed as a link prediction problem on the drug-drug multirelational graph. For each side effect type r, we want to predict which drug pairs (i, j) have the corresponding edge.

The prediction pipeline

1. Drug i, Drug j: look up node indices in the graph
2. z_i, z_j ← Encoder: R-GCN forward pass on the full drug-protein graph
3. For each side effect r: compute p^r_ij = σ(z_i^T D_r R D_r z_j), where D_r is specific to side effect r and R is shared across side effects
4. Top-k side effects: sort the 964 probabilities; report the highest-scoring ones
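The last two steps of the pipeline can be sketched as follows, assuming the embeddings are already computed and that the decoder uses one diagonal per side effect with a shared matrix R. The function name, the array layout, and the toy values are assumptions made for illustration.

```python
import numpy as np

def top_k_side_effects(z_i, z_j, D, R, names, k=3):
    """Score every side effect for one drug pair and return the k most probable.

    D : (n_side_effects, d) array; row r holds the diagonal of D_r
    R : (d, d) matrix shared across side effects
    """
    left = z_i * D    # row r is D_r z_i
    right = z_j * D   # row r is D_r z_j
    # scores[r] = sum over d, e of left[r, d] * R[d, e] * right[r, e]
    scores = np.einsum("rd,de,re->r", left, R, right)
    probs = 1.0 / (1.0 + np.exp(-scores))
    order = np.argsort(-probs)[:k]
    return [(names[r], float(probs[r])) for r in order]

# Toy example: 3 side effects, 2-dimensional embeddings.
z = np.array([1.0, 1.0])
D = np.array([[1.0, 1.0], [2.0, 2.0], [0.0, 0.0]])
ranked = top_k_side_effects(z, z, D, np.eye(2), ["a", "b", "c"], k=3)
print(ranked)
```

Because the encoder output is reused for every side effect, scoring all 964 relations costs one batched contraction rather than 964 separate forward passes.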
The embedding does the work: Drug i's embedding z_i encodes not just chemical properties but also which proteins it targets and which interactions it has with other drugs in the training set. When Warfarin's embedding is near "coumarin anticoagulants" in embedding space, and Aspirin's embedding is near "platelet aggregation inhibitors," the model learns that their combination activates the "bleeding risk" side effect decoder.

Why model all 964 side effects jointly?

Modeling all side effects together (rather than 964 separate binary classifiers) is crucial for generalization. Side effects that share biological mechanisms are correlated — if drug pair (A,B) causes bradycardia, it's more likely to also cause hypotension (both are cardiovascular). The shared encoder zi, zj captures this correlation, improving prediction on rare side effects that have few training examples.

Why does modeling all 964 side effects jointly improve performance, especially for rare ones?

Chapter 4: Training

Decagon is trained as a graph autoencoder: maximize the likelihood of observed drug-drug interaction edges (positive examples), and minimize it for unobserved edges (negative examples). The loss is a cross-entropy over positive and negative drug pairs for each side effect type.

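$$L = -\sum_{r} \left[ \sum_{(i,j) \in E_r} \log p_{ij}^{r} + \sum_{(i,j) \notin E_r} \log\left(1 - p_{ij}^{r}\right) \right]$$

Where E_r is the set of known drug pairs causing side effect r; the leading minus sign turns the log-likelihood into a loss to minimize. The second sum is not taken over every unobserved pair: the authors use a fixed ratio of negative to positive pairs, sampling random drug pairs not known to have the interaction.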
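Negative sampling for one side effect might look like the sketch below. It assumes one common scheme, keeping the first drug of each positive pair and replacing the second with a random drug; the function name, the `ratio` parameter, and the integer drug indices are illustrative choices.

```python
import random

def sample_negatives(positive_pairs, n_drugs, ratio=1, seed=0):
    """Draw `ratio` negatives per positive by replacing the second drug at random."""
    rng = random.Random(seed)
    known = {frozenset(pair) for pair in positive_pairs}
    negatives = []
    for i, _ in positive_pairs:
        for _ in range(ratio):
            # Resample until the pair is not a known positive. This assumes
            # drug i has at least one non-partner, otherwise it never ends.
            while True:
                j = rng.randrange(n_drugs)
                if j != i and frozenset((i, j)) not in known:
                    negatives.append((i, j))
                    break
    return negatives

positives = [(0, 1), (2, 3)]
negatives = sample_negatives(positives, n_drugs=6, ratio=2)
print(negatives)
```

A sampled "negative" is only a pair with no recorded interaction, so some negatives are likely to be true interactions that are missing from the database.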

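python (simplified)
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# R-GCN forward pass (encoder); drug-drug relations are omitted for brevity.
# Features are row-major: x_drug is (n_drugs, d_drug), x_prot is (n_prots, d_prot).
# `params` is a dict of weight matrices; adj_* are (normalized) adjacency matrices.
def encode(x_drug, x_prot, adj_dp, adj_pp, params):
    # Layer 1: aggregate from neighbors, one weight matrix per relation type
    h_drug = relu(x_drug @ params["W0_drug"] +
                  (adj_dp @ x_prot) @ params["W_dp"])      # drug <- protein
    h_prot = relu(x_prot @ params["W0_prot"] +
                  (adj_dp.T @ x_drug) @ params["W_pd"] +   # prot <- drug
                  (adj_pp @ x_prot) @ params["W_pp"])      # prot <- prot
    # Layer 2: same aggregation for drug nodes, no nonlinearity on the output
    z_drug = (h_drug @ params["W0_out"] +
              (adj_dp @ h_prot) @ params["W_dp_out"])
    return z_drug  # (n_drugs, latent_dim)

# Decoder for side effect r
def decode(z_i, z_j, D_r, R):
    # D_r: diagonal matrix for side effect r; R: matrix shared by all side effects
    score = z_i @ D_r @ R @ D_r @ z_j
    return sigmoid(score)

# Training loss for side effect r (binary cross-entropy)
def loss_r(z, positive_pairs, negative_pairs, D_r, R):
    pos_scores = [decode(z[i], z[j], D_r, R) for i, j in positive_pairs]
    neg_scores = [decode(z[i], z[j], D_r, R) for i, j in negative_pairs]
    return -sum(np.log(p) for p in pos_scores) \
           -sum(np.log(1 - p) for p in neg_scores)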
Data split strategy: The known interactions from TWOSIDES are split into 80% train, 10% validation, 10% test. But the evaluation is designed carefully: they test on drug pairs where both drugs were seen in training (transductive), not on entirely new drugs. Predicting interactions for unseen drugs would require inductive learning — a much harder problem that Decagon doesn't address.
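A sketch of an 80/10/10 split for the known pairs of one side effect, together with a check for the transductive condition described above. The helper names and the shuffling scheme are assumptions made for illustration; the paper's exact splitting procedure may differ.

```python
import random

def split_pairs(pairs, seed=0, frac_train=0.8, frac_val=0.1):
    """Shuffle the known pairs of one side effect and cut them 80/10/10."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_train = int(frac_train * len(pairs))
    n_val = int(frac_val * len(pairs))
    return (pairs[:n_train],
            pairs[n_train:n_train + n_val],
            pairs[n_train + n_val:])

def unseen_drugs(train, test):
    """Drugs that appear in test pairs but in no training pair."""
    seen = {d for pair in train for d in pair}
    return {d for pair in test for d in pair} - seen

all_pairs = [(i, j) for i in range(10) for j in range(i + 1, 10)]  # 45 toy pairs
train, val, test = split_pairs(all_pairs)
print(len(train), len(val), len(test))
print(unseen_drugs(train, test))
```

In a transductive evaluation the set returned by `unseen_drugs` should be empty: every drug in a test pair already has an embedding learned from training edges.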
In Decagon's training, what plays the role of "negative examples" in the loss function?

Chapter 5: Results

Zitnik et al. evaluate Decagon on the TWOSIDES dataset, holding out 10% of the (drug, side effect, drug) triples for testing. The primary metric is AUROC (area under the ROC curve) for distinguishing true drug-drug interactions from random pairs.

Model | AUROC | AP (Avg Precision) | p@50
DeepWalk (drug graph only) | 0.748 | 0.611 | 0.273
Concatenation MLP | 0.793 | 0.697 | 0.311
DistMult (KG baseline) | 0.782 | 0.671 | 0.295
KGNN-LS | 0.822 | 0.724 | 0.358
Decagon | 0.872 | 0.832 | 0.403
What the numbers mean: AUROC of 0.872 means: if you take one true drug-drug interaction (positive) and one random pair (negative), Decagon ranks the true one higher 87.2% of the time. p@50 = 0.403 means: in the top 50 predicted pairs for a given side effect, 40.3% are true positives from the held-out test set. The baseline DistMult (which ignores the protein network) scores 0.782 AUROC — Decagon's 9-point improvement comes directly from incorporating the drug-protein-protein interaction network.
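Both metrics are easy to compute by hand on a toy example. The sketch below implements AUROC through its pairwise-ranking interpretation and p@k as a simple top-k count; the function names and the scores are made up for illustration and are not results from the paper.

```python
def auroc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def precision_at_k(scores, labels, k):
    """Share of true positives among the k highest-scoring candidates."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])[:k]
    return sum(label for _, label in ranked) / k

# Toy scores for one side effect: three true pairs and two random pairs.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3]
print(auroc(pos, neg))  # 5 of the 6 positive-negative pairs are ordered correctly

scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 1, 0, 1, 0]
print(precision_at_k(scores, labels, k=3))  # 2 of the top 3 are true pairs
```

The pairwise loop is quadratic in the number of examples, so it is only suitable for small illustrations; a rank-based implementation gives the same value for large test sets.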

Qualitative case study

Decagon predicted that combining Simvastatin (a cholesterol drug) with Niacin would cause myopathy (muscle disease). This was not in TWOSIDES training data. A literature search confirmed: this combination is a known clinical concern — Niacin inhibits a transporter that normally exports Simvastatin from muscle cells, causing dangerous accumulation. The protein-level mechanism was correctly captured by the graph structure.

Why does Decagon outperform DistMult on drug-drug interaction prediction?

Chapter 6: Drug Discovery Impact

Decagon represents a new paradigm in computational pharmacology: using graph-based machine learning to prioritize safety testing for drug combinations. Instead of testing all pairs in the lab, use Decagon to predict the high-risk combinations, then validate those computationally and experimentally.

The mechanistic insight

Decagon's graph structure encodes a key biological hypothesis: polypharmacy side effects arise when drugs affect overlapping parts of the protein interaction network. Two drugs that share protein targets, or whose targets are adjacent in the PPI network, are more likely to cause combination side effects. The R-GCN encoder propagates this network proximity into the drug embeddings.

Protein Network Overlap Hypothesis

Drug A targets proteins near Drug B's targets in the PPI network. Their combination perturbs the same biological pathway, causing a side effect neither causes alone.
Why proteins matter for predicting drug combinations: Drug A might inhibit enzyme X in the cardiovascular pathway. Drug B might also modulate enzyme X, or an enzyme directly upstream. Alone, each maintains homeostasis. Together, they overwhelm the pathway's regulatory capacity. Without the protein network, a model can only learn from drug-drug co-occurrence. With the protein network, it can identify this mechanistic overlap even for drug pairs never tested together.
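The overlap hypothesis can be made concrete with a small hand-computed check: which targets two drugs share, and which PPI edges connect one drug's targets to the other's. This is only an illustration of the signal the encoder can pick up, not something Decagon computes explicitly; the function name and the protein labels P1 to P4 are hypothetical.

```python
def target_overlap(targets_a, targets_b, ppi):
    """Return (shared targets, PPI edges linking the two target sets)."""
    shared = targets_a & targets_b
    bridging = {
        (p, q)
        for p in targets_a
        for q in targets_b
        if p != q and frozenset((p, q)) in ppi
    }
    return shared, bridging

# PPI edges stored as unordered pairs.
ppi = {frozenset(("P1", "P3")), frozenset(("P2", "P4"))}
targets_a = {"P1", "P2"}
targets_b = {"P2", "P3"}
shared, bridging = target_overlap(targets_a, targets_b, ppi)
print(shared, bridging)
```

Here the two drugs share target P2 and are also linked through the P1 to P3 interaction. Message passing in the encoder blends both kinds of proximity into the drug embeddings without anyone specifying them by hand.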

Clinical relevance

The 964 side effects Decagon models span the major categories of drug-induced harm:

Cardiovascular (top risk)
Bradycardia, QT prolongation, hypertension — combinations that stress the heart system
Hematological
Thrombocytopenia, neutropenia — combinations that affect blood cell counts

Musculoskeletal
Myopathy, rhabdomyolysis — combinations that damage muscle tissue
Metabolic
Hyperglycemia, hyponatremia — combinations that disrupt metabolic balance
According to the biological hypothesis in Decagon, why do some drug combinations cause side effects that neither drug causes alone?

Chapter 7: Connections

Decagon sits at the intersection of graph representation learning, knowledge graph completion, and computational biology. It opened a research area: graph neural networks for biomedical knowledge graphs.

Method | Relation to Decagon
R-GCN (Schlichtkrull et al.) | The encoder architecture (published same year; Decagon independently extends it to bipartite setting)
DistMult | The decoder scoring function (extended with diagonal scaling)
DRKG (Drug Repurposing KG) | Extends Decagon's graph with disease, pathway, and genomic data
BioKG / KG4SL | Same paradigm applied to cancer synthetic lethality
Hetionet | Larger heterogeneous biomedical KG that inspired Decagon's design
COVID-19 drug repurposing (2020) | Multiple groups used Decagon-style models to predict COVID drug combinations
The inductive problem: Decagon is transductive — it can predict interactions between drugs seen during training, but not for novel drugs entering the market. Extending to inductive settings (where the drug's molecular structure or target profile is the only input) is the main open problem. Recent work uses molecular GNNs (learning from atom-level graphs) as the drug encoder, enabling generalization to new compounds.

Limitations to be aware of

Closing thought: Decagon demonstrates that graph neural networks are not just a computer science tool — they map directly onto the structure of biological knowledge. Proteins interact; drugs bind proteins; side effects emerge from perturbation patterns. When your data has natural graph structure, a graph model isn't just technically appropriate — it's mechanistically meaningful. The embeddings encode biology, not just statistics.