Polypharmacy Side Effects

Chapter 0: The Polypharmacy Problem

An elderly patient takes 8 drugs daily: for heart disease, diabetes, depression, and arthritis. This is polypharmacy — the simultaneous use of multiple medications. In the US, 40% of adults over 65 take 5+ drugs. For patients over 80, that number is over 50%.

The problem: drug combinations can cause polypharmacy side effects — adverse reactions that occur when drugs A and B are taken together, even when each is safe alone. Drug A might inhibit the enzyme that metabolizes Drug B, causing B to accumulate to toxic levels. Or both drugs might amplify each other's cardiac effects.

Scale of the Problem

With n drugs, there are n(n-1)/2 possible pairs, so the number of pairs grows quadratically with the number of drugs.

The testing gap: Clinical trials for drug combinations would require n(n-1)/2 pair trials, each taking years and costing millions. For 1,000 approved drugs, that's ~500,000 pairs. For 10,000 drugs (including investigational compounds), it's 50 million. We will never test even 1% of these. Yet patients are taking untested combinations every day. Adverse drug-drug interactions are estimated to cause ~125,000 deaths annually in the US.
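The quadratic growth is easy to verify. Here is a minimal Python sketch (the helper name num_pairs is our own):

```python
def num_pairs(n: int) -> int:
    """Number of unordered drug pairs among n drugs (n choose 2)."""
    return n * (n - 1) // 2

for n in (100, 1_000, 10_000):
    print(f"{n:>6} drugs -> {num_pairs(n):>10,} pairs")
```

For 1,000 drugs this gives 499,500 pairs and for 10,000 drugs 49,995,000, which are the ~500,000 and ~50 million figures quoted above.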

Existing databases (DrugBank, TWOSIDES) contain known interactions, but are vastly incomplete. The question is: can we predict which drug combinations are dangerous — and what side effects they cause — from the molecular mechanism of the drugs?

Why is it impossible to test all drug combination side effects in clinical trials?

The number of pairs grows quadratically: with 1,000 drugs, there are ~500,000 pairs, each requiring years and millions of dollars to test clinically
Regulatory agencies don't allow combination drug testing
Side effects only occur in rare patients and can't be measured statistically

Chapter 1: The Multirelational Graph

Decagon builds a multimodal, multirelational graph combining three types of information: how drugs work (molecular targets), how proteins interact (protein network), and what combinations cause what effects (known side effects).

Node types

Drugs (645 nodes)
Each FDA-approved drug in the dataset. Represented by their chemical structure (Morgan fingerprint features) or by the graph structure alone.

Proteins (19,085 nodes)
Human proteins in the protein-protein interaction network, including the proteins that drugs target. Represented by their sequence-based features (GO terms, sequence embeddings).

Edge types

Edge Type	Meaning	Count
Drug–Protein	Drug targets this protein (binds, inhibits, activates)	18,596 edges
Protein–Protein	Physical protein-protein interaction (PPI)	719,402 edges
Drug–Drug (×964 types)	This pair causes this specific polypharmacy side effect	4,649,441 edges

Why 964 relation types? Each distinct polypharmacy side effect is a separate relation type. "Muscle weakness" might be relation type 1, "bradycardia" relation type 2, and so on. Decagon uses the 964 side effect types in TWOSIDES that each occur in at least 500 drug pairs. The drug-drug layer is therefore a multigraph with 964 edge types: each edge (drug A, side effect r, drug B) means "taking drugs A and B together causes side effect r," and the same pair can be joined by several edges, one per side effect.
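One way to picture the drug-drug layer is as a separate edge set per side effect. The sketch below is a toy data structure of our own, not Decagon's storage format; the drug names are placeholders, not TWOSIDES records.

```python
from collections import defaultdict

# Toy drug-drug multigraph: one set of unordered drug pairs per side effect r.
edges_by_relation: dict[str, set[frozenset[str]]] = defaultdict(set)

def add_triple(drug_a: str, side_effect: str, drug_b: str) -> None:
    # Drug pairs are unordered, so store each pair as a frozenset.
    edges_by_relation[side_effect].add(frozenset((drug_a, drug_b)))

def side_effects_of(drug_a: str, drug_b: str) -> set[str]:
    # All relation types r for which the edge (drug_a, r, drug_b) exists.
    pair = frozenset((drug_a, drug_b))
    return {r for r, pairs in edges_by_relation.items() if pair in pairs}

add_triple("drug_A", "bradycardia", "drug_B")
add_triple("drug_A", "muscle weakness", "drug_B")  # same pair, second relation
add_triple("drug_A", "bradycardia", "drug_C")

print(sorted(side_effects_of("drug_B", "drug_A")))
```

Keying edges by relation mirrors how the decoder later scores each side effect r separately.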

Heterogeneous Graph Structure

The Decagon graph has two node types (drugs and proteins) and 966 edge types: drug-protein, protein-protein, and the 964 drug-drug side-effect relations.

Why does Decagon's drug-drug layer have 964 different edge types?

Each side effect type is a separate relation: an edge (A, r, B) means "taking drugs A and B causes side effect r," and there are 964 different side effects to predict
There are 964 different FDA-approved drug classes
964 is the number of drugs in the dataset

Chapter 2: The Decagon Model (Showcase)

Decagon is a graph autoencoder built on R-GCN (Relational Graph Convolution). The encoder learns node embeddings from the heterogeneous graph. The decoder predicts new drug-drug interaction edges.

Encoder: Relational Graph Convolution

The encoder processes the full multirelational graph (drugs, proteins, and all edge types) using a multi-relational GCN. For each node v, at each layer, it aggregates messages from its neighbors but uses a different weight matrix for each edge type:

$$h_v^{(k+1)} = \sigma\Big( W_0^{(k)} h_v^{(k)} + \sum_{r \in \mathcal{R}} \sum_{u \in \mathcal{N}_r(v)} \frac{1}{c_{r,v}} W_r^{(k)} h_u^{(k)} \Big)$$

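Where R is the set of relation types, N_r(v) is the set of neighbors of v under relation r, and c_{r,v} is a normalization constant (for example |N_r(v)|). In Decagon, R covers every edge type in the graph: drug-targets-protein (in both directions), protein-interacts-protein, and the 964 drug-drug side-effect relations, each with its own W_r. Messages therefore flow from proteins to drugs, from drugs to proteins, and between drugs linked by a known side effect.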
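The update rule can be written out directly. This is a minimal NumPy sketch of one R-GCN layer, not Decagon's implementation: nodes are stored as rows (so W h becomes h @ W), c_{r,v} is taken to be |N_r(v)|, and σ is ReLU; those three choices are our assumptions.

```python
import numpy as np

def rgcn_layer(h, neighbors, W_self, W_rel):
    """One relational graph convolution layer (illustrative sketch).

    h         : (n_nodes, d_in) node states h^(k), one row per node
    neighbors : dict relation r -> list of (v, u) pairs meaning u is in N_r(v)
    W_self    : (d_in, d_out) self-connection weight W_0
    W_rel     : dict relation r -> (d_in, d_out) weight W_r
    Assumes c_{r,v} = |N_r(v)| and sigma = ReLU.
    """
    out = (h @ W_self).astype(float)          # W_0 h_v term for every node
    for r, pairs in neighbors.items():
        counts = np.zeros(h.shape[0])
        for v, _ in pairs:
            counts[v] += 1                    # c_{r,v} = number of r-neighbors of v
        msgs = h @ W_rel[r]                   # W_r h_u for every node u
        for v, u in pairs:
            out[v] += msgs[u] / counts[v]     # normalized message from u to v
    return np.maximum(out, 0.0)               # sigma = ReLU
```

Because each relation has its own W_r, a drug-protein message and a protein-protein message are transformed differently even when they arrive at the same node.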

Decoder: Side-effect-specific scoring

Given drug embeddings z_i, z_j from the encoder, the probability that drug pair (i, j) causes polypharmacy side effect r is:

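$$p_{ij}^r = \sigma\big( z_i^\top D_r R D_r z_j \big)$$

Where D_r is a trainable diagonal matrix specific to side effect r (it weights how much each embedding dimension matters for that side effect) and R is a trainable matrix shared across all 964 side effects that models global drug-drug interaction patterns. This is a tensor-factorization decoder in the DistMult family: fixing R to the identity reduces the score to z_i^T D_r^2 z_j, a DistMult-style diagonal bilinear form. Each side effect therefore has its own D_r, while R is learned jointly from all of them.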
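To make the roles of D_r and R concrete, here is a small NumPy sketch with toy sizes (d = 8 and three side effects are arbitrary choices). It stores each diagonal D_r as a vector, so multiplying by D_r becomes elementwise scaling, and it checks numerically that setting R to the identity recovers the DistMult-style form.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_side_effects = 8, 3                    # toy sizes; Decagon has 964 side effects
R = rng.normal(size=(d, d))                 # one matrix shared by all side effects
D = rng.normal(size=(n_side_effects, d))    # row r holds the diagonal of D_r

def score(z_i, z_j, d_r, R):
    """Logit z_i^T D_r R D_r z_j with D_r = diag(d_r)."""
    return (d_r * z_i) @ R @ (d_r * z_j)

def prob(z_i, z_j, d_r, R):
    return 1.0 / (1.0 + np.exp(-score(z_i, z_j, d_r, R)))

z_i, z_j = rng.normal(size=d), rng.normal(size=d)
probs = [prob(z_i, z_j, D[r], R) for r in range(n_side_effects)]

# With R = I the score collapses to sum_k D_r[k]^2 z_i[k] z_j[k] (DistMult-style).
distmult = np.sum(D[0] ** 2 * z_i * z_j)
assert np.isclose(score(z_i, z_j, D[0], np.eye(d)), distmult)
```

Storing D as an (n_side_effects, d) array keeps the per-side-effect cost at d parameters, while the d × d matrix R is paid for once.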

What makes the Decagon encoder "relational" (R-GCN) rather than standard GCN?

Different weight matrices W_r are used for different edge types: drug-protein edges and protein-protein edges use separate parameters, allowing relation-specific message passing
R-GCN uses relation-specific dropout rates
R-GCN processes drug nodes and protein nodes with entirely separate networks

Chapter 3: Side Effect Prediction

At test time, Decagon answers questions like: "What is the probability that taking Warfarin (a blood thinner) together with Aspirin causes thrombocytopenia (low platelet count)?"

This is framed as a link prediction problem on the drug-drug multirelational graph. For each side effect type r, we want to predict which drug pairs (i, j) have the corresponding edge.

The prediction pipeline

1. Drug i, drug j: look up their node indices in the graph.
2. z_i, z_j ← encoder: run the R-GCN forward pass on the full graph.
3. For each side effect r: compute p^r_ij = σ(z_i^T D_r R D_r z_j).
4. Top-k side effects: sort the 964 probabilities and report the highest-scoring ones.
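These four steps can be sketched in a few lines of NumPy, assuming the encoder has already produced an embedding matrix z and that the decoder parameters are stored as a matrix D whose row r is the diagonal of D_r, plus one shared matrix R. The function name and argument layout are our own.

```python
import numpy as np

def top_k_side_effects(z, i, j, D, R, names, k=5):
    """Score drug pair (i, j) under every side effect and return the k best.

    z     : (n_drugs, d) drug embeddings from the encoder
    D     : (n_side_effects, d) row r is the diagonal of D_r
    R     : (d, d) matrix shared across side effects
    names : list of side-effect names, indexed by r
    """
    a = D * z[i]                                   # row r = D_r z_i
    b = D * z[j]                                   # row r = D_r z_j
    logits = np.einsum("rd,de,re->r", a, R, b)     # a_r^T R b_r for every r
    probs = 1.0 / (1.0 + np.exp(-logits))
    order = np.argsort(-probs)[:k]                 # highest probability first
    return [(names[r], float(probs[r])) for r in order]
```

Scoring all side effects at once with einsum avoids a Python loop over the 964 relations; the encoder output z is computed once and reused for every pair.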

The embedding does the work: drug i's embedding z_i encodes not just chemical properties but also which proteins it targets and which interactions it has with other drugs in the training set. If Warfarin's embedding lies near those of other coumarin anticoagulants, Aspirin's lies near those of other platelet aggregation inhibitors, and such pairs are linked to bleeding in the training data, the model can assign their combination a high score under the bleeding-related side-effect decoders.

Why model all 964 side effects jointly?

Modeling all side effects together, rather than as 964 separate binary classifiers, is crucial for generalization. Side effects that share biological mechanisms are correlated: if drug pair (A, B) causes bradycardia, it is more likely to also cause hypotension (both are cardiovascular). Because the encoder that produces z_i and z_j is shared across all side effects, the embeddings capture this correlation, improving prediction on rarer side effects that have fewer training examples.
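A rough way to see how much is shared on the decoder side is to count parameters. The sketch below compares Decagon's design (one diagonal D_r per side effect plus a shared R) with a hypothetical unshared variant that gives every side effect its own full d × d matrix; the embedding size d = 32 is an assumption chosen for illustration.

```python
def decoder_params(n_side_effects: int, d: int) -> dict:
    """Decoder parameter counts for two designs (illustrative only)."""
    shared = n_side_effects * d + d * d      # one diagonal D_r each + one shared R
    unshared = n_side_effects * d * d        # hypothetical: a full R_r per side effect
    return {"shared": shared, "unshared": unshared}

counts = decoder_params(964, 32)
print(counts)  # {'shared': 31872, 'unshared': 987136}
```

With these sizes the shared design needs about 3% of the unshared count, and every side effect's D_r is trained through the same R and the same encoder, which is where rare side effects pick up signal from common ones.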

Why does modeling all 964 side effects jointly improve performance, especially for rare ones?

Biologically related side effects are correlated: sharing the encoder across all side effects lets rare ones "borrow strength" from related well-observed ones through shared drug embeddings
Training 964 classifiers separately would require 964x more compute
The decoder must output 964 values simultaneously, requiring joint training

Chapter 4: Training

Decagon is trained as a graph autoencoder: maximize the likelihood of observed drug-drug interaction edges (positive examples), and minimize it for unobserved edges (negative examples). The loss is a cross-entropy over positive and negative drug pairs for each side effect type.

$$\mathcal{L} = -\sum_{r} \Big( \sum_{(i,j) \in E_r} \log p_{ij}^r + \sum_{(i,j) \notin E_r} \log\big(1 - p_{ij}^r\big) \Big)$$

Where E_r is the set of known drug pairs causing side effect r. For the negative samples, they use a fixed ratio of negative:positive pairs, sampling random drug pairs not known to have the interaction.
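The negative sampling described above can be sketched as follows. This is our own illustrative version (uniform sampling of unordered pairs at a fixed negative:positive ratio), not necessarily the exact sampler Decagon uses.

```python
import random

def sample_negatives(positive_pairs, n_drugs, ratio=1, seed=0):
    """Draw ratio * |E_r| distinct random drug pairs that are not in E_r.

    Pairs are unordered and stored as (i, j) with i < j. Any pair not
    recorded for side effect r counts as a negative, even if never tested.
    """
    rng = random.Random(seed)
    positives = {(min(i, j), max(i, j)) for i, j in positive_pairs}
    target = ratio * len(positives)
    available = n_drugs * (n_drugs - 1) // 2 - len(positives)
    if target > available:
        raise ValueError("not enough unrecorded pairs to sample from")
    negatives = set()
    while len(negatives) < target:
        i, j = rng.sample(range(n_drugs), 2)   # two distinct drugs
        pair = (min(i, j), max(i, j))
        if pair not in positives:
            negatives.add(pair)
    return sorted(negatives)

print(sample_negatives([(0, 1), (2, 3)], n_drugs=10, ratio=2))
```

The docstring's caveat matters in practice: a sampled "negative" is only an unrecorded pair, not a verified safe combination.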

python (simplified)
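import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# R-GCN forward pass (encoder). Nodes are rows; W maps a name -> weight matrix.
# The 1/c_{r,v} normalization and the drug-drug relations are omitted for brevity.
def encode(x_drug, x_prot, adj_dp, adj_pp, W):
    # Layer 1: aggregate from neighbors with one weight matrix per relation type
    h_drug = relu(x_drug @ W["self_drug"] +
                  adj_dp @ x_prot @ W["dp"])       # drug ← protein
    h_prot = relu(x_prot @ W["self_prot"] +
                  adj_dp.T @ x_drug @ W["pd"] +    # prot ← drug
                  adj_pp @ x_prot @ W["pp"])       # prot ← prot
    # Layer 2: one more round of message passing into the drug nodes
    z_drug = h_drug @ W["self_drug2"] + adj_dp @ h_prot @ W["dp2"]
    return z_drug  # (n_drugs, latent_dim)

# Decoder for side effect r: D_r = diag(d_r) is specific to r,
# while R is a single matrix shared by all side effects
def decode(z_i, z_j, d_r, R):
    score = (d_r * z_i) @ R @ (d_r * z_j)   # z_i^T D_r R D_r z_j
    return sigmoid(score)

# Training loss (negative log-likelihood) for side effect r
def loss_r(z, positive_pairs, negative_pairs, d_r, R, eps=1e-12):
    pos_scores = [decode(z[i], z[j], d_r, R) for i, j in positive_pairs]
    neg_scores = [decode(z[i], z[j], d_r, R) for i, j in negative_pairs]
    return (-sum(np.log(p + eps) for p in pos_scores)
            - sum(np.log(1 - p + eps) for p in neg_scores))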

Data split strategy: The known interactions from TWOSIDES are split into 80% train, 10% validation, 10% test. But the evaluation is designed carefully: they test on drug pairs where both drugs were seen in training (transductive), not on entirely new drugs. Predicting interactions for unseen drugs would require inductive learning — a much harder problem that Decagon doesn't address.
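A random split over triples does not by itself guarantee the transductive condition, so it is worth checking. A minimal sketch (both helpers are our own; the 80/10/10 fractions follow the text):

```python
import random

def split_triples(triples, seed=0, frac=(0.8, 0.1, 0.1)):
    """Randomly split (drug_a, side_effect, drug_b) triples into train/val/test."""
    rng = random.Random(seed)
    shuffled = list(triples)
    rng.shuffle(shuffled)
    n_train = int(frac[0] * len(shuffled))
    n_val = int(frac[1] * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def unseen_drugs(train, test):
    """Drugs that appear in test triples but never in training.

    For a transductive evaluation this set should be empty.
    """
    seen = {drug for a, _, b in train for drug in (a, b)}
    return {drug for a, _, b in test for drug in (a, b)} - seen
```

If unseen_drugs returns a non-empty set, those test triples involve drugs the model never embedded, which is the inductive setting Decagon does not address.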

In Decagon's training, what plays the role of "negative examples" in the loss function?

Random drug pairs not recorded as causing the specific side effect: the model learns to give these low probability scores
Drug pairs from a different disease domain
Drug pairs that are chemically similar but don't interact

Chapter 5: Results

Zitnik et al. evaluate Decagon on the TWOSIDES dataset, holding out 10% of drug-drug-side-effect triples for testing. The primary metric is AUROC (area under the ROC curve) for distinguishing true drug-drug interactions from random pairs.

Model	AUROC	AP (Avg Precision)	p@50
DeepWalk (drug graph only)	0.748	0.611	0.273
Concatenation MLP	0.793	0.697	0.311
DistMult (KG baseline)	0.782	0.671	0.295
KGNN-LS	0.822	0.724	0.358
Decagon	0.872	0.832	0.403

What the numbers mean: an AUROC of 0.872 means that if you take one true drug-drug interaction (positive) and one random pair (negative), Decagon ranks the true one higher 87.2% of the time. p@50 = 0.403 means that among the top 50 predicted pairs for a given side effect, 40.3% are true positives from the held-out test set. The DistMult baseline, which ignores the protein network, scores 0.782 AUROC; the 9-point gap suggests that incorporating the drug-protein and protein-protein interaction network is a major source of Decagon's improvement.
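Both metrics are simple to compute. The sketch below is a plain-Python illustration of the definitions given above (AUROC as a pairwise ranking probability, p@k as precision among the top k); the function names are our own, and the quadratic AUROC loop is only meant for small examples.

```python
def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. O(P * N), so only suitable for small inputs.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

def precision_at_k(scores, labels, k=50):
    """Fraction of true positives (label 1) among the k highest-scoring pairs."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k

print(auroc([0.9, 0.8], [0.1, 0.85]))                          # 3 of 4 comparisons won
print(precision_at_k([0.9, 0.2, 0.8, 0.1], [1, 0, 0, 1], k=2))
```

Production code would use a rank-based AUROC (sorting once) rather than comparing every positive with every negative.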

Qualitative case study

Decagon predicted that combining Simvastatin (a cholesterol drug) with Niacin would cause myopathy (muscle disease). This was not in TWOSIDES training data. A literature search confirmed: this combination is a known clinical concern — Niacin inhibits a transporter that normally exports Simvastatin from muscle cells, causing dangerous accumulation. The protein-level mechanism was correctly captured by the graph structure.

Why does Decagon outperform DistMult on drug-drug interaction prediction?

Decagon's R-GCN encoder incorporates the protein-protein interaction network and drug-target information: it learns drug embeddings that reflect molecular mechanisms, not just co-occurrence patterns
Decagon uses a deeper decoder with more layers than DistMult
Decagon uses a larger embedding dimension

Chapter 6: Drug Discovery Impact

Decagon represents a new paradigm in computational pharmacology: using graph-based machine learning to prioritize safety testing for drug combinations. Instead of testing all pairs in the lab, use Decagon to predict the high-risk combinations, then validate those computationally and experimentally.

The mechanistic insight

Decagon's graph structure encodes a key biological hypothesis: polypharmacy side effects arise when drugs affect overlapping parts of the protein interaction network. Two drugs that share protein targets, or whose targets are adjacent in the PPI network, are more likely to cause combination side effects. The R-GCN encoder propagates this network proximity into the drug embeddings.

Protein Network Overlap Hypothesis

Drug A targets proteins near Drug B's targets in the PPI network. Their combination perturbs the same biological pathway — causing a side effect neither causes alone.

Why proteins matter for predicting drug combinations: Drug A might inhibit enzyme X in a cardiovascular pathway. Drug B might also modulate enzyme X, or an enzyme directly upstream of it. Alone, each drug's effect is absorbed by the pathway's homeostatic regulation. Together, they overwhelm the pathway's regulatory capacity. Without the protein network, a model can only learn from drug-drug co-occurrence. With the protein network, it can identify this mechanistic overlap even for drug pairs never tested together.
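Decagon does not compute overlap scores explicitly; the R-GCN learns network proximity from the graph. Still, the hypothesis can be made concrete with two hand-crafted proxies, sketched here as illustrative features of our own: the Jaccard overlap of the two drugs' target sets, and the fraction of cross-drug target pairs that are direct PPI neighbors. The protein names in the example are placeholders.

```python
def target_overlap(targets_a, targets_b, ppi_edges):
    """Two simple proximity features for a drug pair (illustrative proxies).

    shared   : Jaccard overlap of the two target sets
    adjacent : fraction of cross-drug target pairs (x != y) that are PPI neighbors
    """
    a, b = set(targets_a), set(targets_b)
    shared = len(a & b) / len(a | b) if a | b else 0.0
    ppi = {frozenset(e) for e in ppi_edges}          # undirected PPI edges
    cross = [(x, y) for x in a for y in b if x != y]
    adjacent = (sum(frozenset((x, y)) in ppi for x, y in cross) / len(cross)
                if cross else 0.0)
    return shared, adjacent

print(target_overlap({"P1", "P2"}, {"P2", "P3"}, [("P1", "P3")]))
```

Here the drugs share one of three targets and one of three cross pairs is a PPI edge, so both proxies equal 1/3. A learned encoder can go further by weighting multi-hop paths and specific pathways.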

Clinical relevance

The 964 side effects Decagon models span the major categories of drug-induced harm:

Cardiovascular (top risk)
Bradycardia, QT prolongation, hypertension: combinations that stress the cardiovascular system

Hematological
Thrombocytopenia, neutropenia — combinations that affect blood cell counts

Musculoskeletal
Myopathy, rhabdomyolysis — combinations that damage muscle tissue

Metabolic
Hyperglycemia, hyponatremia — combinations that disrupt metabolic balance

According to the biological hypothesis in Decagon, why do some drug combinations cause side effects that neither drug causes alone?

The drugs affect overlapping or adjacent proteins in the PPI network, together perturbing a biological pathway beyond the threshold that causes clinical harm
One drug changes how the other drug is metabolized, raising its concentration
Both drugs are chemically similar and react with each other in the bloodstream

Chapter 7: Connections

Decagon sits at the intersection of graph representation learning, knowledge graph completion, and computational biology. It opened a research area: graph neural networks for biomedical knowledge graphs.

Method	Relation to Decagon
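R-GCN (Schlichtkrull et al.)	The encoder architecture (published the same year; Decagon independently applies relational graph convolution to a multimodal drug-protein graph)
DistMult	Basis of the decoder, which generalizes DistMult's diagonal scoring with per-side-effect diagonals D_r around a shared matrix R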
DRKG (Drug Repurposing KG)	Extends Decagon's graph with disease, pathway, and genomic data
BioKG / KG4SL	Same paradigm applied to cancer synthetic lethality
Hetionet	Larger heterogeneous biomedical KG that inspired Decagon's design
COVID-19 drug repurposing (2020)	Multiple groups used Decagon-style models to predict COVID drug combinations

The inductive problem: Decagon is transductive — it can predict interactions between drugs seen during training, but not for novel drugs entering the market. Extending to inductive settings (where the drug's molecular structure or target profile is the only input) is the main open problem. Recent work uses molecular GNNs (learning from atom-level graphs) as the drug encoder, enabling generalization to new compounds.
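The difference between the two settings shows up in how embeddings are stored. In this toy NumPy sketch, a transductive model keeps a free embedding row per training drug, while an inductive model computes embeddings from drug features; the one-layer map stands in for a molecular GNN, and all sizes and features are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_drugs, n_feat, d = 645, 16, 8      # toy sizes; features are random stand-ins

# Transductive: one free embedding row per training drug. A new drug has no row.
Z_table = rng.normal(size=(n_drugs, d))

# Inductive: embeddings are a function of drug features (a one-layer map here,
# standing in for a molecular GNN), so any drug with features can be embedded.
W = rng.normal(size=(n_feat, d))

def embed_inductive(x):
    return np.tanh(x @ W)

new_drug_features = rng.normal(size=n_feat)
z_new = embed_inductive(new_drug_features)   # works for a drug never seen in training
# Z_table[n_drugs] would raise IndexError: no embedding was ever learned for it.
```

The decoder is unchanged in either case; only the source of z_i differs, which is why swapping in a molecular encoder is the natural route to new compounds.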

Limitations to be aware of

Negative data quality: TWOSIDES "negatives" (no recorded interaction) are not verified safe combinations — they may be untested. The model learns from absence of evidence, not evidence of absence.
Transductive only: Can't predict for new drugs without retraining.
Side effect specificity: Side effect types are coarse (MedDRA terms). Predicting patient-specific risk requires patient-level features not included here.
Protein network completeness: The PPI network is known to be ~30% complete. Incomplete networks limit what mechanistic information can be captured.

Closing thought: Decagon demonstrates that graph neural networks are not just a computer science tool — they map directly onto the structure of biological knowledge. Proteins interact; drugs bind proteins; side effects emerge from perturbation patterns. When your data has natural graph structure, a graph model isn't just technically appropriate — it's mechanistically meaningful. The embeddings encode biology, not just statistics.
