Linear Algebra — Kreyszig

Chapter 0: Why Matrices?

You have three equations with three unknowns. Write them out, solve by substitution. Tedious but possible. Now imagine a thousand equations with a thousand unknowns. That is what real engineering looks like: finite element models, circuit networks, machine learning weight matrices.

A matrix is a rectangular array of numbers. An m × n matrix has m rows and n columns. We write it as:

A = [a_ij], i = 1,...,m, j = 1,...,n

where a_ij is the entry in row i, column j. A vector is a matrix with one column (column vector) or one row (row vector).

The core idea: Matrices let us write systems of equations as a single equation Ax = b. Hundreds of equations become one compact expression. Better yet, matrix operations have beautiful geometric interpretations: rotations, reflections, projections, scaling.
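As a small illustration of the compact form, the sketch below stores a system as a coefficient matrix A and a right-hand side b, then checks a candidate solution by computing Ax. The helper name `matvec` and the sample system are assumptions chosen for illustration; they do not come from the text.

```python
def matvec(A, x):
    """Return the product Ax, where A is a list of rows and x is a list."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

# Illustrative system (an assumption, not from the text):
#   2x + y = 5
#    x - y = 1
A = [[2, 1],
     [1, -1]]
b = [5, 1]

x = [2, 1]                  # candidate solution
print(matvec(A, x) == b)    # True when x satisfies every equation at once
```

The whole system is checked with a single comparison, which is the point of writing it as Ax = b.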

A square matrix has m = n. The identity matrix I has 1s on the diagonal and 0s elsewhere. For any n × n matrix A and the n × n identity I, AI = IA = A. It is the matrix equivalent of multiplying by 1.

What is the entry a₂₃ in a matrix?

The element in row 2, column 3
The element in row 3, column 2
The product of entries 2 and 3

Chapter 1: Matrix Operations

Addition: A + B is defined only when A and B have the same dimensions. Add entry by entry: (A + B)_ij = a_ij + b_ij.

Scalar multiplication: cA multiplies every entry by c: (cA)_ij = c · a_ij.

Matrix multiplication: If A is m × n and B is n × p, then AB is m × p. The entry (AB)_ij is the dot product of row i of A with column j of B:

(AB)_ij = ∑_{k=1}^{n} a_ik b_kj

Critical warning: Matrix multiplication is not commutative. In general, AB ≠ BA. This is one of the most important differences from ordinary algebra. Rotation by 90° then reflection is not the same as reflection then rotation by 90°.
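The row-by-column rule and the warning about order can both be checked directly. The following is a minimal sketch in plain Python; the function name `matmul` and the choice of reflection (across the x-axis) are assumptions made for illustration.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (both lists of rows)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    p = len(B[0])
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

R = [[0, -1],
     [1, 0]]     # rotation by 90 degrees counterclockwise
F = [[1, 0],
     [0, -1]]    # reflection across the x-axis (illustrative choice)

FR = matmul(F, R)   # acts on a vector as: rotate first, then reflect
RF = matmul(R, F)   # acts on a vector as: reflect first, then rotate

print(FR)
print(RF)
print(FR == RF)     # False: the order matters
```

Note that in the product FR the rightmost matrix acts first, since (FR)x = F(Rx).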

Transpose: A^T flips rows and columns: (A^T)_ij = a_ji. A matrix is symmetric if A^T = A (symmetric about the main diagonal).

Inverse: If A is square and A⁻¹ exists, then AA⁻¹ = A⁻¹A = I. Not all matrices have inverses. Those that do are called nonsingular or invertible.

Why is matrix multiplication not commutative (AB ≠ BA in general)?

Because matrices are always square
Because the row-column dot product structure makes the order of operations matter geometrically (e.g., rotate then reflect ≠ reflect then rotate)
Because matrices cannot be added

Chapter 2: Gauss Elimination

Gauss elimination is the workhorse algorithm for solving Ax = b. The idea: use elementary row operations to transform the system into upper triangular form, then back-substitute.

The three elementary row operations are:

Operation	Effect
Swap two rows	Reorder equations (no mathematical change)
Multiply a row by a nonzero scalar	Scale an equation
Add a multiple of one row to another	Eliminate a variable

Augmented matrix: Write [A | b] and operate on rows. This is exactly the same as manipulating the equations, just more compact.

Example: Solve x + 2y = 5, 3x + 4y = 11.

Augmented: [1 2 | 5; 3 4 | 11]. Subtract 3×Row1 from Row2: [1 2 | 5; 0 −2 | −4]. Back-substitute: y = 2, x = 1.
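The same procedure can be sketched in code. This is one possible implementation, assuming a square system with a unique solution; it uses exact fractions to avoid rounding, and the function name `gauss_solve` is hypothetical.

```python
from fractions import Fraction

def gauss_solve(A, b):
    """Solve Ax = b for a square, nonsingular A by elimination and back-substitution."""
    n = len(A)
    # Build the augmented matrix [A | b] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]

    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry and swap it up.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular; no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        # Subtract multiples of the pivot row to zero out the entries below it.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[col])]

    # Back-substitute from the last row upward.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# The example above: x + 2y = 5, 3x + 4y = 11 gives x = 1, y = 2.
print(gauss_solve([[1, 2], [3, 4]], [5, 11]))
```

Production solvers usually also choose the largest available pivot for numerical stability with floating-point numbers; that refinement is omitted here because the arithmetic is exact.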

[Interactive demo: Gauss Elimination Stepper. The algorithm eliminates variables step by step; the pivot (orange) eliminates the entries below it.]

The rank of a matrix is the number of nonzero rows after elimination. It tells you the number of independent equations. If rank(A) = rank([A|b]) = n (number of unknowns), the system has a unique solution.

Key insight: Gauss elimination is not just a solution technique. It reveals the structure of the solution space: unique solution, infinitely many solutions, or no solution at all. The rank determines everything.
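The rank test can be written out as a short program. The sketch below counts pivots during forward elimination and then compares rank(A), rank([A|b]), and the number of unknowns; the function names `rank` and `classify` are hypothetical, and exact fractions are used so that "nonzero" is unambiguous.

```python
from fractions import Fraction

def rank(rows):
    """Rank = number of nonzero rows (pivots) left after forward elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0  # index of the next pivot row
    for col in range(len(M[0]) if M else 0):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot available in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

def classify(A, b):
    """Classify Ax = b using rank(A), rank([A|b]) and the number of unknowns n."""
    n = len(A[0])
    augmented = [list(row) + [rhs] for row, rhs in zip(A, b)]
    rank_A, rank_Ab = rank(A), rank(augmented)
    if rank_A < rank_Ab:
        return "no solution"
    return "unique solution" if rank_A == n else "infinitely many solutions"

print(classify([[1, 2], [3, 4]], [5, 11]))   # unique solution
print(classify([[1, 2], [2, 4]], [5, 10]))   # infinitely many solutions
print(classify([[1, 2], [2, 4]], [5, 11]))   # no solution
```

In the second and third systems the rows of A are multiples of each other, so rank(A) = 1; only the right-hand side decides whether the system is consistent.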

What does it mean if rank(A) < number of unknowns?

The system has infinitely many solutions (or none if the system is inconsistent)
The system has a unique solution
The matrix is symmetric

Chapter 3: Determinants

The determinant det(A) is a single number computed from a square matrix. For 2×2:

det [[a, b], [c, d]] = ad − bc

For 3×3, expand along the first row (cofactor expansion):

det(A) = a₁₁(a₂₂a₃₃ − a₂₃a₃₂) − a₁₂(a₂₁a₃₃ − a₂₃a₃₁) + a₁₃(a₂₁a₃₂ − a₂₂a₃₁)
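The first-row expansion generalizes to any size by recursion. The sketch below is an illustration of the formula rather than a practical algorithm: its cost grows factorially with n, and elimination is the usual choice for large matrices. The function name `det` is an assumption.

```python
def det(A):
    """Determinant of a square matrix by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # Signs alternate +, -, +, ... across the first row.
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[1, 2], [3, 4]]))                     # ad - bc = 1*4 - 2*3 = -2
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))   # 1*2 - 2*(-2) + 3*(-3) = -3
```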

Geometric meaning: The absolute value |det(A)| is the volume scaling factor of the linear transformation defined by A. For a 2×2 matrix, |det(A)| is the area of the parallelogram formed by the column vectors. If det(A) = 0, the transformation squashes space into a lower dimension.

Key properties:

Property	Formula
Invertibility	A is invertible if and only if det(A) ≠ 0
Product rule	det(AB) = det(A) · det(B)
Transpose	det(A^T) = det(A)
Inverse	det(A⁻¹) = 1/det(A)
Row swap	Changes sign of det
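The properties in the table can be spot-checked numerically on 2×2 matrices. The matrices A and B below, and the helper names, are assumptions chosen for illustration; a check on one example is of course not a proof.

```python
from fractions import Fraction

def det2(M):
    """Determinant of a 2x2 matrix: ad - bc."""
    (a, b), (c, d) = M
    return a * d - b * c

def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(M):
    return [[M[0][0], M[1][0]], [M[0][1], M[1][1]]]

def inverse2(M):
    """Inverse of a 2x2 matrix with nonzero determinant, in exact fractions."""
    d = det2(M)
    if d == 0:
        raise ValueError("matrix is singular")
    (a, b), (c, e) = M
    return [[Fraction(e, d), Fraction(-b, d)],
            [Fraction(-c, d), Fraction(a, d)]]

A = [[2, 1], [1, 3]]   # det = 5
B = [[0, 1], [4, 2]]   # det = -4

print(det2(matmul2(A, B)) == det2(A) * det2(B))       # product rule
print(det2(transpose2(A)) == det2(A))                 # transpose
print(det2(inverse2(A)) == Fraction(1, det2(A)))      # inverse
print(det2([A[1], A[0]]) == -det2(A))                 # row swap flips the sign
```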

[Interactive demo: Determinant as Area. Drag the column vectors of a 2×2 matrix; the parallelogram area equals |det(A)|. When det(A) = 0, the vectors are parallel (linearly dependent). Default matrix: A = [[2.0, 0.5], [0.5, 2.0]].]

If det(A) = 0, what does it mean geometrically?

The transformation collapses at least one dimension (the columns are linearly dependent)
The matrix is the identity
The matrix is symmetric

Chapter 4: Eigenvalues & Eigenvectors

An eigenvector of a matrix A is a nonzero vector v that is only scaled when multiplied by A. It stays on the same line through the origin, although it is reversed when the scale factor is negative:

Av = λv

The scalar λ is the eigenvalue. "Eigen" is German for "own" or "characteristic." Eigenvectors are the directions that the transformation preserves.

To find eigenvalues, rearrange: (A − λI)v = 0. For a nontrivial solution, the coefficient matrix must be singular:

det(A − λI) = 0

This is the characteristic equation. For an n×n matrix, it is a degree-n polynomial in λ.

Key insight: The characteristic equation here is exactly the characteristic equation from second-order ODEs. The connection is deep: solving y'' + py' + qy = 0 is equivalent to finding eigenvalues of the companion matrix [[0, 1], [−q, −p]]. ODEs and linear algebra are two views of the same mathematics.
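The claim can be verified in one line. Assuming the standard substitution x₁ = y, x₂ = y' (so that x' = Ax with A the companion matrix above), the characteristic equation of A works out as follows:

```latex
\det(A - \lambda I)
  = \det\begin{bmatrix} -\lambda & 1 \\ -q & -p - \lambda \end{bmatrix}
  = (-\lambda)(-p - \lambda) - (1)(-q)
  = \lambda^2 + p\lambda + q = 0
```

This is the same quadratic obtained by substituting y = e^{λt} into y'' + py' + qy = 0.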

Example: A = [[4, 1], [2, 3]]. Characteristic equation: (4−λ)(3−λ) − 2 = λ² − 7λ + 10 = (λ−5)(λ−2) = 0. Eigenvalues: λ₁ = 5, λ₂ = 2.
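For a 2×2 matrix the characteristic equation is the quadratic λ² − tr(A)λ + det(A) = 0, so the eigenvalues follow from the quadratic formula. The sketch below assumes real eigenvalues; the function names are hypothetical, and the eigenvectors (1, 1) and (1, −2) were worked out by hand from (A − λI)v = 0 for this example.

```python
import math

def eigenvalues_2x2(A):
    """Real eigenvalues of a 2x2 matrix from lambda^2 - tr(A)*lambda + det(A) = 0."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("eigenvalues are complex")
    root = math.sqrt(disc)
    return (trace + root) / 2, (trace - root) / 2

def matvec(A, x):
    """Return the product Ax."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

A = [[4, 1], [2, 3]]
lam1, lam2 = eigenvalues_2x2(A)
print(lam1, lam2)            # 5.0 2.0, matching the example above

# Check Av = lambda * v for the hand-computed eigenvectors.
print(matvec(A, [1, 1]))     # [5, 5]  = 5 * (1, 1)
print(matvec(A, [1, -2]))    # [2, -4] = 2 * (1, -2)
```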

[Interactive demo: Eigenvector Visualizer. The matrix A maps every point. The orange and teal arrows show eigenvectors, the directions that only get stretched, not rotated. Default matrix: A = [[2.0, 1.0], [1.0, 2.0]].]

If Av = 3v, what is A²v?

6v
9v
3v

Chapter 5: Diagonalization

If an n×n matrix A has n linearly independent eigenvectors v₁,...,v_n, we can form the matrix P = [v₁ | ... | v_n] and write:

A = PDP⁻¹

where D = diag(λ₁,...,λ_n) is the diagonal matrix of eigenvalues. This is diagonalization.

Why diagonalize? Powers become trivial: A^k = PD^kP⁻¹. Since D is diagonal, D^k = diag(λ₁^k,...,λ_n^k). Computing A¹⁰⁰⁰ then costs n scalar powers and two matrix multiplications instead of 999 matrix multiplications. This powers everything from Google's PageRank to Markov chains to solving systems of ODEs.
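As a concrete check, the sketch below computes A^k both ways for the matrix A = [[4, 1], [2, 3]] from Chapter 4, whose eigenvalues are 5 and 2. The eigenvector matrix P and its inverse were worked out by hand for this illustration (eigenvectors (1, 1) and (1, −2)); the function names are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[4, 1], [2, 3]]
P = [[1, 1],
     [1, -2]]                                # columns are the eigenvectors
P_inv = [[Fraction(2, 3), Fraction(1, 3)],
         [Fraction(1, 3), Fraction(-1, 3)]]  # exact inverse of P

def power_via_diagonalization(k):
    """A^k = P D^k P^-1 with D = diag(5, 2)."""
    D_k = [[5 ** k, 0], [0, 2 ** k]]
    return matmul(matmul(P, D_k), P_inv)

def power_by_repeated_multiplication(M, k):
    """Reference computation: multiply M by itself k times."""
    result = [[1, 0], [0, 1]]
    for _ in range(k):
        result = matmul(result, M)
    return result

print(power_via_diagonalization(10) == power_by_repeated_multiplication(A, 10))
```

Only the two scalars 5 and 2 are raised to the k-th power; the matrix work is the same two multiplications for every k.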

Not all matrices can be diagonalized. A matrix is diagonalizable if and only if it has n linearly independent eigenvectors. Real symmetric matrices (A^T = A) are always diagonalizable, and their eigenvalues are always real. Even better, their eigenvectors can be chosen to be orthogonal: eigenvectors belonging to distinct eigenvalues are automatically orthogonal.

Quadratic forms: For a symmetric matrix A, the expression x^TAx is a quadratic form. Its behavior is determined by the eigenvalues: if all λ_i > 0, the form is positive definite (bowl-shaped). If the eigenvalues have mixed signs, the form is indefinite (saddle-shaped).
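For a symmetric 2×2 matrix the eigenvalue test is easy to write down. The sketch below is an illustration under that assumption; it also reports the negative definite and semidefinite cases, which follow from the same sign test. The function name `classify_quadratic_form` is hypothetical.

```python
import math

def classify_quadratic_form(A):
    """Classify x^T A x for a symmetric 2x2 matrix A by the signs of its eigenvalues."""
    (a, b), (c, d) = A
    if b != c:
        raise ValueError("A must be symmetric")
    # For symmetric A the discriminant (a - d)^2 + 4b^2 is never negative,
    # so both eigenvalues are real.
    root = math.sqrt((a - d) ** 2 + 4 * b * b)
    lam_max = (a + d + root) / 2
    lam_min = (a + d - root) / 2
    if lam_min > 0:
        return "positive definite"
    if lam_max < 0:
        return "negative definite"
    if lam_min < 0 < lam_max:
        return "indefinite (saddle)"
    return "semidefinite"

print(classify_quadratic_form([[2, 1], [1, 2]]))   # eigenvalues 3 and 1
print(classify_quadratic_form([[1, 2], [2, 1]]))   # eigenvalues 3 and -1
```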

Why are symmetric matrices special in linear algebra?

They always have real eigenvalues and orthogonal eigenvectors (always diagonalizable)
They always have determinant 1
They are always invertible

Chapter 6: Singular Value Decomposition

Not every matrix is square or diagonalizable. The SVD works for any m×n matrix. It decomposes A as:

A = U Σ V^T

where U is m×m orthogonal, V is n×n orthogonal, and Σ is m×n diagonal with the singular values σ₁ ≥ σ₂ ≥ ... ≥ 0 on the diagonal.

Key insight: Every linear transformation can be decomposed into three steps: (1) rotate/reflect the input space (V^T), (2) scale along axes by σ_i (Σ), (3) rotate/reflect the output space (U). The SVD reveals the true geometry of any matrix.

The singular values are the square roots of the eigenvalues of A^TA; the matrix AA^T has the same nonzero eigenvalues. The columns of U are the left singular vectors and the columns of V are the right singular vectors.
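The link between singular values and A^TA can be made concrete for a 2×2 matrix. The sketch below forms A^TA, finds its two eigenvalues with the quadratic formula, and takes square roots. The example matrix and the function name are assumptions for illustration; a numerical library's SVD routine would normally be used instead.

```python
import math

def singular_values_2x2(A):
    """Singular values of a 2x2 matrix as square roots of the eigenvalues of A^T A."""
    (a, b), (c, d) = A
    # Entries of the symmetric matrix A^T A = [[p, q], [q, r]].
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    root = math.sqrt((p - r) ** 2 + 4 * q * q)
    lam_max = (p + r + root) / 2
    lam_min = max((p + r - root) / 2, 0.0)  # guard against tiny negative rounding
    return math.sqrt(lam_max), math.sqrt(lam_min)

A = [[3, 0], [4, 5]]          # illustrative matrix with det(A) = 15
s1, s2 = singular_values_2x2(A)
print(s1, s2)                 # sqrt(45) and sqrt(5)
print(s1 * s2)                # approximately 15: the product equals |det(A)|
```

The last line ties back to Chapter 3: the product of the singular values is the area scaling factor |det(A)|.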

Application	How SVD is used
Data compression	Keep only the largest singular values (low-rank approximation)
Pseudoinverse	Solve least-squares problems for non-square or singular A
PCA	SVD of centered data matrix gives principal components
Recommender systems	Matrix factorization (Netflix prize approach)
Numerical rank	Count singular values above a threshold

What is the geometric interpretation of the SVD?

Every linear map is a rotation, then a scaling, then another rotation
Every matrix is symmetric
Every matrix has positive eigenvalues

Chapter 7: Applications

Systems of ODEs (revisited)

The system x' = Ax has solution x(t) = e^{At} x(0). If A = PDP⁻¹, then e^{At} = P·diag(e^{λ₁t}, ..., e^{λ_n t})·P⁻¹. Diagonalization converts a coupled system into n independent equations.
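The decoupling can be followed step by step in code. The sketch below reuses the matrix A = [[4, 1], [2, 3]] from Chapter 4 (eigenvalues 5 and 2); the eigenvector matrix P, its inverse, and the function name `solve_linear_ode` are assumptions worked out by hand for this illustration.

```python
import math

# A = [[4, 1], [2, 3]] has eigenpairs (5, (1, 1)) and (2, (1, -2)).
P = [[1.0, 1.0],
     [1.0, -2.0]]             # columns are the eigenvectors
P_inv = [[2 / 3, 1 / 3],
         [1 / 3, -1 / 3]]     # inverse of P
eigenvalues = [5.0, 2.0]

def solve_linear_ode(x0, t):
    """Evaluate x(t) = P diag(e^{lambda_i t}) P^-1 x0 for the system x' = Ax."""
    # Step 1: coordinates of x0 in the eigenvector basis, c = P^-1 x0.
    c = [sum(P_inv[i][j] * x0[j] for j in range(2)) for i in range(2)]
    # Step 2: each coordinate evolves independently, c_i(t) = c_i * e^{lambda_i t}.
    c_t = [c[i] * math.exp(eigenvalues[i] * t) for i in range(2)]
    # Step 3: map back to the original coordinates, x(t) = P c(t).
    return [sum(P[i][j] * c_t[j] for j in range(2)) for i in range(2)]

print(solve_linear_ode([1.0, 1.0], 0.1))   # e^{0.5} * (1, 1): pure growth along an eigenvector
```

Starting on an eigenvector, the solution stays on that line and grows like e^{λt}; a general starting point is a mix of the two modes.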

Markov Chains

A Markov chain describes random transitions between states. The transition matrix M has entries m_ij = probability of going from state j to state i, so each column of M sums to 1. The state after k steps is x^(k) = M^k x^(0). The steady state is the eigenvector of M with eigenvalue 1, scaled so that its entries sum to 1.
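One simple way to find the steady state is to apply M repeatedly until the state stops changing. The sketch below does this for a two-state chain; the transition probabilities, tolerance, and function names are assumptions chosen for illustration, and the method presumes the chain converges to a single steady state.

```python
def step(M, x):
    """One transition: return Mx."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def steady_state(M, x0, tol=1e-12, max_steps=10_000):
    """Iterate x <- Mx until successive states differ by less than tol."""
    x = x0
    for _ in range(max_steps):
        nxt = step(M, x)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    raise RuntimeError("did not converge")

# Illustrative chain: column j holds the probabilities of leaving state j.
M = [[0.9, 0.5],
     [0.1, 0.5]]

steady = steady_state(M, [1.0, 0.0])
print(steady)             # approximately [5/6, 1/6]
print(step(M, steady))    # unchanged by M, i.e. an eigenvector with eigenvalue 1
```

Solving Mx = x by hand for this chain gives 0.1·x₁ = 0.5·x₂ with x₁ + x₂ = 1, which agrees with the iteration.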

Least Squares

When Ax = b has no exact solution (overdetermined), the least-squares solution minimizes ||Ax − b||². It satisfies the normal equations:

A^TA x = A^Tb

Connection to machine learning: Linear regression is exactly least squares. When the columns of A are linearly independent, A^TA is invertible and the "weights" are x = (A^TA)⁻¹A^Tb. In that case the matrix (A^TA)⁻¹A^T is the pseudoinverse of A.
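For fitting a straight line y = c0 + c1·x, the normal equations are only 2×2 and can be solved in closed form. The sketch below assumes each row of A is [1, x_i] and b holds the y values; the data points and the function name `fit_line` are illustrative, and exact fractions keep the arithmetic transparent.

```python
from fractions import Fraction

def fit_line(xs, ys):
    """Least-squares line y = c0 + c1*x via the 2x2 normal equations A^T A c = A^T b."""
    n = len(xs)
    # With rows [1, x_i] in A:  A^T A = [[n, Sx], [Sx, Sxx]],  A^T b = [Sy, Sxy].
    sx = sum(Fraction(x) for x in xs)
    sxx = sum(Fraction(x) ** 2 for x in xs)
    sy = sum(Fraction(y) for y in ys)
    sxy = sum(Fraction(x) * Fraction(y) for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("A^T A is singular: need at least two distinct x values")
    # Solve the 2x2 system by Cramer's rule.
    c0 = (sy * sxx - sx * sxy) / det
    c1 = (n * sxy - sx * sy) / det
    return c0, c1

# Three points that do not lie on one line, so Ax = b has no exact solution.
c0, c1 = fit_line([0, 1, 2], [1, 2, 2])
print(c0, c1)   # 7/6 1/2
```

For larger or ill-conditioned problems, forming A^TA explicitly loses accuracy in floating point, which is one reason QR- or SVD-based solvers are preferred in practice.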

What is the steady state of a Markov chain?

The eigenvector of the transition matrix with eigenvalue 1
The inverse of the transition matrix
The determinant of the transition matrix

Chapter 8: Linear Transformation Lab

Every 2×2 matrix defines a transformation of the plane. Watch how it maps the unit square, the unit circle, and individual points. Explore rotations, reflections, shears, and projections.

[Interactive demo: 2D Transformation Explorer. The teal unit circle becomes the orange ellipse under transformation A. Eigenvectors are shown as thick lines, and grid lines show how the whole plane deforms. Default matrix: A = [[1.5, 0.5], [0.5, 1.5]].]

The SVD in action: The singular values are the lengths of the semi-axes of the output ellipse. The left singular vectors (columns of U) give the directions of those axes. The right singular vectors (columns of V) give the input directions that map onto them. Every invertible 2×2 matrix maps the unit circle to an ellipse; a singular matrix flattens it to a line segment or a point.

Chapter 9: Connections

This lesson	Where it leads
Eigenvalues	ODE systems (Ch 2), stability analysis, quantum mechanics
Diagonalization	Matrix exponential e^At, Markov chains, Google PageRank
SVD	PCA, dimensionality reduction, recommender systems
Determinants	Change of variables in integrals (Jacobian), volume forms
Rank	Solvability theory for linear systems, null spaces
Least squares	Linear regression, signal processing, curve fitting

Historical note: Eigenvalues were introduced by Cauchy in 1829 for quadratic forms. The word "eigen" was introduced by Hilbert around 1904. Today eigenvalues are arguably the single most important concept in applied mathematics.

"The introduction of numbers as coordinates is an act of violence." — Hermann Weyl

What is the SVD of a matrix A?

A = UΣV^T, decomposing any matrix into rotation-scaling-rotation
A = LU, decomposing into lower and upper triangular
A = QR, decomposing into orthogonal and upper triangular