Topic · Linear Algebra

Matrix Decompositions

Almost no serious computation is done on a raw matrix. Instead, matrices are factored into products of simpler pieces — triangular, orthogonal, diagonal — each exposing a different aspect of $A$ and unlocking a different class of computations. Four factorizations carry most of the load: LU, QR, eigendecomposition, and SVD.

What you'll leave with

  • A working map of which factorization to reach for, and why.
  • LU as Gaussian elimination repackaged for repeated solves of $Ax = b$.
  • QR as the orthonormalization move that makes least squares numerically tame.
  • The spectral theorem: symmetric matrices have real eigenvalues and orthogonal eigenvectors — a much cleaner story than the general eigendecomposition.
  • SVD as the universal decomposition — rotate, scale, rotate — and the Eckart–Young theorem that makes it the engine behind PCA, compression, and recommender systems.

1. Why factor a matrix?

A factorization $A = B \cdot C \cdot \ldots$ rewrites a matrix as a product where each factor is structured — triangular, orthogonal, diagonal, or otherwise easier to work with than $A$ itself. That structure pays off in two ways.

Cheap repeated computation. Solving $Ax = b$ from scratch costs $O(n^3)$. But once you have a factorization, solving with a new right-hand side $b'$ costs only $O(n^2)$ — the expensive part is amortized over every solve.

Geometric insight. A matrix is a linear map. A factorization breaks that map into a sequence of moves you can see: a rotation, a scaling along axes, another rotation. What the matrix does becomes legible.

Different factorizations exist because no single one is best at everything. They trade generality (which matrices they apply to) against structure (what the factors reveal). Here is the cheat sheet you'll build out across the rest of this page:

DecompositionFormWorks onHeadline use
LU$A = LU$Square $A$ (with pivoting, any invertible square)Solving $Ax = b$ many times
QR$A = QR$Any $m \times n$ with $m \geq n$Least squares
Eigendecomposition$A = P D P^{-1}$Square, diagonalizablePowers, dynamics, $A^k$
Spectral$A = Q D Q^T$Symmetric (real)Quadratic forms, PCA setup
SVD$A = U \Sigma V^T$Any matrix, any shapeRank, conditioning, approximation

2. LU: elimination, frozen and reused

LU decomposition

For a square matrix $A$, an LU decomposition is a factorization $A = LU$ where $L$ is lower-triangular with $1$s on the diagonal and $U$ is upper-triangular. With partial pivoting, every invertible square matrix admits the form $PA = LU$ where $P$ is a permutation matrix.

LU is Gaussian elimination repackaged. When you reduce $A$ to upper-triangular form by row operations, the row operations themselves — the multipliers you used to zero out entries — assemble into the matrix $L$. The triangular result is $U$.

A worked instance:

$$ A = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}}_{L} \, \underbrace{\begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}}_{U} $$

The $L$ entry $\ell_{21} = 2$ records the multiplier used to zero out the $4$ in the bottom-left: subtract twice row 1 from row 2. The $U$ is what's left.

Why bother — solving $Ax = b$ many times

Suppose you need to solve $Ax = b_1, Ax = b_2, \ldots, Ax = b_{100}$ for the same $A$. Doing each from scratch costs $100 \cdot O(n^3)$. With $A = LU$ computed once:

  1. Factor: $A = LU$ once, at cost $O(n^3)$.
  2. For each $b_i$: solve $Ly = b_i$ by forward substitution ($O(n^2)$), then $Ux = y$ by back substitution ($O(n^2)$).

Total: $O(n^3) + 100 \cdot O(n^2)$ — the factorization is amortized over every solve. Triangular systems are cheap; the whole point of LU is to turn the hard general system into two easy triangular ones.

Always pivot

Pure LU (without row swaps) fails when a pivot is zero, and is numerically unstable when a pivot is just small. In practice you always do PLU with partial pivoting — swap rows so the largest available entry sits in the pivot position. This is what every library actually computes.

3. QR: orthonormal columns for least squares

QR decomposition

For an $m \times n$ matrix $A$ with $m \geq n$, the QR decomposition is $A = QR$ where $Q$ has orthonormal columns ($Q^T Q = I$) and $R$ is upper-triangular. If $A$ has full column rank, $R$ is invertible.

The most direct way to see QR is Gram–Schmidt: take the columns of $A$ one at a time, subtract off the components already explained by earlier columns, and normalize what remains. The orthonormal vectors you produce are the columns of $Q$; the bookkeeping coefficients — how much of each earlier direction each new column contained — sit in $R$. The upper-triangular shape of $R$ is just a record of the fact that column $j$ of $A$ only depends on columns $1, \ldots, j$ of $Q$.

Why it matters: least squares without squaring the condition number

The least-squares problem is to find $\hat{x}$ minimizing $\|Ax - b\|_2$ when $A$ is tall ($m > n$) — overdetermined. The textbook answer is the normal equations:

$$ A^T A \, \hat{x} = A^T b $$

Algebraically clean, numerically a trap. Forming $A^T A$ squares the condition number of $A$: if $A$ is mildly ill-conditioned, $A^T A$ may be unsolvable in finite precision.

QR sidesteps the squaring entirely. Substitute $A = QR$ into the normal equations:

$$ \begin{aligned} (QR)^T (QR) \hat{x} &= (QR)^T b \\ R^T Q^T Q R \hat{x} &= R^T Q^T b \\ R^T R \hat{x} &= R^T Q^T b \\ R \hat{x} &= Q^T b \end{aligned} $$

The last line drops the $R^T$ from both sides (valid because $R$ is invertible). What remains is a single triangular system $R \hat{x} = Q^T b$ — solved by back substitution. The condition number of $R$ equals that of $A$; no squaring.

Tip

Classical Gram–Schmidt is mathematically right but numerically loses orthogonality in finite precision. Production code uses Householder reflections or Givens rotations — different algorithms that produce the same $Q$ and $R$ but are far more stable.

4. Eigendecomposition & the spectral theorem

You already met eigenvalues and eigenvectors in the previous topic. The decomposition is just the matrix-form restatement of what an eigenbasis lets you do.

Eigendecomposition

If a square matrix $A$ has $n$ linearly independent eigenvectors $v_1, \ldots, v_n$ with eigenvalues $\lambda_1, \ldots, \lambda_n$, then $A$ is diagonalizable and

$$ A = P D P^{-1} $$

where the columns of $P$ are the eigenvectors and $D$ is diagonal with the eigenvalues on its diagonal.

Read this from right to left as a coordinate-change story. $P^{-1}$ maps the standard basis to the eigenbasis. $D$ scales each eigenbasis direction by its eigenvalue — no mixing. $P$ maps back to standard coordinates. The matrix $A$, in eigenbasis coordinates, is just a list of stretch factors.

This makes $A^k$ trivial:

$$ A^k = (PDP^{-1})^k = P D^k P^{-1} $$

because the $P^{-1} P$ pairs collapse in the middle. Raising a diagonal matrix to the $k$th power is just raising each diagonal entry to the $k$th power. Markov chains, linear ODEs, repeated linear iterations — all become elementary once you have the eigendecomposition.

The spectral theorem: a much cleaner story for symmetric matrices

General eigendecomposition has rough edges. Eigenvalues can be complex; eigenvectors of distinct eigenvalues need not be orthogonal; some matrices aren't diagonalizable at all. Symmetric real matrices are an oasis where every rough edge disappears.

Spectral theorem

If $A$ is a real symmetric matrix ($A^T = A$), then:

  • All eigenvalues of $A$ are real.
  • Eigenvectors corresponding to distinct eigenvalues are orthogonal; and even when eigenvalues repeat, an orthonormal basis of eigenvectors exists.
  • Therefore $A$ admits the orthogonal diagonalization $A = Q D Q^T$ where $Q$ is orthogonal ($Q^T Q = I$) and $D$ is real diagonal.

The form $A = QDQ^T$ is the spectral decomposition. The improvement over $PDP^{-1}$ is enormous: $Q^{-1} = Q^T$ for free, so applying $A$ is rotate-scale-rotate-back, with the same orthogonal frame in both directions. This is the foundation for PCA, quadratic-form analysis, and — as you're about to see — the SVD.

5. SVD: the Swiss Army knife

Eigendecomposition is gorgeous when it applies, but it has rigid requirements: square and diagonalizable. Most matrices you actually meet — a data matrix of users by movies, an image as a grid of pixels — are rectangular. Eigendecomposition has nothing to say about them.

The singular value decomposition handles every matrix, every shape, no exceptions.

Singular value decomposition

For any real $m \times n$ matrix $A$, there exist orthogonal matrices $U$ (size $m \times m$) and $V$ (size $n \times n$) and a "diagonal" $m \times n$ matrix $\Sigma$ with non-negative entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$ on its main diagonal such that

$$ A = U \Sigma V^T. $$

The $\sigma_i$ are the singular values of $A$; the columns of $U$ are the left singular vectors; the columns of $V$ are the right singular vectors.

The three-step geometric story

Reading $A = U \Sigma V^T$ from right to left tells you exactly what $A$ does to any vector:

  1. $V^T$ rotates the input. Orthogonal matrices are rigid motions — distances and angles are preserved. The unit sphere stays a unit sphere; only the labels on the axes change.
  2. $\Sigma$ scales along the new axes. Direction $i$ is stretched by $\sigma_i$. A unit sphere is squashed into an ellipsoid whose semi-axis lengths are the singular values.
  3. $U$ rotates the result into the output space.

Every linear map — no matter how strange — is just rotate, scale, rotate. That is the entire content of the SVD.

v₁ v₂ Unit circle, with v₁, v₂ Vᵀ rotate e₁ e₂ Aligned to coord axes Σ scale σ₁ σ₂ Stretched to ellipse U rotate A · circle
SVD on the unit circle: $V^T$ rotates the input frame to coordinate axes, $\Sigma$ stretches by the singular values into an ellipse, $U$ rotates that ellipse into the output frame.

Connection to the spectral theorem

The SVD doesn't appear out of thin air — it's built on the spectral theorem. The matrices $A^T A$ (size $n \times n$) and $A A^T$ (size $m \times m$) are both symmetric and positive semi-definite. By the spectral theorem, each has an orthonormal eigenbasis with real non-negative eigenvalues. Those eigenvalues are the squared singular values; the eigenvectors of $A^T A$ are the right singular vectors (columns of $V$), and the eigenvectors of $A A^T$ are the left singular vectors (columns of $U$). The whole machinery is the spectral theorem applied twice and stitched together.

6. Low-rank approximation & applications

Every rectangular matrix decomposes, via the SVD, into a sum of rank-1 pieces:

$$ A = \sum_{i=1}^{r} \sigma_i \, u_i v_i^T $$

where $r$ is the rank and the singular values are ordered $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$. Each term $\sigma_i u_i v_i^T$ is a rank-1 matrix, and the singular values give them a strict importance ranking.

Truncate the sum at the top $k$ terms:

$$ A_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^T. $$

The matrix $A_k$ is the best rank-$k$ approximation of $A$ — in a precise sense.

Eckart–Young theorem

Among all rank-$k$ matrices $B$, the choice $B = A_k$ minimizes both the Frobenius norm $\|A - B\|_F$ and the operator norm $\|A - B\|_2$. The truncation error is exactly

$$ \|A - A_k\|_F^2 = \sum_{i = k+1}^{r} \sigma_i^2. $$

If the singular values decay quickly, $A_k$ captures almost all of $A$ in a tiny fraction of the storage.

What this unlocks

Image compression. Treat a greyscale image as a matrix of pixel intensities. Compute its SVD. Keep the top $k$ singular values and reconstruct. For most natural images, the singular values drop off sharply — $k$ as small as a few percent of the dimensions can give visually indistinguishable results, at a fraction of the storage.

Principal Component Analysis (PCA). Center a data matrix $X$ (subtract the mean of each column). The right singular vectors of $X$ are the principal components — the directions of maximal variance — and the squared singular values are proportional to the variance explained. PCA is not a separate algorithm; it is the SVD applied to centered data.

Recommender systems. Represent the user-by-item ratings as a sparse matrix $R$. A low-rank SVD-style approximation $R \approx U_k \Sigma_k V_k^T$ assumes each user and each item lives in a low-dimensional "taste space"; the columns of $U_k$ become user profiles, the columns of $V_k$ become item profiles. Predicting an unseen rating is a dot product. Variants like funkSVD and matrix completion grew directly out of this view, and the Netflix Prize put it on the map.

Noise removal & numerical rank. Real data matrices are rank-deficient plus noise. Tiny singular values usually represent noise; truncating them is a clean way to recover the "true" rank.

7. A decision flowchart

Faced with a matrix and a goal, which decomposition? The flowchart below is the short answer. It captures the right reflex 95% of the time.

What are you trying to do? Pick the goal that matches. Solve Ax = b square A, many right- hand sides Least squares tall A, minimize ‖Ax − b‖ Symmetric matrix eigenstructure of A with Aᵀ = A Anything else LU A = LU factor once, reuse QR A = QR stable least squares Spectral A = QDQᵀ orthogonal eigenbasis SVD A = UΣVᵀ universal When in doubt, reach for SVD — it works on anything and tells you the most.
A working decision tree for the four main factorizations.
Note

Other decompositions exist — Cholesky (for symmetric positive-definite $A$, twice as fast as LU), Schur (any square matrix, triangularization over $\mathbb{C}$), Jordan (the canonical form when diagonalization fails), polar (every map is a rotation followed by a stretch). They're refinements; the four above cover the majority of working uses.

8. Common pitfalls

Eigendecomposition ≠ SVD

Eigenvalues can be negative or complex; singular values are always real and non-negative. Eigendecomposition requires square and diagonalizable; SVD has no such requirements. For a symmetric positive-definite matrix the two coincide, but in general they are different objects with different existence theorems.

Don't form $A^{-1}$ explicitly

If you find yourself writing inv(A) * b, stop. Explicit inverses are slower and less stable than factoring once and back-substituting. The whole engineering point of LU and QR is to never form $A^{-1}$.

Don't solve the normal equations directly

For least squares, $A^T A \hat{x} = A^T b$ is algebraically correct but numerically dangerous — the condition number of $A^T A$ is the square of that of $A$. Use QR (or SVD for the badly-conditioned case).

Cholesky needs positive-definiteness, not just symmetry

The $A = LL^T$ decomposition exists only when $A$ is symmetric and all its eigenvalues are strictly positive. Applying Cholesky to a merely-symmetric matrix can take the square root of a negative number midway through, breaking the algorithm — and signalling that your matrix isn't what you thought it was.

SVD uniqueness is partial

When the singular values are all distinct, the SVD is unique up to the sign of each singular vector pair. With repeated singular values, individual vectors within an eigenspace are not unique — only the subspace they span is. Don't rely on getting "the same" $U$ and $V$ across implementations when there are ties.

Classical Gram–Schmidt loses orthogonality

Mathematically correct, numerically unstable in finite precision — the columns of the computed $Q$ drift away from orthogonal. Use modified Gram–Schmidt, Householder reflections, or Givens rotations instead.

9. Worked examples

Each example below is a clean, by-hand factorization. The point is to internalize the shapes — which entries become which factors — so the algebra stops feeling magical.

Example 1 · LU factorization of a 2×2 matrix

Factor $A = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}$ as $LU$.

Step 1. Eliminate the entry below the first pivot. The pivot is $2$; the entry to kill is $4$. Multiplier $\ell_{21} = 4/2 = 2$.

$$ R_2 \leftarrow R_2 - 2 R_1: \quad \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix} $$

Step 2. Read off the factors. $U$ is what remains; $L$ stores the multipliers in the corresponding position with $1$s on the diagonal:

$$ L = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}, \quad U = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix} $$

Check. $LU = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}$ ✓

Example 2 · Using QR to solve a least-squares problem

Find the line $y = \alpha + \beta x$ best fitting the three points $(0, 1), (1, 1), (2, 4)$.

Step 1. Set up $Ax = b$:

$$ A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}, \quad x = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 1 \\ 4 \end{pmatrix}. $$

Step 2. Apply Gram–Schmidt to the columns of $A$. Column 1 has norm $\sqrt{3}$, so $q_1 = \frac{1}{\sqrt{3}}(1, 1, 1)^T$. Column 2 minus its projection onto $q_1$ is $(0, 1, 2) - 1 \cdot (1, 1, 1) = (-1, 0, 1)$, with norm $\sqrt{2}$, so $q_2 = \frac{1}{\sqrt{2}}(-1, 0, 1)^T$.

$$ Q = \begin{pmatrix} 1/\sqrt 3 & -1/\sqrt 2 \\ 1/\sqrt 3 & 0 \\ 1/\sqrt 3 & 1/\sqrt 2 \end{pmatrix}, \quad R = \begin{pmatrix} \sqrt 3 & \sqrt 3 \\ 0 & \sqrt 2 \end{pmatrix}. $$

Step 3. Compute $Q^T b$ and solve $R \hat{x} = Q^T b$:

$$ Q^T b = \begin{pmatrix} 6/\sqrt 3 \\ 3/\sqrt 2 \end{pmatrix} = \begin{pmatrix} 2\sqrt 3 \\ \tfrac{3}{\sqrt 2} \end{pmatrix}. $$

Back-substitute: $\sqrt 2 \, \hat\beta = 3/\sqrt 2$ gives $\hat\beta = 3/2$. Then $\sqrt 3 \, \hat\alpha + \sqrt 3 \cdot (3/2) = 2\sqrt 3$ gives $\hat\alpha = 1/2$.

Best-fit line: $y = \tfrac{1}{2} + \tfrac{3}{2} x$.

Example 3 · Eigendecomposition of a triangular matrix

Diagonalize $A = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}$.

Step 1. Eigenvalues are the diagonal entries of a triangular matrix: $\lambda_1 = 2, \lambda_2 = 3$.

Step 2. Eigenvectors. For $\lambda = 2$: solve $(A - 2I) v = 0$:

$$ \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} v = 0 \quad\Rightarrow\quad v_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. $$

For $\lambda = 3$: solve $(A - 3I) v = 0$:

$$ \begin{pmatrix} -1 & 1 \\ 0 & 0 \end{pmatrix} v = 0 \quad\Rightarrow\quad v_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. $$

Step 3. Assemble:

$$ P = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}, \quad P^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}. $$

Check. $PDP^{-1} = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix} = A$ ✓. Note: $A$ is not symmetric, and $P$ is not orthogonal — this is general eigendecomposition, not spectral.

Example 4 · SVD of a rank-1 matrix

Find the SVD of $A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}$.

Step 1. Compute $A^T A = \begin{pmatrix} 5 & 10 \\ 10 & 20 \end{pmatrix}$. Its eigenvalues are $25$ and $0$ (the trace is $25$, the determinant is $0$).

Step 2. Singular values are the non-negative square roots of those eigenvalues: $\sigma_1 = 5$, $\sigma_2 = 0$. So $A$ has rank 1.

Step 3. Right singular vector $v_1$ is the unit eigenvector of $A^T A$ for $\lambda = 25$:

$$ v_1 = \tfrac{1}{\sqrt 5} \begin{pmatrix} 1 \\ 2 \end{pmatrix}. $$

Step 4. Left singular vector from $u_1 = \tfrac{1}{\sigma_1} A v_1$:

$$ u_1 = \tfrac{1}{5} \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} \cdot \tfrac{1}{\sqrt 5}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \tfrac{1}{\sqrt 5}\begin{pmatrix} 1 \\ 2 \end{pmatrix}. $$

Result. $A = 5 \, u_1 v_1^T$ — a single rank-1 piece accounts for all of $A$. The Eckart–Young statement is trivial here: the best rank-1 approximation is $A$ itself.

Example 5 · Spectral decomposition of a 2×2 symmetric matrix

Diagonalize $A = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}$ orthogonally.

Step 1. Eigenvalues: $\det(A - \lambda I) = (3-\lambda)^2 - 1 = 0$ gives $\lambda = 4, 2$.

Step 2. Eigenvectors. For $\lambda = 4$: $(A - 4I)v = 0$ gives $v = (1, 1)^T$. Normalize: $q_1 = \tfrac{1}{\sqrt 2}(1, 1)^T$.

For $\lambda = 2$: $(A - 2I)v = 0$ gives $v = (1, -1)^T$. Normalize: $q_2 = \tfrac{1}{\sqrt 2}(1, -1)^T$.

Step 3. Verify orthogonality: $q_1 \cdot q_2 = \tfrac{1}{2}(1 \cdot 1 + 1 \cdot (-1)) = 0$ ✓ (as the spectral theorem promised).

Step 4. Assemble:

$$ Q = \tfrac{1}{\sqrt 2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad D = \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix}, \quad A = Q D Q^T. $$

Because $Q$ is orthogonal, $Q^{-1}$ is just $Q^T$ — no inversion needed. That is the convenience the spectral theorem buys.

Sources & further reading

The content above is synthesized from established linear-algebra texts and lecture series. Where this page paints in broad strokes, the sources go deep.

Test your understanding

A quiz that builds from easy to hard. Pick an answer to get instant feedback and a worked explanation. Your progress is saved in this browser — come back anytime to continue.

Question 1 of 22
0 correct