1. What "linear" means
A function $T : \mathbb{R}^n \to \mathbb{R}^m$ is linear if it respects both vector addition and scalar multiplication:
$$ T(\vec{u} + \vec{v}) = T(\vec{u}) + T(\vec{v}) \qquad \text{and} \qquad T(c\,\vec{v}) = c\,T(\vec{v}) $$for all vectors $\vec{u}, \vec{v}$ and all scalars $c$.
The two conditions are usually rolled into one: $T$ is linear iff $T(a\vec{u} + b\vec{v}) = a\,T(\vec{u}) + b\,T(\vec{v})$. In words — you can distribute $T$ across linear combinations.
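A short check of why the single condition carries the same information as the pair, using only the two properties stated above: taking $a = b = 1$ recovers additivity, and taking $b = 0$ recovers $T(a\vec{u}) = a\,T(\vec{u})$. In the other direction, additivity followed by scalar multiplication gives
$$ T(a\vec{u} + b\vec{v}) = T(a\vec{u}) + T(b\vec{v}) = a\,T(\vec{u}) + b\,T(\vec{v}). $$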
The picture is even better than the algebra. A linear transformation is anything you can do to the plane that keeps three rules in force:
- Grid lines stay straight. No bending, no curving.
- Grid lines stay parallel and evenly spaced. A grid maps to a (possibly skewed, scaled, rotated) grid.
- The origin stays fixed. $T(\vec{0}) = \vec{0}$, always.
Anything else — folding paper, stretching one direction nonlinearly, shifting everything to the right by 3 — breaks the rules. The grid has to flex as one piece, anchored at the origin.
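The rules above can be spot-checked numerically. The sketch below is only an illustration: the helper names (`matvec`, `looks_linear`, `shear`, `shift_right`) and the sample vectors are made up for this example, and passing on one sample is evidence, not a proof of linearity.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def add(u, v):
    return [ui + vi for ui, vi in zip(u, v)]

def scale(c, v):
    return [c * vi for vi in v]

def shear(v):
    # Linear: multiplication by the shear matrix [[1, 1], [0, 1]].
    return matvec([[1, 1], [0, 1]], v)

def shift_right(v):
    # Not linear: translates every point 3 units to the right.
    return [v[0] + 3, v[1]]

def looks_linear(T, u, v, c):
    """Spot-check additivity, scalar multiplication, and T(0) = 0 on one sample."""
    zero = [0] * len(u)
    return (T(add(u, v)) == add(T(u), T(v))
            and T(scale(c, v)) == scale(c, T(v))
            and T(zero) == zero)

u, v, c = [1, 2], [3, -1], 4
shear_ok = looks_linear(shear, u, v, c)        # True
shift_ok = looks_linear(shift_right, u, v, c)  # False
```

The shift fails every one of the three checks; the easiest to see is the origin, which lands at $(3, 0)$ instead of staying put.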
[Figure: the original grid, and the same grid after applying $T$ (a shear).]
Don't picture a function as "a formula that eats numbers." Picture it as a motion of the plane. Pick up the entire transparent grid sheet, deform it (without tearing, folding, or unpinning the origin) until your two basis arrows point wherever you like — that motion is your linear transformation. Everything else in this topic is bookkeeping for that picture.
2. Every matrix gives a linear transformation
Here is the bridge between matrices and motion. Given an $m \times n$ matrix $A$, define
$$ T_A(\vec{x}) = A\vec{x}. $$This is a function from $\mathbb{R}^n$ to $\mathbb{R}^m$, and it is automatically linear — matrix multiplication distributes over vector addition and scalar multiplication:
$$ A(\vec{u} + \vec{v}) = A\vec{u} + A\vec{v}, \qquad A(c\vec{v}) = c(A\vec{v}). $$The converse — every linear transformation comes from a matrix — is the more surprising direction. It works because a linear $T$ is determined by very little information.
Write any $\vec{x} \in \mathbb{R}^n$ in standard-basis coordinates: $\vec{x} = x_1 \vec{e}_1 + x_2 \vec{e}_2 + \cdots + x_n \vec{e}_n$. Apply $T$ and unpack with linearity:
$$ T(\vec{x}) = x_1 T(\vec{e}_1) + x_2 T(\vec{e}_2) + \cdots + x_n T(\vec{e}_n). $$So $T$ is completely determined by what it does to the $n$ basis vectors. Stack those $n$ output vectors as columns and you have the matrix:
$$ A = \Big[\; T(\vec{e}_1)\;\Big|\;T(\vec{e}_2)\;\Big|\;\cdots\;\Big|\;T(\vec{e}_n)\;\Big]. $$And then $T(\vec{x}) = A\vec{x}$, by construction.
Linear transformations and matrices are the same objects spoken in two dialects. Matrices are how you compute; transformations are what is happening.
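The construction above (columns are the images of the basis vectors) can be sketched in a few lines. The map `T` below is a hypothetical example chosen for illustration, and `matrix_of`, `standard_basis`, and `matvec` are helper names invented here, not part of any library.

```python
def standard_basis(n):
    """Return e_1, ..., e_n as lists."""
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]

def matrix_of(T, n):
    """Build the matrix of a linear map T on R^n: column j is T(e_j)."""
    images = [T(e) for e in standard_basis(n)]  # images[j] = T(e_j)
    m = len(images[0])
    # Transpose so the images become the columns of a list-of-rows matrix.
    return [[images[j][i] for j in range(n)] for i in range(m)]

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Hypothetical linear map from R^3 to R^2, known only as a formula.
def T(x):
    x1, x2, x3 = x
    return [x1 + 2 * x2, 3 * x3 - x1]

A = matrix_of(T, 3)   # [[1, 2, 0], [-1, 0, 3]]
x = [5, -1, 2]
direct = T(x)                 # [3, 1]
via_matrix = matvec(A, x)     # [3, 1]
```

Only three evaluations of `T` were needed to build `A`; after that, the matrix reproduces `T` on any input.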
3. The standard 2D zoo
Four 2-D transformations show up so often that you should recognize their matrices at a glance. Each does exactly one thing to the unit square.
Rotation by angle $\theta$ (counter-clockwise)
$$ R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \phantom{-}\cos\theta \end{pmatrix} $$The columns are where $\vec{e}_1$ and $\vec{e}_2$ land: $\vec{e}_1 = (1,0)$ rotates to $(\cos\theta, \sin\theta)$, $\vec{e}_2 = (0,1)$ rotates to $(-\sin\theta, \cos\theta)$. The whole plane spins rigidly about the origin.
Reflection across the x-axis
$$ F_x = \begin{pmatrix} 1 & \phantom{-}0 \\ 0 & -1 \end{pmatrix} $$$\vec{e}_1$ doesn't move; $\vec{e}_2$ flips down. Reflection across the y-axis is the mirror version: $\operatorname{diag}(-1, 1)$.
Uniform scaling by factor $s$
$$ S_s = \begin{pmatrix} s & 0 \\ 0 & s \end{pmatrix} = s\,I $$Every vector is stretched (or shrunk) by the same factor. Non-uniform scaling $\operatorname{diag}(s_1, s_2)$ stretches the two axes independently: $\operatorname{diag}(2, 1)$, for instance, doubles every width and leaves heights alone.
Horizontal shear by $k$
$$ H_k = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix} $$$\vec{e}_1$ stays put; $\vec{e}_2$ slides $k$ units to the right (and stays at height $1$). The grid stays a grid, but the right angle between the axes is broken; that's the diagram you saw in §1.
| Transformation | Matrix | Where $\vec{e}_1$ lands | Where $\vec{e}_2$ lands |
|---|---|---|---|
| Rotation by $\theta$ | $R_\theta$ | $(\cos\theta, \sin\theta)$ | $(-\sin\theta, \cos\theta)$ |
| Reflection (x-axis) | $F_x$ | $(1, 0)$ | $(0, -1)$ |
| Scaling by $s$ | $S_s$ | $(s, 0)$ | $(0, s)$ |
| Horizontal shear by $k$ | $H_k$ | $(1, 0)$ | $(k, 1)$ |
Every entry in those matrices is just the coordinates of $T(\vec{e}_1)$ and $T(\vec{e}_2)$. Once you internalize that, you stop memorizing the matrices — you derive them on the fly.
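As a rough sketch, the four matrices can be written as small constructor functions and their columns read back off. The function names (`rotation`, `reflection_x`, `scaling`, `shear_x`, `column`) are illustrative choices for this example only.

```python
import math

def rotation(theta):
    """Counter-clockwise rotation by theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def reflection_x():
    """Reflection across the x-axis."""
    return [[1, 0], [0, -1]]

def scaling(s):
    """Uniform scaling by factor s."""
    return [[s, 0], [0, s]]

def shear_x(k):
    """Horizontal shear by k."""
    return [[1, k], [0, 1]]

def column(A, j):
    """Column j of a list-of-rows matrix, i.e. the image of e_(j+1)."""
    return [row[j] for row in A]

H = shear_x(2)
e1_image = column(H, 0)   # [1, 0]: e1 stays put
e2_image = column(H, 1)   # [2, 1]: e2 slides 2 units right
```

Rotation entries come out as floats, so comparing them against exact values needs a tolerance (for example `math.isclose` with an `abs_tol`).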
4. Composition is matrix multiplication
Apply transformation $A$, then transformation $B$. What's the combined effect?
$$ (B \circ A)(\vec{x}) = B(A(\vec{x})) = B(A\vec{x}) = (BA)\,\vec{x}. $$The combined transformation is given by the matrix product $BA$, with $B$ on the left because $A$ acts first. This is the entire reason matrix multiplication is defined the way it is. Someone could have defined $A \times B$ to mean something else; nobody did, because then composing transformations would not match multiplying matrices, and the whole edifice would crack.
Concretely, let's compose a $90°$ rotation with a reflection across the x-axis, in both orders:
$$ R_{90°} = \begin{pmatrix} 0 & -1 \\ 1 & \phantom{-}0 \end{pmatrix}, \qquad F_x = \begin{pmatrix} 1 & \phantom{-}0 \\ 0 & -1 \end{pmatrix}. $$Rotate first, then reflect ($F_x \circ R_{90°}$):
$$ F_x R_{90°} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 0 & -1 \\ 1 & \phantom{-}0 \end{pmatrix} = \begin{pmatrix} \phantom{-}0 & -1 \\ -1 & \phantom{-}0 \end{pmatrix}. $$Reflect first, then rotate ($R_{90°} \circ F_x$):
$$ R_{90°} F_x = \begin{pmatrix} 0 & -1 \\ 1 & \phantom{-}0 \end{pmatrix} \begin{pmatrix} 1 & \phantom{-}0 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. $$Different matrices. Different transformations. That's matrix multiplication telling you, correctly, that the order of geometric operations matters.
In $BA\vec{x}$, the matrix nearest the vector is the one applied first. "First $A$, then $B$" reads from right to left, like function composition. Drilling this until it feels automatic will save you hours of debugging later.
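The order dependence is easy to confirm by machine. This is a minimal sketch with hand-rolled helpers (`matmul` and `matvec` are names chosen here), using the $90°$ rotation and the x-axis reflection from this section and an arbitrary test vector.

```python
def matmul(A, B):
    """Matrix product AB for list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(A, x):
    """Multiply a matrix by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

R90 = [[0, -1], [1, 0]]   # rotate 90 degrees counter-clockwise
Fx = [[1, 0], [0, -1]]    # reflect across the x-axis

rotate_then_reflect = matmul(Fx, R90)   # [[0, -1], [-1, 0]]
reflect_then_rotate = matmul(R90, Fx)   # [[0, 1], [1, 0]]

# Applying the maps one at a time agrees with the single product matrix.
x = [2, 1]
step_by_step = matvec(Fx, matvec(R90, x))      # [-1, -2]
all_at_once = matvec(rotate_then_reflect, x)   # [-1, -2]
```

Note how "rotate first" puts `R90` as the right-hand argument of `matmul`, nearest the vector.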
5. The columns tell you everything
If you take only one operational lesson from this page, take this one.
To find the matrix of a linear transformation $T$, apply $T$ to each standard basis vector and write the results as the columns.
Why it works was already laid out in §2: a linear transformation is determined by where it sends the basis, and those images are the columns of the matrix.
So you almost never need to compute matrix entries directly — you compute basis images.
Example: rotation by $90°$ from scratch
You don't have to remember $R_\theta$. Just rotate $\vec{e}_1$ and $\vec{e}_2$ by hand.
- $\vec{e}_1 = (1,0)$ rotated $90°$ counter-clockwise lands at $(0,1)$.
- $\vec{e}_2 = (0,1)$ rotated $90°$ counter-clockwise lands at $(-1,0)$.
Stack as columns:
$$ R_{90°} = \begin{pmatrix} 0 & -1 \\ 1 & \phantom{-}0 \end{pmatrix}. $$You just rederived the rotation matrix. No formula required.
Example: projection onto the x-axis
What's the matrix of the transformation that drops every vector vertically onto the x-axis?
- $\vec{e}_1 = (1,0)$ is already on the x-axis. Image: $(1, 0)$.
- $\vec{e}_2 = (0,1)$ collapses to the origin. Image: $(0, 0)$.
Stack as columns:
$$ P = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}. $$Notice $P$ is singular: it squashes the plane down to a line, so it cannot be inverted. The matrix knew that before you did.
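A small sketch of what "cannot be inverted" means in practice for the projection with columns $(1, 0)$ and $(0, 0)$: two different inputs land on the same output, so no map could send that output back to both. The helper names `matvec` and `det2` and the sample vectors are illustrative assumptions.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def det2(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

# Columns are the images of e1 and e2 under projection onto the x-axis.
P = [[1, 0], [0, 0]]

high = matvec(P, [3, 7])    # [3, 0]
low = matvec(P, [3, -2])    # [3, 0], same output from a different input
determinant = det2(P)       # 0
```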
6. Kernel, image, and rank–nullity
Once you have a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$, two subspaces deserve names. One lives in the input space and asks "what gets crushed?"; the other lives in the output space and asks "what can be reached?". Together they obey a conservation law that ties dimensions on both sides.
The kernel of $T$ is the set of input vectors that $T$ sends to the zero vector:
$$ \ker(T) = \{\,\vec{x} \in \mathbb{R}^n : T(\vec{x}) = \vec{0}\,\}. $$For a matrix transformation $T(\vec{x}) = A\vec{x}$, this is exactly the null space of $A$: the solution set of $A\vec{x} = \vec{0}$. It is always a subspace of the domain.
The image of $T$ is the set of output vectors $T$ can actually produce:
$$ \operatorname{im}(T) = \{\,T(\vec{x}) : \vec{x} \in \mathbb{R}^n\,\}. $$For $T(\vec{x}) = A\vec{x}$, this is the column space of $A$ — the span of the columns. It is always a subspace of the codomain.
The two dimensions get their own names:
- Rank of $T$ is $\dim \operatorname{im}(T)$ — how many independent directions survive on the output side.
- Nullity of $T$ is $\dim \ker(T)$ — how many independent directions get collapsed to zero.
The rank also equals the number of linearly independent columns of the matrix — and, by a deep symmetry, the number of linearly independent rows.
For any linear $T : \mathbb{R}^n \to \mathbb{R}^m$,
$$ \operatorname{rank}(T) + \operatorname{nullity}(T) = n, $$where $n$ is the dimension of the domain. Every input dimension is accounted for: either it stretches out to contribute a new direction of output (rank), or it gets squashed to zero (nullity).
Reading rank and nullity off the standard zoo
Look at the same 2-D maps from §3 through this new lens.
| Transformation | Matrix | Nullity ($\dim \ker$) | $\operatorname{rank}$ | Check: nullity $+$ rank $= n$ |
|---|---|---|---|---|
| Rotation by $\theta$ | $R_\theta$ | $0$ | $2$ | $0 + 2 = 2$ ✓ |
| Reflection (x-axis) | $F_x$ | $0$ | $2$ | $0 + 2 = 2$ ✓ |
| Uniform scaling by $s \neq 0$ | $sI$ | $0$ | $2$ | $0 + 2 = 2$ ✓ |
| Projection onto x-axis | $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ | $1$ | $1$ | $1 + 1 = 2$ ✓ |
| Zero map | $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$ | $2$ | $0$ | $2 + 0 = 2$ ✓ |
The projection is the interesting one: it collapses the y-direction (nullity 1 — the entire y-axis maps to $\vec{0}$) and keeps the x-direction alive (rank 1 — the image is the x-axis). The accounting balances.
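The rank column of the table can be reproduced with a short Gaussian elimination. This is a sketch rather than a production routine: `rank` and `nullity` are names chosen here, exact fractions are used to sidestep floating-point pivoting issues, and `nullity` is obtained from rank–nullity itself rather than computed independently. The rotation is taken at $90°$ and the scaling factor as $3$ so that all entries are integers.

```python
from fractions import Fraction

def rank(A):
    """Rank of a list-of-rows matrix via Gaussian elimination over exact fractions."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0  # number of pivots found so far
    for c in range(cols):
        # Look for a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        # Clear the entries below the pivot.
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

def nullity(A):
    """Nullity via rank-nullity: (number of columns) - rank."""
    return len(A[0]) - rank(A)

zoo = {
    "rotation by 90 degrees": [[0, -1], [1, 0]],
    "reflection (x-axis)": [[1, 0], [0, -1]],
    "scaling by 3": [[3, 0], [0, 3]],
    "projection onto x-axis": [[1, 0], [0, 0]],
    "zero map": [[0, 0], [0, 0]],
}
ranks = {name: rank(A) for name, A in zoo.items()}
```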
A linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ (one whose matrix is square) is invertible iff $\ker(T) = \{\vec{0}\}$ iff $\operatorname{rank}(T) = n$ iff its matrix has nonzero determinant. The three conditions are restatements of one another. Anything that crushes a direction (that is, any map with a nonzero vector in its kernel) loses information you can never recover.
A worked computation
Take $A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}$. The two columns are $\begin{pmatrix} 1 \\ 2 \end{pmatrix}$ and $\begin{pmatrix} 2 \\ 4 \end{pmatrix} = 2\begin{pmatrix} 1 \\ 2 \end{pmatrix}$ — the second is twice the first. The column space is a one-dimensional line, so $\operatorname{rank}(A) = 1$.
For the kernel, solve $A\vec{x} = \vec{0}$: both rows give $x_1 + 2x_2 = 0$, so $\vec{x} = t\begin{pmatrix} -2 \\ \phantom{-}1 \end{pmatrix}$ for any $t \in \mathbb{R}$. That's a one-dimensional line of solutions, so $\operatorname{nullity}(A) = 1$.
Check: $\operatorname{rank} + \operatorname{nullity} = 1 + 1 = 2 = n$. ✓
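The hand computation can be double-checked numerically. In this sketch the helper `matvec` and the particular sample values of $t$ and test inputs are arbitrary choices for illustration.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2], [2, 4]]
kernel_direction = [-2, 1]

# Every multiple of (-2, 1) should be sent to the zero vector.
kernel_images = [matvec(A, [t * k for k in kernel_direction])
                 for t in (-3, 0, 1, 5)]
all_zero = all(image == [0, 0] for image in kernel_images)

# Every output should lie on the line spanned by the first column (1, 2),
# i.e. its second coordinate is twice its first.
outputs = [matvec(A, x) for x in ([1, 0], [0, 1], [3, -1])]
on_line = all(y[1] == 2 * y[0] for y in outputs)
```

Both flags come out `True`, matching a one-dimensional kernel and a one-dimensional image.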