Inner Product Spaces — Linear Algebra

What you'll leave with

The three axioms that define an inner product, and why each one is non-negotiable.
The induced norm and distance — geometry that falls out of the inner product for free.
A proof of the Cauchy–Schwarz inequality, and what it really says about "angle."
Orthogonal projection onto a subspace as the unique best approximation.
Gram–Schmidt: a recipe that turns any basis into an orthonormal one, worked end-to-end.
The function-space example $\langle f, g \rangle = \int_a^b f g \, dx$ — the bridge to Fourier analysis.

1. Why generalize the dot product?

In $\mathbb{R}^n$ the dot product $\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n$ does two jobs at once. Numerically, it's a single scalar you can compute coordinate by coordinate. Geometrically, it tells you how much two vectors point the same way: it gives you length ($\|\mathbf{v}\|^2 = \mathbf{v} \cdot \mathbf{v}$), angle ($\cos\theta = \mathbf{u}\cdot\mathbf{v}/(\|\mathbf{u}\|\,\|\mathbf{v}\|)$), and orthogonality (the dot product is zero).

That second job — turning algebra into geometry — is what we want to keep. So we ask: what properties of the dot product are actually doing the work? Strip the coordinates away, write down only the abstract rules, and you have the definition of an inner product. Any vector space that admits one of these comes with its own geometry: lengths, angles, perpendicularity, the Pythagorean theorem. All for the price of three axioms.

The payoff is that "vector space" no longer means "arrows in $\mathbb{R}^n$." Polynomials, continuous functions, square-summable sequences, random variables with finite second moments: all of them carry useful inner products, and the same theorems we'll prove here work in every one of them.

2. The inner product axioms

Let $V$ be a vector space over $\mathbb{R}$ or $\mathbb{C}$. An inner product is a function $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{F}$ (where $\mathbb{F}$ is the underlying field) that satisfies three rules.

Conjugate symmetry. $\langle \mathbf{u}, \mathbf{v} \rangle = \overline{\langle \mathbf{v}, \mathbf{u} \rangle}$ for all $\mathbf{u}, \mathbf{v} \in V$. Over $\mathbb{R}$ the conjugate does nothing, so this is plain symmetry: $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$.
Linearity in the first argument. $\langle a\mathbf{u} + b\mathbf{w}, \mathbf{v} \rangle = a\langle \mathbf{u}, \mathbf{v} \rangle + b\langle \mathbf{w}, \mathbf{v} \rangle$ for all scalars $a, b$ and vectors $\mathbf{u}, \mathbf{v}, \mathbf{w}$.
Positive-definiteness. $\langle \mathbf{v}, \mathbf{v} \rangle \geq 0$ for every $\mathbf{v}$, with equality if and only if $\mathbf{v} = \mathbf{0}$.

A vector space equipped with an inner product is called an inner product space. Combining conjugate symmetry with linearity in the first argument gives conjugate-linearity in the second argument:

$$ \langle \mathbf{u}, a\mathbf{v} + b\mathbf{w} \rangle = \bar{a}\langle \mathbf{u}, \mathbf{v} \rangle + \bar{b}\langle \mathbf{u}, \mathbf{w} \rangle. $$

Over $\mathbb{R}$ those conjugates vanish, and the inner product is linear in both slots: a symmetric, positive-definite bilinear form. We'll mostly work over $\mathbb{R}$; the complex case adds only the conjugates.

Note · why the conjugate?

Without conjugation, positive-definiteness would fail over $\mathbb{C}$. Try it: if $\langle i\mathbf{v}, i\mathbf{v} \rangle$ were $i^2 \langle \mathbf{v}, \mathbf{v} \rangle = -\langle \mathbf{v}, \mathbf{v} \rangle$, then the nonzero vector $i\mathbf{v}$ would have negative "length squared." Conjugate-linearity in the second slot rescues the inequality: $\langle i\mathbf{v}, i\mathbf{v} \rangle = i\,\bar{i}\,\langle \mathbf{v}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{v} \rangle \geq 0$.

The standard examples

Space	Inner product
$\mathbb{R}^n$	$\langle \mathbf{u}, \mathbf{v} \rangle = \sum_{i=1}^{n} u_i v_i$ (the dot product)
$\mathbb{C}^n$	$\langle \mathbf{u}, \mathbf{v} \rangle = \sum_{i=1}^{n} u_i \overline{v_i}$
$C[a,b]$ (continuous real functions)	$\langle f, g \rangle = \int_a^b f(x) g(x)\, dx$
$\mathbb{R}^n$ with weights $w_i > 0$	$\langle \mathbf{u}, \mathbf{v} \rangle = \sum_i w_i u_i v_i$

All four are honest inner products — you can check each axiom directly. The same theorems we prove next will hold for every entry of this table simultaneously.
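The finite-dimensional rows of the table can be sketched in a few lines of plain Python, assuming vectors are stored as lists of numbers. The function names (`dot`, `cdot`, `weighted_dot`) are illustrative, not from any library; note that `cdot` conjugates the second argument, matching the convention above.

```python
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Standard inner product on R^n: sum of u_i * v_i."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def cdot(u: Sequence[complex], v: Sequence[complex]) -> complex:
    """Inner product on C^n: linear in u, conjugate-linear in v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b.conjugate() for a, b in zip(u, v))


def weighted_dot(u: Sequence[float], v: Sequence[float], w: Sequence[float]) -> float:
    """Weighted inner product on R^n: sum of w_i * u_i * v_i, every w_i > 0."""
    if not (len(u) == len(v) == len(w)):
        raise ValueError("vectors and weights must have the same length")
    if any(wi <= 0 for wi in w):
        raise ValueError("weights must be strictly positive")
    return sum(wi * a * b for wi, a, b in zip(w, u, v))


# Spot checks on sample vectors (evidence, not proof):
z, y = [1 + 2j, 3j], [2 - 1j, 1 + 1j]
print(cdot(z, y), cdot(y, z).conjugate())  # conjugate symmetry: equal values
print(cdot(z, z))                          # <z, z> is real and positive
```

Checks on a handful of vectors like these can catch a missing conjugate or a sign slip, but verifying the axioms still means the algebra in the text.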

3. Induced norm and distance

Once you have an inner product, length comes for free. The induced norm is

$$ \|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}. $$

This is well-defined because positive-definiteness guarantees the square root is of a non-negative real number. The norm satisfies the three norm axioms — positivity, absolute homogeneity ($\|c\mathbf{v}\| = |c|\,\|\mathbf{v}\|$), and the triangle inequality $\|\mathbf{u}+\mathbf{v}\| \leq \|\mathbf{u}\| + \|\mathbf{v}\|$. The triangle inequality is the only one that takes work; it follows from Cauchy–Schwarz, which we'll prove in a moment.

Distance follows immediately: $d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|$. So an inner product gives you a norm, and a norm gives you a metric. Geometry slides out of algebra.
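Norm and distance can be derived from whatever inner product is supplied. In this sketch the inner product is passed in as a function; `norm`, `dist`, and the weights $(4, 1)$ in `weighted` are illustrative choices, not from the text. The same two points end up at different distances under the two inner products.

```python
import math
from typing import Callable, Sequence

Vec = Sequence[float]
InnerProduct = Callable[[Vec, Vec], float]


def dot(u: Vec, v: Vec) -> float:
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))


def weighted(u: Vec, v: Vec) -> float:
    """Illustrative weighted inner product on R^2 with weights (4, 1)."""
    return 4 * u[0] * v[0] + u[1] * v[1]


def norm(v: Vec, ip: InnerProduct) -> float:
    """Induced norm ||v|| = sqrt(<v, v>)."""
    return math.sqrt(ip(v, v))


def dist(u: Vec, v: Vec, ip: InnerProduct) -> float:
    """Induced distance d(u, v) = ||u - v||."""
    return norm([a - b for a, b in zip(u, v)], ip)


p, q = [0.0, 0.0], [3.0, 4.0]
print(dist(p, q, dot))       # 5.0
print(dist(p, q, weighted))  # sqrt(4*9 + 16) = sqrt(52), about 7.21
```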

Why this matters

The same formula $\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$ that computes Euclidean length in $\mathbb{R}^n$ also measures the size of a function in $C[a,b]$: namely $\sqrt{\int_a^b f(x)^2\, dx}$, which is the root-mean-square value of $f$ multiplied by $\sqrt{b-a}$. One definition, many incarnations.

4. The Cauchy–Schwarz inequality

The single most important inequality in the whole subject. It says: in any inner product space,

$$ |\langle \mathbf{u}, \mathbf{v} \rangle| \leq \|\mathbf{u}\|\,\|\mathbf{v}\|, $$

with equality if and only if $\mathbf{u}$ and $\mathbf{v}$ are linearly dependent (one is a scalar multiple of the other).

The proof — and why it's clever

If $\mathbf{v} = \mathbf{0}$ both sides are zero and we're done. Otherwise, pick any scalar $t$ and consider the vector $\mathbf{u} - t\mathbf{v}$. By positive-definiteness,

$$ 0 \leq \|\mathbf{u} - t\mathbf{v}\|^2 = \langle \mathbf{u} - t\mathbf{v},\, \mathbf{u} - t\mathbf{v} \rangle. $$

Expand using linearity (over $\mathbb{R}$ for clarity):

$$ \|\mathbf{u} - t\mathbf{v}\|^2 = \|\mathbf{u}\|^2 - 2t\langle \mathbf{u}, \mathbf{v} \rangle + t^2 \|\mathbf{v}\|^2. $$

This is a non-negative quadratic in $t$. The trick is to pick $t$ to make the right-hand side as small as possible. Setting the derivative to zero gives the minimizer $t^* = \langle \mathbf{u}, \mathbf{v} \rangle / \|\mathbf{v}\|^2$. Plug it back in:

$$ 0 \leq \|\mathbf{u}\|^2 - \frac{\langle \mathbf{u}, \mathbf{v} \rangle^2}{\|\mathbf{v}\|^2}. $$

Multiply through by $\|\mathbf{v}\|^2$ and take square roots:

$$ |\langle \mathbf{u}, \mathbf{v} \rangle| \leq \|\mathbf{u}\|\,\|\mathbf{v}\|. \qquad\blacksquare $$

Equality forces $\|\mathbf{u} - t^*\mathbf{v}\|^2 = 0$, which by positive-definiteness means $\mathbf{u} = t^*\mathbf{v}$ — linear dependence.
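The proof's minimizer can be checked numerically. This sketch assumes real vectors with the standard dot product, and the sample vectors are arbitrary: it computes $t^*$, confirms that plugging it in gives $\|\mathbf{u}\|^2 - \langle \mathbf{u}, \mathbf{v} \rangle^2 / \|\mathbf{v}\|^2$, and shows that moving $t$ away from $t^*$ only increases the quadratic.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def q(t, u, v):
    """The quadratic ||u - t v||^2 from the proof."""
    r = [a - t * b for a, b in zip(u, v)]
    return dot(r, r)


u = [2.0, 1.0, -1.0]
v = [1.0, 3.0, 0.0]

t_star = dot(u, v) / dot(v, v)                       # minimizer <u, v> / ||v||^2
min_value = dot(u, u) - dot(u, v) ** 2 / dot(v, v)   # value after plugging t* in

assert abs(q(t_star, u, v) - min_value) < 1e-12
assert min_value >= 0                                # this is Cauchy–Schwarz
assert q(t_star + 0.1, u, v) > q(t_star, u, v)       # nudging t either way
assert q(t_star - 0.1, u, v) > q(t_star, u, v)       # only makes it bigger
print(t_star, min_value)                             # 0.5 3.5
```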

What it really says

Rearrange the inequality:

$$ -1 \leq \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \leq 1. $$

That ratio lives in $[-1, 1]$, exactly the range of cosine. So for nonzero $\mathbf{u}$ and $\mathbf{v}$ we may define

$$ \cos\theta = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|}, $$

and call $\theta \in [0, \pi]$ the angle between $\mathbf{u}$ and $\mathbf{v}$. Cauchy–Schwarz is what guarantees this definition is sensible in any inner product space, not just $\mathbb{R}^n$. Two polynomials, two random variables, two square-integrable functions — all have angles between them.
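Here is a sketch of the angle formula for real vectors under the standard dot product; `angle` is an illustrative name. The clamp to $[-1, 1]$ is a floating-point precaution: Cauchy–Schwarz guarantees the exact ratio lies in that range, but rounding can push the computed value slightly outside it, and `math.acos` raises `ValueError` for arguments outside $[-1, 1]$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def angle(u, v):
    """Angle in [0, pi] between nonzero real vectors u and v."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0 or nv == 0:
        raise ValueError("angle is undefined for the zero vector")
    c = dot(u, v) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, c)))   # clamp against rounding error


print(angle([1, 0], [1, 1]))    # about 0.785, i.e. pi/4
print(angle([3, 4], [6, 8]))    # 0.0: parallel vectors
print(angle([1, 2], [-2, 1]))   # about 1.571, i.e. pi/2: orthogonal vectors
```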

Triangle inequality, in one line

$\|\mathbf{u}+\mathbf{v}\|^2 = \|\mathbf{u}\|^2 + 2\langle \mathbf{u}, \mathbf{v} \rangle + \|\mathbf{v}\|^2 \leq \|\mathbf{u}\|^2 + 2\|\mathbf{u}\|\|\mathbf{v}\| + \|\mathbf{v}\|^2 = (\|\mathbf{u}\| + \|\mathbf{v}\|)^2$. Take square roots. Cauchy–Schwarz did all the work.

5. Orthogonality and Pythagoras

Orthogonal

Two vectors $\mathbf{u}, \mathbf{v}$ in an inner product space are orthogonal, written $\mathbf{u} \perp \mathbf{v}$, if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$. The zero vector is orthogonal to everything, since linearity gives $\langle \mathbf{0}, \mathbf{v} \rangle = 0$ for every $\mathbf{v}$.

Orthogonality is no longer a fact about right angles in pictures — it's an equation. And it gives us the Pythagorean theorem essentially for free.

If $\mathbf{u} \perp \mathbf{v}$, then

$$ \|\mathbf{u} + \mathbf{v}\|^2 = \langle \mathbf{u} + \mathbf{v},\, \mathbf{u} + \mathbf{v} \rangle = \|\mathbf{u}\|^2 + 2\langle \mathbf{u}, \mathbf{v} \rangle + \|\mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2. $$

That's the Pythagorean theorem for inner product spaces: one line of algebra, and it applies to function spaces, complex spaces, and weighted spaces alike. (Over $\mathbb{C}$ the middle term is $\langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{u} \rangle$ rather than $2\langle \mathbf{u}, \mathbf{v} \rangle$, so you need both to vanish; the first implies the second by conjugate symmetry.)

Orthogonal sets are automatically independent

If $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are nonzero and pairwise orthogonal, they are linearly independent. Proof: suppose $c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k = \mathbf{0}$. Take the inner product of both sides with $\mathbf{v}_i$. All cross terms vanish, leaving $c_i \|\mathbf{v}_i\|^2 = 0$, so $c_i = 0$ for each $i$.

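That's a lot of work avoided. In a generic basis, checking independence means solving a system; in an orthogonal basis, it's already true, and the same trick reads off coefficients: if $\mathbf{v} = \sum_i c_i \mathbf{v}_i$, then $c_i = \langle \mathbf{v}, \mathbf{v}_i \rangle / \langle \mathbf{v}_i, \mathbf{v}_i \rangle$.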
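To see the coefficient read-off in action, here is a sketch with an orthogonal (not normalized) basis of $\mathbb{R}^3$ chosen for illustration, using exact `Fraction` arithmetic so the results print cleanly. No linear system is solved: each coefficient is a ratio of two inner products.

```python
from fractions import Fraction


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# An orthogonal basis of R^3: every pairwise dot product is 0.
basis = [(1, 1, 0), (1, -1, 0), (0, 0, 2)]
v = (3, 5, 4)

# c_i = <v, v_i> / <v_i, v_i>
coeffs = [Fraction(dot(v, b), dot(b, b)) for b in basis]
print(coeffs)  # [Fraction(4, 1), Fraction(-1, 1), Fraction(2, 1)]

# Rebuild v from the coefficients to confirm they are right.
rebuilt = tuple(sum(c * b[k] for c, b in zip(coeffs, basis)) for k in range(3))
assert rebuilt == v
```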

6. Orthogonal projection onto a subspace

Here's the question that motivates the next two sections: given a vector $\mathbf{v}$ and a finite-dimensional subspace $W$, what is the vector in $W$ closest to $\mathbf{v}$, in the sense of the induced distance $\|\mathbf{v} - \mathbf{w}\|$?

The answer is the orthogonal projection of $\mathbf{v}$ onto $W$, denoted $\operatorname{proj}_W(\mathbf{v})$. It is characterized by a single geometric condition:

$$ \mathbf{v} - \operatorname{proj}_W(\mathbf{v}) \perp W $$

— the "error" is perpendicular to the whole subspace.

The formula, when you have an orthogonal basis

Suppose $\{\mathbf{e}_1, \ldots, \mathbf{e}_k\}$ is an orthogonal basis of $W$. Then

$$ \operatorname{proj}_W(\mathbf{v}) = \sum_{i=1}^{k} \frac{\langle \mathbf{v}, \mathbf{e}_i \rangle}{\langle \mathbf{e}_i, \mathbf{e}_i \rangle}\, \mathbf{e}_i. $$

If the basis is additionally orthonormal (each $\|\mathbf{e}_i\| = 1$), the denominators are all 1 and the formula collapses to

$$ \operatorname{proj}_W(\mathbf{v}) = \sum_{i=1}^{k} \langle \mathbf{v}, \mathbf{e}_i \rangle\, \mathbf{e}_i. $$

The coefficients $\langle \mathbf{v}, \mathbf{e}_i \rangle$ are called the Fourier coefficients of $\mathbf{v}$ relative to the basis. Read that name twice — the connection to Fourier series isn't a metaphor, it's the same formula in a different inner product space.
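The projection formula translates almost line for line into code. A minimal sketch, assuming real vectors, the standard dot product, and a basis the caller guarantees is orthogonal (the function does not check this); `proj` and the sample plane are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def proj(v, orthogonal_basis):
    """Orthogonal projection of v onto span(orthogonal_basis).

    Uses sum_i <v, e_i> / <e_i, e_i> * e_i, so the basis vectors must be
    nonzero and pairwise orthogonal; for an orthonormal basis every
    denominator is 1.
    """
    result = [0.0] * len(v)
    for e in orthogonal_basis:
        c = dot(v, e) / dot(e, e)              # Fourier coefficient along e
        result = [r + c * x for r, x in zip(result, e)]
    return result


W = [(1.0, 1.0, 0.0), (1.0, -1.0, 1.0)]       # orthogonal: 1 - 1 + 0 = 0
v = (2.0, 0.0, 3.0)
p = proj(v, W)                                 # (8/3, -2/3, 5/3)
residual = [a - b for a, b in zip(v, p)]
print(p, [dot(residual, e) for e in W])        # residual is orthogonal to W
```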

Best approximation: the punchline

For any $\mathbf{w} \in W$, write $\mathbf{v} - \mathbf{w} = \bigl(\mathbf{v} - \operatorname{proj}_W(\mathbf{v})\bigr) + \bigl(\operatorname{proj}_W(\mathbf{v}) - \mathbf{w}\bigr)$. The first piece is orthogonal to $W$; the second piece lives in $W$. By Pythagoras,

$$ \|\mathbf{v} - \mathbf{w}\|^2 = \|\mathbf{v} - \operatorname{proj}_W(\mathbf{v})\|^2 + \|\operatorname{proj}_W(\mathbf{v}) - \mathbf{w}\|^2. $$

The first term does not depend on $\mathbf{w}$, so the right-hand side is minimized exactly when the second term is zero, i.e., when $\mathbf{w} = \operatorname{proj}_W(\mathbf{v})$. So the orthogonal projection is provably the unique closest point. That's the best approximation theorem, and it underwrites everything from least-squares regression to truncated Fourier series.
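A numerical illustration of the theorem, not a proof: this sketch assumes the standard dot product on $\mathbb{R}^3$, takes $W$ to be the $xz$-plane with an orthogonal basis chosen for convenience, and samples random competitors $\mathbf{w} \in W$ with a fixed seed. Every competitor satisfies the Pythagorean split, and none beats the projection.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


e1, e2 = (1.0, 0.0, 1.0), (1.0, 0.0, -1.0)       # orthogonal basis of W (xz-plane)
v = (3.0, 2.0, 1.0)
p = [dot(v, e1) / dot(e1, e1) * a + dot(v, e2) / dot(e2, e2) * b
     for a, b in zip(e1, e2)]                     # proj_W(v) = (3, 0, 1)

best = dot(sub(v, p), sub(v, p))                  # squared distance to the projection
rng = random.Random(0)
for _ in range(1000):
    s, t = rng.uniform(-5, 5), rng.uniform(-5, 5)
    w = [s * a + t * b for a, b in zip(e1, e2)]   # an arbitrary point of W
    lhs = dot(sub(v, w), sub(v, w))
    rhs = best + dot(sub(p, w), sub(p, w))        # Pythagorean split
    assert abs(lhs - rhs) < 1e-9 and lhs >= best
print(best)  # 4.0: the squared length of the residual (0, 2, 0)
```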

Where you've already met this

Least-squares regression: project the data vector onto the column space of the design matrix. The "fit" is the projection; the "residual" is the perpendicular piece. Fourier series: project a function onto the span of $\{1, \cos x, \sin x, \ldots, \cos Nx, \sin Nx\}$ in $C[-\pi, \pi]$ to get its $N$th partial Fourier sum. Same theorem, different inner product.
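The regression case can be made concrete. In this sketch the four data points are made up for illustration, and the design matrix has columns $\mathbf{1}$ and $\mathbf{x}$. Replacing $\mathbf{x}$ with $\mathbf{x} - \bar{x}\mathbf{1}$ (one step of Gram–Schmidt) gives an orthogonal basis for the same column space, so the projection formula applies directly and the residual comes out perpendicular to both columns.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0, 5.0]            # illustrative data
ones = [1.0] * len(x)

# Orthogonal basis of span{1, x}: subtract from x its projection onto 1.
x_bar = dot(x, ones) / dot(ones, ones)
xc = [xi - x_bar for xi in x]       # <xc, ones> = 0

a = dot(y, ones) / dot(ones, ones)  # coefficient along 1 (the mean of y)
b = dot(y, xc) / dot(xc, xc)        # coefficient along xc (the slope)
fit = [a + b * c for c in xc]       # projection of y onto the column space
residual = [yi - fi for yi, fi in zip(y, fit)]

print(fit)                                    # close to [1.1, 2.2, 3.3, 4.4]
print(dot(residual, ones), dot(residual, x))  # both close to 0
```

For these data the fitted line is $y = 1.1 + 1.1x$; the normal equations would give the same line, since both compute the same projection.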

7. Gram–Schmidt orthonormalization

The projection formula is beautiful — but it demands an orthogonal basis. What if all you have is some ordinary, ungainly basis? The Gram–Schmidt process is a recipe that takes any basis and produces an orthonormal one spanning the same subspace, vector by vector.

Given linearly independent $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$, produce $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ via:

$$ \begin{aligned} \mathbf{u}_1 &= \mathbf{v}_1, &\mathbf{e}_1 &= \frac{\mathbf{u}_1}{\|\mathbf{u}_1\|}, \\ \mathbf{u}_2 &= \mathbf{v}_2 - \langle \mathbf{v}_2, \mathbf{e}_1 \rangle \mathbf{e}_1, &\mathbf{e}_2 &= \frac{\mathbf{u}_2}{\|\mathbf{u}_2\|}, \\ \mathbf{u}_3 &= \mathbf{v}_3 - \langle \mathbf{v}_3, \mathbf{e}_1 \rangle \mathbf{e}_1 - \langle \mathbf{v}_3, \mathbf{e}_2 \rangle \mathbf{e}_2, &\mathbf{e}_3 &= \frac{\mathbf{u}_3}{\|\mathbf{u}_3\|}, \\ &\vdots &&\vdots \end{aligned} $$

In words: at each step, take the next basis vector, subtract off its projection onto everything you've already orthonormalized, and normalize what's left. The geometry is exactly the picture below.

Figure · Gram–Schmidt sweeps from left to right. At each step the projection onto already-chosen directions is subtracted off; the perpendicular leftover is normalized to give the next orthonormal vector.
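A direct transcription of the recipe, as a sketch: real vectors as Python lists, the standard dot product, and linearly independent input assumed (a zero $\mathbf{u}_k$ raises an error rather than being skipped). This is the classical variant, with every projection coefficient computed from the original $\mathbf{v}_k$; `gram_schmidt` is an illustrative name.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Classical Gram–Schmidt: return orthonormal e_1, ..., e_n.

    u_k = v_k - sum_{j<k} <v_k, e_j> e_j, then e_k = u_k / ||u_k||.
    Assumes the input vectors are linearly independent.
    """
    es = []
    for v in vectors:
        # All projection coefficients come from the ORIGINAL v_k.
        coeffs = [dot(v, e) for e in es]
        u = list(v)
        for c, e in zip(coeffs, es):
            u = [ui - c * ei for ui, ei in zip(u, e)]
        n = math.sqrt(dot(u, u))
        if n == 0:
            raise ValueError("input vectors are linearly dependent")
        es.append([ui / n for ui in u])
    return es


for e in gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]]):
    print([round(x, 4) for x in e])
```

Run on the three vectors of the worked example below, it should reproduce $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ up to floating-point rounding.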

A complete worked 3-vector example

Apply Gram–Schmidt in $\mathbb{R}^3$ (standard dot product) to

$$ \mathbf{v}_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix},\quad \mathbf{v}_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix},\quad \mathbf{v}_3 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}. $$

Step 1. Normalize $\mathbf{v}_1$. We have $\|\mathbf{v}_1\| = \sqrt{1+1+0} = \sqrt{2}$, so

$$ \mathbf{e}_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}. $$

Step 2. Subtract the component of $\mathbf{v}_2$ along $\mathbf{e}_1$. The inner product is

$$ \langle \mathbf{v}_2, \mathbf{e}_1 \rangle = \tfrac{1}{\sqrt{2}}(1\cdot 1 + 0\cdot 1 + 1\cdot 0) = \tfrac{1}{\sqrt{2}}. $$ $$ \mathbf{u}_2 = \mathbf{v}_2 - \tfrac{1}{\sqrt{2}} \cdot \mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} - \tfrac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1/2 \\ -1/2 \\ 1 \end{pmatrix}. $$

Now $\|\mathbf{u}_2\|^2 = \tfrac{1}{4} + \tfrac{1}{4} + 1 = \tfrac{3}{2}$, so $\|\mathbf{u}_2\| = \sqrt{3/2} = \sqrt{6}/2$. Normalizing:

$$ \mathbf{e}_2 = \frac{1}{\sqrt{6}}\begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}. $$

Step 3. Subtract from $\mathbf{v}_3$ its components along both $\mathbf{e}_1$ and $\mathbf{e}_2$:

$$ \langle \mathbf{v}_3, \mathbf{e}_1 \rangle = \tfrac{1}{\sqrt{2}}(0 + 1 + 0) = \tfrac{1}{\sqrt{2}}, \quad \langle \mathbf{v}_3, \mathbf{e}_2 \rangle = \tfrac{1}{\sqrt{6}}(0 - 1 + 2) = \tfrac{1}{\sqrt{6}}. $$ $$ \mathbf{u}_3 = \mathbf{v}_3 - \tfrac{1}{\sqrt{2}}\mathbf{e}_1 - \tfrac{1}{\sqrt{6}}\mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} - \tfrac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} - \tfrac{1}{6}\begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}. $$

Combine the last two corrections coordinate by coordinate. The $x$-coordinate is $0 - \tfrac{1}{2} - \tfrac{1}{6} = -\tfrac{2}{3}$. The $y$-coordinate is $1 - \tfrac{1}{2} + \tfrac{1}{6} = \tfrac{2}{3}$. The $z$-coordinate is $1 - 0 - \tfrac{1}{3} = \tfrac{2}{3}$. So

$$ \mathbf{u}_3 = \begin{pmatrix} -2/3 \\ 2/3 \\ 2/3 \end{pmatrix}, \quad \|\mathbf{u}_3\| = \tfrac{2}{3}\sqrt{3} = \tfrac{2}{\sqrt{3}}. $$ $$ \mathbf{e}_3 = \frac{1}{\sqrt{3}}\begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}. $$

Sanity check. Verify orthogonality: $\langle \mathbf{e}_1, \mathbf{e}_2 \rangle = \tfrac{1}{\sqrt{12}}(1 - 1 + 0) = 0$ ✓, $\langle \mathbf{e}_1, \mathbf{e}_3 \rangle = \tfrac{1}{\sqrt{6}}(-1 + 1 + 0) = 0$ ✓, $\langle \mathbf{e}_2, \mathbf{e}_3 \rangle = \tfrac{1}{\sqrt{18}}(-1 - 1 + 2) = 0$ ✓. Each is a unit vector by construction. We've produced an orthonormal basis of $\mathbb{R}^3$.
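The same sanity check can be phrased as one matrix computation: form every pairwise inner product of $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ (the Gram matrix) and compare it with the identity. A sketch using the vectors derived above, in floating point, so the comparison needs a small tolerance rather than exact equality.

```python
import math

s2, s6, s3 = math.sqrt(2), math.sqrt(6), math.sqrt(3)
E = [
    [1 / s2, 1 / s2, 0.0],        # e_1
    [1 / s6, -1 / s6, 2 / s6],    # e_2
    [-1 / s3, 1 / s3, 1 / s3],    # e_3
]

# Gram matrix G[i][j] = <e_i, e_j>; orthonormal exactly when G is the identity.
G = [[sum(a * b for a, b in zip(ei, ej)) for ej in E] for ei in E]
for row in G:
    print([round(g, 12) for g in row])
```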

Watch out · numerical Gram–Schmidt

Classical Gram–Schmidt as written is numerically unstable: when the input vectors are nearly dependent, rounding errors in early steps are amplified later, and the computed vectors can be far from orthogonal. In code, use modified Gram–Schmidt (subtract the projections one at a time, computing each coefficient from the already-updated vector) or, more robustly, a Householder QR factorization. The math is the same; the floating-point behavior is not.
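To make the warning concrete, here is a sketch comparing the two variants on a nearly dependent input: the Läuchli vectors $(1, \varepsilon, 0, 0)$, $(1, 0, \varepsilon, 0)$, $(1, 0, 0, \varepsilon)$ with $\varepsilon = 10^{-8}$, a standard stress test chosen here for illustration. In double precision $1 + \varepsilon^2$ rounds to $1$, which is what exposes the difference. Loss of orthogonality is measured as the largest entry of $|E^\top E - I|$, where the columns of $E$ are the computed vectors; all function names are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(u):
    n = math.sqrt(dot(u, u))
    return [x / n for x in u]


def classical_gs(vectors):
    es = []
    for v in vectors:
        coeffs = [dot(v, e) for e in es]    # all computed from the original v
        u = list(v)
        for c, e in zip(coeffs, es):
            u = [a - c * b for a, b in zip(u, e)]
        es.append(normalize(u))
    return es


def modified_gs(vectors):
    us = [list(v) for v in vectors]
    es = []
    for k in range(len(us)):
        e = normalize(us[k])
        es.append(e)
        # Remove the new direction from every remaining vector right away,
        # so later coefficients are computed from the updated remainders.
        for j in range(k + 1, len(us)):
            c = dot(us[j], e)
            us[j] = [a - c * b for a, b in zip(us[j], e)]
    return es


def orthogonality_loss(es):
    n = len(es)
    return max(abs(dot(es[i], es[j]) - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


eps = 1e-8
lauchli = [[1, eps, 0, 0], [1, 0, eps, 0], [1, 0, 0, eps]]
print("classical:", orthogonality_loss(classical_gs(lauchli)))  # about 0.5
print("modified: ", orthogonality_loss(modified_gs(lauchli)))   # on the order of 1e-8
```

Classical Gram–Schmidt leaves $\mathbf{e}_2$ and $\mathbf{e}_3$ at roughly 60 degrees to each other (inner product about $0.5$), while the modified variant stays orthogonal to within about $10^{-8}$. Householder QR is not shown here.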

8. Beyond $\mathbb{R}^n$: function spaces

The most consequential inner product in applied mathematics doesn't act on tuples of numbers — it acts on functions. On the space of continuous real functions on $[a, b]$, define

$$ \langle f, g \rangle = \int_a^b f(x) g(x)\, dx. $$

Check the axioms: symmetry is obvious (multiplication of real numbers commutes), linearity follows from linearity of the integral, and positive-definiteness is $\int_a^b f(x)^2 dx \geq 0$ with equality only when $f \equiv 0$ (for continuous $f$).

Everything we've built carries over. The norm is $\|f\| = \sqrt{\int_a^b f^2 dx}$, the root-mean-square size of $f$ up to the constant factor $\sqrt{b-a}$. Two functions are orthogonal when $\int_a^b f g\, dx = 0$. Cauchy–Schwarz becomes

$$ \left|\int_a^b f(x) g(x)\, dx\right| \leq \sqrt{\int_a^b f^2\, dx} \cdot \sqrt{\int_a^b g^2\, dx}. $$
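The function-space formulas can be checked numerically. This sketch approximates the integrals with composite Simpson's rule on a fixed, even number of subintervals (an illustrative choice; any accurate quadrature would do) and uses $f(x) = e^x$, $g(x) = x$ on $[0, 1]$ as sample functions. Here $\int_0^1 x e^x\, dx = 1$ exactly, while the right-hand side is $\sqrt{(e^2 - 1)/2} \cdot \sqrt{1/3} \approx 1.032$.

```python
import math


def inner(f, g, a, b, n=1000):
    """Approximate <f, g> = integral of f(x) g(x) over [a, b] (composite Simpson, n even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) * g(a) + f(b) * g(b)
    for k in range(1, n):
        x = a + k * h
        total += (4 if k % 2 else 2) * f(x) * g(x)   # Simpson weights 4, 2, 4, ...
    return total * h / 3


def norm(f, a, b):
    return math.sqrt(inner(f, f, a, b))


f, g = math.exp, (lambda x: x)
lhs = abs(inner(f, g, 0.0, 1.0))         # exact value: 1
rhs = norm(f, 0.0, 1.0) * norm(g, 0.0, 1.0)
print(lhs, rhs, lhs <= rhs)
```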

Projection onto a finite-dimensional subspace of functions is still the best approximation in the RMS sense. And the orthonormal basis you choose dictates which family of approximations you get:

Starting set and inner product	What you build
$\{1, \cos nx, \sin nx\}$ on $[-\pi, \pi]$ (already orthogonal; just normalize)	Fourier series
Gram–Schmidt on $\{1, x, x^2, \ldots\}$ on $[-1, 1]$	Legendre polynomials (up to scaling)
Gram–Schmidt on $\{1, x, x^2, \ldots\}$ with weight $e^{-x^2}$ on $\mathbb{R}$	Hermite polynomials (up to scaling)
Gram–Schmidt on $\{1, x, x^2, \ldots\}$ with weight $e^{-x}$ on $[0, \infty)$	Laguerre polynomials (up to scaling)

Apart from the Fourier row, whose functions are orthogonal from the start, each row is Gram–Schmidt applied to the monomials in a different inner product; a weight $w(x) > 0$ means $\langle f, g \rangle = \int f(x) g(x) w(x)\, dx$ over the stated interval. The mechanics are identical to the worked example above; only the integrals get hairier. This is the gateway from linear algebra to Fourier analysis, approximation theory, and quantum mechanics, where state spaces are infinite-dimensional inner product spaces.
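The Legendre row can be reproduced exactly with rational arithmetic. In this sketch, polynomials are coefficient lists (constant term first), $\langle p, q \rangle = \int_{-1}^{1} p\,q\,dx$ is computed term by term using $\int_{-1}^{1} x^k\,dx = 2/(k+1)$ for even $k$ and $0$ for odd $k$, and the vectors are left orthogonal rather than normalized, which avoids square roots. The outputs are scalar multiples of the standard Legendre polynomials (which are normalized so that $P_n(1) = 1$); all function names are illustrative.

```python
from fractions import Fraction


def mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1], computed exactly."""
    return sum(c * Fraction(2, k + 1) for k, c in enumerate(mul(p, q)) if k % 2 == 0)


def sub_scaled(p, c, q):
    """Return p - c*q, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a - c * b for a, b in zip(p, q)]


def orthogonalize(polys):
    """Gram–Schmidt without the normalization step (orthogonal, not orthonormal)."""
    us = []
    for p in polys:
        u = [Fraction(c) for c in p]
        for q in us:
            u = sub_scaled(u, inner(p, q) / inner(q, q), q)
        us.append(u)
    return us


monomials = [[1], [0, 1], [0, 0, 1], [0, 0, 0, 1]]   # 1, x, x^2, x^3
for u in orthogonalize(monomials):
    print([str(c) for c in u])
# prints ['1'], then ['0', '1'], then ['-1/3', '0', '1'], then ['0', '-3/5', '0', '1']
```

The last two lines are $x^2 - \tfrac{1}{3} = \tfrac{2}{3}P_2(x)$ and $x^3 - \tfrac{3}{5}x = \tfrac{2}{5}P_3(x)$.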

A subtlety

Continuity is what makes $\int_a^b f^2\, dx = 0$ force $f \equiv 0$. If we allow more general functions (say, Riemann- or Lebesgue-integrable ones), this can fail: a function that is nonzero at only finitely many points still integrates to zero. The fix is to identify functions that agree "almost everywhere"; the resulting space of square-integrable functions is $L^2[a, b]$, the proper home of Fourier analysis.

9. Common pitfalls

Confusing inner-product linearity with bilinearity over $\mathbb{C}$

Over the complex numbers the inner product is linear in the first slot but conjugate-linear in the second. So $\langle \mathbf{u}, c\mathbf{v} \rangle = \bar{c}\langle \mathbf{u}, \mathbf{v} \rangle$, not $c\langle \mathbf{u}, \mathbf{v} \rangle$. Forgetting the conjugate is one of the most common errors in complex inner-product calculations.
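A quick check makes the asymmetry visible. This sketch hand-codes the $\mathbb{C}^n$ inner product from the table, with the conjugate on the second argument. Conventions differ between sources (NumPy's `numpy.vdot`, for example, conjugates its first argument), so check which slot a library conjugates before reusing formulas.

```python
def ip(u, v):
    """<u, v> on C^n: linear in u, conjugate-linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


u = [1 + 1j, 2]
v = [3j, 1 - 1j]
c = 2j

print(ip(u, [c * x for x in v]))   # <u, c v>
print(c.conjugate() * ip(u, v))    # conj(c) <u, v>: the same value
print(c * ip(u, v))                # c <u, v>: a different value (the common slip)
```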

Skipping the normalization step

Gram–Schmidt produces orthogonal $\mathbf{u}_i$ first, then normalized $\mathbf{e}_i$. If you forget to divide by $\|\mathbf{u}_i\|$, your basis is orthogonal but not orthonormal — and the clean projection formula $\sum \langle \mathbf{v}, \mathbf{e}_i \rangle \mathbf{e}_i$ no longer applies; you need the version with $\langle \mathbf{e}_i, \mathbf{e}_i \rangle$ in the denominator.

Assuming any symmetric bilinear form is an inner product

A bilinear form can be symmetric and still fail positive-definiteness — the Minkowski form $\langle \mathbf{u}, \mathbf{v} \rangle = -u_0 v_0 + u_1 v_1 + u_2 v_2 + u_3 v_3$ of special relativity is the famous example. Without positive-definiteness, $\|\mathbf{v}\|$ may be imaginary, Cauchy–Schwarz can fail, and orthogonal sets need not be independent. These objects are useful but they are not inner products.
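A few lines show these failures concretely. This sketch uses the $(-,+,+,+)$ sign convention above and illustrative vectors: a timelike vector with negative "length squared", and a nonzero null vector $\mathbf{n}$ that is orthogonal to itself and to its own multiples, so the pairwise-orthogonal set $\{\mathbf{n}, 2\mathbf{n}\}$ is linearly dependent.

```python
def minkowski(u, v):
    """-u0 v0 + u1 v1 + u2 v2 + u3 v3: symmetric and bilinear, not positive-definite."""
    return -u[0] * v[0] + u[1] * v[1] + u[2] * v[2] + u[3] * v[3]


timelike = (1, 0, 0, 0)
null = (1, 1, 0, 0)

print(minkowski(timelike, timelike))               # -1: negative "length squared"
print(minkowski(null, null))                       # 0 for a nonzero vector
print(minkowski(null, tuple(2 * x for x in null))) # 0: "orthogonal" yet dependent
```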

Pitfall · linearly dependent input to Gram–Schmidt

If the input vectors are linearly dependent, at some step you'll get $\mathbf{u}_k = \mathbf{0}$, and normalizing it means dividing by zero. The cure is to detect $\mathbf{u}_k = \mathbf{0}$ (in floating point, $\|\mathbf{u}_k\|$ below a small tolerance), skip that vector, and continue with the rest. This is exactly how Gram–Schmidt detects (and removes) redundancy when given an over-complete set.
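One way to implement that cure, as a sketch: floating-point subtraction rarely lands exactly on zero, so the code below compares $\|\mathbf{u}_k\|$ with an absolute tolerance (`tol=1e-10`, an illustrative value that in practice should be scaled to the size of the data). It subtracts the projections one at a time from the running remainder, as in modified Gram–Schmidt; `orthonormal_basis` is a hypothetical name. Run on the vectors of Example 5 in Section 10, it keeps two of the three.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram–Schmidt that skips (near-)dependent inputs.

    Returns an orthonormal basis of span(vectors). A vector whose remainder
    has norm below tol is treated as dependent on earlier ones and dropped.
    """
    es = []
    for v in vectors:
        u = [float(x) for x in v]
        for e in es:
            c = dot(u, e)                      # coefficient from the current remainder
            u = [a - c * b for a, b in zip(u, e)]
        n = math.sqrt(dot(u, u))
        if n < tol:
            continue                           # u_k is numerically zero: skip it
        es.append([a / n for a in u])
    return es


basis = orthonormal_basis([(1, 1, 0), (2, 2, 0), (0, 0, 1)])
print(len(basis))  # 2: the redundant (2, 2, 0) was dropped
```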

10. Worked examples

Each one isolates a single move. Resist peeking until you've tried the recipe yourself.

Example 1 · Verify the integral defines an inner product

Show that $\langle f, g \rangle = \int_0^1 f(x) g(x)\, dx$ is an inner product on $C[0,1]$.

Symmetry. $\langle f, g \rangle = \int_0^1 f g\, dx = \int_0^1 g f\, dx = \langle g, f \rangle$. ✓

Linearity. $\langle af + bh,\, g \rangle = \int_0^1 (af + bh)g\, dx = a\int_0^1 f g\, dx + b\int_0^1 h g\, dx = a\langle f, g\rangle + b\langle h, g\rangle$. ✓

Positive-definiteness. $\langle f, f \rangle = \int_0^1 f(x)^2 dx \geq 0$. If $f$ is continuous and $\int_0^1 f^2 dx = 0$, then $f^2 \equiv 0$ (a continuous non-negative function with zero integral is identically zero), so $f \equiv 0$. ✓

Example 2 · Angle between two polynomials

In $C[-1, 1]$ with $\langle f, g \rangle = \int_{-1}^{1} f g\, dx$, find the angle between $f(x) = 1$ and $g(x) = x$.

$\langle f, g \rangle = \int_{-1}^{1} x\, dx = 0$. So $\cos\theta = 0$, meaning $\theta = \pi/2$.

The constant function and the identity function are orthogonal on $[-1, 1]$. That's not visible from the graphs — it's a fact about the inner product.

Example 3 · Project a vector onto a plane

In $\mathbb{R}^3$ with the standard dot product, project $\mathbf{v} = (1, 2, 3)$ onto the plane spanned by $\mathbf{a}_1 = (1, 0, 0)$ and $\mathbf{a}_2 = (0, 1, 0)$.

Both spanning vectors are already orthonormal, so

$$ \operatorname{proj}_W(\mathbf{v}) = \langle \mathbf{v}, \mathbf{a}_1 \rangle \mathbf{a}_1 + \langle \mathbf{v}, \mathbf{a}_2 \rangle \mathbf{a}_2 = 1\cdot(1,0,0) + 2\cdot(0,1,0) = (1, 2, 0). $$

The residual $\mathbf{v} - \operatorname{proj}_W(\mathbf{v}) = (0, 0, 3)$ is perpendicular to the plane, as advertised.

Example 4 · Cauchy–Schwarz on integrals

Show that for any continuous $f$ on $[0, 1]$,

$$ \left(\int_0^1 f(x)\, dx\right)^2 \leq \int_0^1 f(x)^2\, dx. $$

Apply Cauchy–Schwarz in $C[0, 1]$ with $g(x) = 1$:

$$ \left|\int_0^1 f(x)\cdot 1\, dx\right|^2 \leq \left(\int_0^1 1^2\, dx\right)\left(\int_0^1 f(x)^2\, dx\right) = \int_0^1 f^2\, dx. $$

The same algebraic move that bounds dot products bounds integrals.

Example 5 · Gram–Schmidt detects dependence

Apply Gram–Schmidt to $\mathbf{v}_1 = (1, 1, 0)$, $\mathbf{v}_2 = (2, 2, 0)$, $\mathbf{v}_3 = (0, 0, 1)$ in $\mathbb{R}^3$.

Step 1. $\mathbf{e}_1 = \tfrac{1}{\sqrt{2}}(1, 1, 0)$.

Step 2. $\langle \mathbf{v}_2, \mathbf{e}_1 \rangle = \tfrac{1}{\sqrt{2}}(2 + 2 + 0) = 2\sqrt{2}$.

$$ \mathbf{u}_2 = (2, 2, 0) - 2\sqrt{2} \cdot \tfrac{1}{\sqrt{2}}(1, 1, 0) = (2, 2, 0) - (2, 2, 0) = \mathbf{0}. $$

$\mathbf{u}_2 = \mathbf{0}$, signalling that $\mathbf{v}_2 = 2\mathbf{v}_1$ — linearly dependent. Drop $\mathbf{v}_2$ and continue.

Step 3. $\langle \mathbf{v}_3, \mathbf{e}_1 \rangle = 0$, so $\mathbf{u}_3 = (0, 0, 1)$ and $\mathbf{e}_3 = (0, 0, 1)$.

Final orthonormal basis of the span: $\{\mathbf{e}_1, \mathbf{e}_3\}$. The algorithm filtered out the redundant vector without us flagging it ahead of time.

Sources & further reading

The treatment above is synthesized from standard references. Where it sketches, the originals are where to go.

Lecture 17 · Orthogonal Matrices and Gram–Schmidt (course). MIT OpenCourseWare · 18.06, Gilbert Strang.

Strang's lecture on projection, orthonormal bases, and Gram–Schmidt, with the video, notes, and problem set. The canonical undergraduate exposition; start here if anything above felt rushed.

Linear Algebra Done Right · Chapter 6 (textbook). Sheldon Axler · 4th edition (open access).

The rigorous, determinant-free development of inner product spaces (including the complex case, orthogonal complements, and projection) at the level of a careful first proofs course.

Alternate coordinate systems (bases) (tutorial). Khan Academy · Linear Algebra.

Worked-example-heavy walkthrough of orthogonal projections, change of basis, and Gram–Schmidt. Best for grinding the mechanics until the projection formula is second nature.

Orthonormal Bases (tutorial). Paul's Online Math Notes · Lamar University.

Step-by-step notes on Gram–Schmidt and orthonormal bases with multiple fully worked examples. Excellent companion when you want to see "how the algebra actually looks line by line."

Inner Product (reference). Wolfram MathWorld.

Dense, formal reference for the axioms and their immediate consequences. Pair with MathWorld's companion article on the Gram–Schmidt process for the procedural side.

Inner product space (encyclopedia). Wikipedia.

Broad overview that connects inner product spaces to Hilbert spaces, $L^2$, and the wider analytic landscape. Useful for situating this topic within functional analysis and quantum mechanics.