1. Why generalize the dot product?
In $\mathbb{R}^n$ the dot product $\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n$ does two jobs at once. Numerically, it's a single scalar you can compute coordinate by coordinate. Geometrically, it tells you how much two vectors point the same way: it gives you length ($\|\mathbf{v}\|^2 = \mathbf{v} \cdot \mathbf{v}$), angle ($\cos\theta = \mathbf{u}\cdot\mathbf{v}/(\|\mathbf{u}\|\,\|\mathbf{v}\|)$), and orthogonality (two vectors are perpendicular exactly when their dot product is zero).
That second job — turning algebra into geometry — is what we want to keep. So we ask: what properties of the dot product are actually doing the work? Strip the coordinates away, write down only the abstract rules, and you have the definition of an inner product. Any vector space that admits one of these comes with its own geometry: lengths, angles, perpendicularity, the Pythagorean theorem. All for the price of three axioms.
The payoff is that "vector space" no longer means "arrows in $\mathbb{R}^n$." Polynomials, continuous functions, infinite sequences, random variables — all of them carry useful inner products, and the same theorems we'll prove here work in every one of them.
2. The inner product axioms
Let $V$ be a vector space over $\mathbb{R}$ or $\mathbb{C}$. An inner product is a function $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{F}$ (where $\mathbb{F}$ is the underlying field) that satisfies three rules.
- Conjugate symmetry. $\langle \mathbf{u}, \mathbf{v} \rangle = \overline{\langle \mathbf{v}, \mathbf{u} \rangle}$ for all $\mathbf{u}, \mathbf{v} \in V$. Over $\mathbb{R}$ the conjugate does nothing, so this is plain symmetry: $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$.
- Linearity in the first argument. $\langle a\mathbf{u} + b\mathbf{w}, \mathbf{v} \rangle = a\langle \mathbf{u}, \mathbf{v} \rangle + b\langle \mathbf{w}, \mathbf{v} \rangle$ for all scalars $a, b$ and vectors $\mathbf{u}, \mathbf{v}, \mathbf{w}$.
- Positive-definiteness. $\langle \mathbf{v}, \mathbf{v} \rangle \geq 0$ for every $\mathbf{v}$, with equality if and only if $\mathbf{v} = \mathbf{0}$.
A vector space equipped with an inner product is called an inner product space. Combining conjugate symmetry with linearity in the first argument gives conjugate-linearity in the second argument:
$$ \langle \mathbf{u}, a\mathbf{v} + b\mathbf{w} \rangle = \bar{a}\langle \mathbf{u}, \mathbf{v} \rangle + \bar{b}\langle \mathbf{u}, \mathbf{w} \rangle. $$
Over $\mathbb{R}$ the conjugates do nothing, and the inner product is linear in both slots: it is a symmetric bilinear form. We'll mostly work over $\mathbb{R}$; the complex case adds only the conjugates.
Without conjugation, positive-definiteness would fail over $\mathbb{C}$. Try it: if $\langle i\mathbf{v}, i\mathbf{v} \rangle$ were $i^2 \langle \mathbf{v}, \mathbf{v} \rangle = -\langle \mathbf{v}, \mathbf{v} \rangle$, then a nonzero vector would have negative "length squared." Conjugate-linearity in the second slot rescues the inequality.
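The role of the conjugate can be checked numerically. The following is a minimal sketch using Python's built-in complex numbers; the helper name `inner_c` and the sample vectors are illustrative assumptions, not part of the text above.

```python
def inner_c(u, v):
    """Standard inner product on C^n: linear in the first slot,
    conjugate-linear in the second."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 1j]

# Conjugate symmetry: <u, v> equals the complex conjugate of <v, u>.
print(inner_c(u, v), inner_c(v, u).conjugate())

# Multiplying v by i leaves <v, v> unchanged, because i * conj(i) = 1.
iv = [1j * x for x in v]
print(inner_c(v, v), inner_c(iv, iv))

# Dropping the conjugate flips the sign instead, so the result
# cannot serve as a squared length.
naive_v = sum(a * a for a in v)
naive_iv = sum(a * a for a in iv)
print(naive_v, naive_iv)
```

With the conjugate in place, `inner_c(w, w)` is a sum of squared moduli, which is why it is real and non-negative for every vector `w`.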
The standard examples
| Space | Inner product |
|---|---|
| $\mathbb{R}^n$ | $\langle \mathbf{u}, \mathbf{v} \rangle = \sum_{i=1}^{n} u_i v_i$ (the dot product) |
| $\mathbb{C}^n$ | $\langle \mathbf{u}, \mathbf{v} \rangle = \sum_{i=1}^{n} u_i \overline{v_i}$ |
| $C[a,b]$ (continuous real functions) | $\langle f, g \rangle = \int_a^b f(x) g(x)\, dx$ |
| $\mathbb{R}^n$ with weights $w_i > 0$ | $\langle \mathbf{u}, \mathbf{v} \rangle = \sum_i w_i u_i v_i$ |
All four are honest inner products — you can check each axiom directly. The same theorems we prove next will hold for every entry of this table simultaneously.
3. Induced norm and distance
Once you have an inner product, length comes for free. The induced norm is
$$ \|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}. $$This is well-defined because positive-definiteness guarantees the square root is of a non-negative real number. The norm satisfies the three norm axioms — positivity, absolute homogeneity ($\|c\mathbf{v}\| = |c|\,\|\mathbf{v}\|$), and the triangle inequality $\|\mathbf{u}+\mathbf{v}\| \leq \|\mathbf{u}\| + \|\mathbf{v}\|$. The triangle inequality is the only one that takes work; it follows from Cauchy–Schwarz, which we'll prove in a moment.
Distance follows immediately: $d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|$. So an inner product gives you a norm, and a norm gives you a metric. Geometry slides out of algebra.
The same formula $\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$ that computes Euclidean length in $\mathbb{R}^n$ also measures the size of a function in $C[a,b]$, namely $\sqrt{\int_a^b f(x)^2\, dx}$, which is the root-mean-square value of $f$ scaled by $\sqrt{b-a}$. One definition, many incarnations.
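To see how one definition serves several geometries, here is a small sketch in which the norm and distance take the inner product as a parameter. The function names (`dot`, `weighted`, `norm`, `distance`) and the sample weights are assumptions chosen for illustration.

```python
import math

def dot(u, v):
    """Standard dot product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def weighted(w):
    """Return the weighted inner product <u, v> = sum_i w_i u_i v_i
    for fixed weights w_i > 0."""
    def inner(u, v):
        return sum(wi * a * b for wi, a, b in zip(w, u, v))
    return inner

def norm(v, inner=dot):
    """Induced norm: square root of <v, v>."""
    return math.sqrt(inner(v, v))

def distance(u, v, inner=dot):
    """Induced distance: norm of the difference u - v."""
    return norm([a - b for a, b in zip(u, v)], inner)

v = [3.0, 4.0]
print(norm(v))                         # Euclidean length
print(norm(v, weighted([4.0, 1.0])))   # same vector, different inner product
print(distance([1.0, 1.0], [4.0, 5.0]))
```

Passing the inner product as an argument mirrors the text: the norm and the metric are not extra structure, they are computed from whichever inner product is supplied.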
4. The Cauchy–Schwarz inequality
The single most important inequality in the whole subject. It says: in any inner product space,
$$ |\langle \mathbf{u}, \mathbf{v} \rangle| \leq \|\mathbf{u}\|\,\|\mathbf{v}\|, $$with equality if and only if $\mathbf{u}$ and $\mathbf{v}$ are linearly dependent (one is a scalar multiple of the other).
The proof — and why it's clever
If $\mathbf{v} = \mathbf{0}$ both sides are zero and we're done. Otherwise, pick any scalar $t$ and consider the vector $\mathbf{u} - t\mathbf{v}$. By positive-definiteness,
$$ 0 \leq \|\mathbf{u} - t\mathbf{v}\|^2 = \langle \mathbf{u} - t\mathbf{v},\, \mathbf{u} - t\mathbf{v} \rangle. $$Expand using linearity (over $\mathbb{R}$ for clarity):
$$ \|\mathbf{u} - t\mathbf{v}\|^2 = \|\mathbf{u}\|^2 - 2t\langle \mathbf{u}, \mathbf{v} \rangle + t^2 \|\mathbf{v}\|^2. $$This is a non-negative quadratic in $t$. The trick is to pick $t$ to make the right-hand side as small as possible. Setting the derivative to zero gives the minimizer $t^* = \langle \mathbf{u}, \mathbf{v} \rangle / \|\mathbf{v}\|^2$. Plug it back in:
$$ 0 \leq \|\mathbf{u}\|^2 - \frac{\langle \mathbf{u}, \mathbf{v} \rangle^2}{\|\mathbf{v}\|^2}. $$Multiply through by $\|\mathbf{v}\|^2$ and take square roots:
$$ |\langle \mathbf{u}, \mathbf{v} \rangle| \leq \|\mathbf{u}\|\,\|\mathbf{v}\|. \qquad\blacksquare $$Equality forces $\|\mathbf{u} - t^*\mathbf{v}\|^2 = 0$, which by positive-definiteness means $\mathbf{u} = t^*\mathbf{v}$ — linear dependence.
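The proof can be traced on concrete numbers. This sketch (with illustrative vectors and a hypothetical helper `gap` for the quadratic $\|\mathbf{u} - t\mathbf{v}\|^2$) computes the minimizer $t^*$ and compares the minimum value with the closed form from the proof.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gap(u, v, t):
    """Return ||u - t v||^2, the quadratic in t used in the proof."""
    d = [a - t * b for a, b in zip(u, v)]
    return dot(d, d)

u = [1.0, 2.0, 3.0]
v = [2.0, 0.0, 1.0]

t_star = dot(u, v) / dot(v, v)                         # minimizer from the proof
closed_form = dot(u, u) - dot(u, v) ** 2 / dot(v, v)   # value of the quadratic at t_star

lhs = abs(dot(u, v))
rhs = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))

print(t_star, gap(u, v, t_star), closed_form)
print(lhs <= rhs)
```

Moving `t` away from `t_star` in either direction should only increase `gap(u, v, t)`, which is the geometric content of the argument: `t_star * v` is the point on the line through `v` closest to `u`.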
What it really says
Rearrange the inequality:
$$ -1 \leq \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \leq 1. $$That ratio lives in $[-1, 1]$, exactly the range of cosine. So we are allowed to define
$$ \cos\theta = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|}, $$and call $\theta \in [0, \pi]$ the angle between $\mathbf{u}$ and $\mathbf{v}$. Cauchy–Schwarz is what guarantees this definition is sensible in any inner product space, not just $\mathbb{R}^n$. Two polynomials, two random variables, two square-integrable functions — all have angles between them.
The triangle inequality promised in Section 3 follows in one line. Over $\mathbb{R}$, $\|\mathbf{u}+\mathbf{v}\|^2 = \|\mathbf{u}\|^2 + 2\langle \mathbf{u}, \mathbf{v} \rangle + \|\mathbf{v}\|^2 \leq \|\mathbf{u}\|^2 + 2\|\mathbf{u}\|\|\mathbf{v}\| + \|\mathbf{v}\|^2 = (\|\mathbf{u}\| + \|\mathbf{v}\|)^2$. Take square roots. Cauchy–Schwarz did all the work.
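The angle definition translates directly into code. The sketch below is one possible implementation for real vectors; the name `angle` and the clamping step are assumptions added for floating-point safety, not something the text prescribes.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def angle(u, v, inner=dot):
    """Angle in [0, pi] between nonzero vectors u and v under the given inner product."""
    c = inner(u, v) / math.sqrt(inner(u, u) * inner(v, v))
    # Cauchy-Schwarz puts c in [-1, 1] in exact arithmetic; rounding can push
    # it slightly outside, so clamp before calling acos.
    return math.acos(max(-1.0, min(1.0, c)))

print(angle([1.0, 0.0], [0.0, 1.0]))    # perpendicular vectors
print(angle([1.0, 0.0], [-2.0, 0.0]))   # opposite directions

# Triangle inequality on a sample pair.
u, v = [1.0, 2.0], [3.0, -1.0]
s = [a + b for a, b in zip(u, v)]
print(math.sqrt(dot(s, s)) <= math.sqrt(dot(u, u)) + math.sqrt(dot(v, v)))
```

Because `angle` accepts any inner product function, the same code would apply to a weighted inner product or a discretized integral.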
5. Orthogonality and Pythagoras
Two vectors $\mathbf{u}, \mathbf{v}$ in an inner product space are orthogonal, written $\mathbf{u} \perp \mathbf{v}$, if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$. By convention the zero vector is orthogonal to everything.
Orthogonality is no longer a fact about right angles in pictures — it's an equation. And it gives us the Pythagorean theorem essentially for free.
If $\mathbf{u} \perp \mathbf{v}$, then
$$ \|\mathbf{u} + \mathbf{v}\|^2 = \langle \mathbf{u} + \mathbf{v},\, \mathbf{u} + \mathbf{v} \rangle = \|\mathbf{u}\|^2 + 2\langle \mathbf{u}, \mathbf{v} \rangle + \|\mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2. $$
That's the Pythagorean theorem in inner-product-space form. It takes three lines of algebra and applies to function spaces, complex spaces, and weighted spaces alike. (Over $\mathbb{C}$ the middle term is $\langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{u} \rangle$ rather than $2\langle \mathbf{u}, \mathbf{v} \rangle$, but it still vanishes: $\langle \mathbf{v}, \mathbf{u} \rangle = \overline{\langle \mathbf{u}, \mathbf{v} \rangle} = 0$ by conjugate symmetry.)
Orthogonal sets are automatically independent
If $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are nonzero and pairwise orthogonal, they are linearly independent. Proof: suppose $c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k = \mathbf{0}$. Take the inner product of both sides with $\mathbf{v}_i$. All cross terms vanish, leaving $c_i \|\mathbf{v}_i\|^2 = 0$, so $c_i = 0$ for each $i$.
That's a lot of work avoided. In a generic basis, checking independence means solving a system; in an orthogonal basis, it's already true and you can read coefficients off by inner products.
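Reading coefficients off by inner products looks like this in practice. The sketch uses exact rational arithmetic from the standard `fractions` module; the basis, the vector, and the helper name `coefficients` are illustrative assumptions.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def coefficients(v, basis):
    """Coordinates of v in an orthogonal basis, read off as <v, b> / <b, b>.
    No linear system is solved."""
    return [Fraction(dot(v, b), dot(b, b)) for b in basis]

basis = [(1, 1, 0), (1, -1, 0), (0, 0, 2)]   # pairwise orthogonal, not normalized
v = (3, 1, 4)

c = coefficients(v, basis)
# Rebuild v from its coefficients to confirm the expansion.
rebuilt = [sum(ci * b[k] for ci, b in zip(c, basis)) for k in range(3)]
print(c, rebuilt)
```

The formula is only valid because the cross terms vanish; with a non-orthogonal basis the same division would give wrong coordinates.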
6. Orthogonal projection onto a subspace
Here's the question that motivates the next two sections: given a vector $\mathbf{v}$ and a subspace $W$, what is the vector in $W$ that's closest to $\mathbf{v}$? Closest in the sense of the induced distance $\|\mathbf{v} - \mathbf{w}\|$.
The answer is the orthogonal projection of $\mathbf{v}$ onto $W$, denoted $\operatorname{proj}_W(\mathbf{v})$. It is characterized by a single geometric condition:
$$ \mathbf{v} - \operatorname{proj}_W(\mathbf{v}) \perp W $$— the "error" is perpendicular to the whole subspace.
The formula, when you have an orthogonal basis
Suppose $\{\mathbf{e}_1, \ldots, \mathbf{e}_k\}$ is an orthogonal basis of $W$. Then
$$ \operatorname{proj}_W(\mathbf{v}) = \sum_{i=1}^{k} \frac{\langle \mathbf{v}, \mathbf{e}_i \rangle}{\langle \mathbf{e}_i, \mathbf{e}_i \rangle}\, \mathbf{e}_i. $$If the basis is additionally orthonormal (each $\|\mathbf{e}_i\| = 1$), the denominators are all 1 and the formula collapses to the bone-clean
$$ \operatorname{proj}_W(\mathbf{v}) = \sum_{i=1}^{k} \langle \mathbf{v}, \mathbf{e}_i \rangle\, \mathbf{e}_i. $$The coefficients $\langle \mathbf{v}, \mathbf{e}_i \rangle$ are called the Fourier coefficients of $\mathbf{v}$ relative to the basis. Read that name twice — the connection to Fourier series isn't a metaphor, it's the same formula in a different inner product space.
Best approximation: the punchline
For any $\mathbf{w} \in W$, write $\mathbf{v} - \mathbf{w} = \bigl(\mathbf{v} - \operatorname{proj}_W(\mathbf{v})\bigr) + \bigl(\operatorname{proj}_W(\mathbf{v}) - \mathbf{w}\bigr)$. The first piece is orthogonal to $W$; the second piece lives in $W$. By Pythagoras,
$$ \|\mathbf{v} - \mathbf{w}\|^2 = \|\mathbf{v} - \operatorname{proj}_W(\mathbf{v})\|^2 + \|\operatorname{proj}_W(\mathbf{v}) - \mathbf{w}\|^2. $$The right-hand side is minimized when the second term is zero, i.e., when $\mathbf{w} = \operatorname{proj}_W(\mathbf{v})$. So the orthogonal projection is provably the unique closest point — that's the best approximation theorem, and it underwrites everything from least-squares regression to truncated Fourier series.
Least-squares regression: project the data vector onto the column space of the design matrix. The "fit" is the projection; the "residual" is the perpendicular piece. Fourier series: project a function onto the span of $\{1, \cos x, \sin x, \cos 2x, \ldots\}$ in $C[-\pi, \pi]$. Same theorem, different inner product.
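The projection formula for an orthogonal basis can be sketched in a few lines. The plane `W`, the vector `v`, and the function name `project` below are assumptions chosen for illustration; exact fractions are used so the orthogonality of the residual can be checked without rounding.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(v, basis):
    """Orthogonal projection of v onto span(basis).
    Assumes the basis vectors are pairwise orthogonal (not necessarily unit length)."""
    p = [Fraction(0)] * len(v)
    for e in basis:
        c = Fraction(dot(v, e), dot(e, e))        # coefficient <v, e> / <e, e>
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p

W = [(1, 1, 0), (1, -1, 2)]            # orthogonal basis of a plane in R^3
v = (1, 2, 3)

p = project(v, W)
r = [a - b for a, b in zip(v, p)]      # residual v - proj_W(v)

print(p)
print([dot(r, e) for e in W])          # residual against each basis vector

# Best approximation: another point of W, here W[0] itself, is farther from v.
other = [a - b for a, b in zip(v, W[0])]
print(dot(r, r), dot(other, other))
```

The last line compares squared distances, which avoids square roots and keeps the comparison exact.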
7. Gram–Schmidt orthonormalization
The projection formula is beautiful — but it demands an orthogonal basis. What if all you have is some ordinary, ungainly basis? The Gram–Schmidt process is a recipe that takes any basis and produces an orthonormal one spanning the same subspace, vector by vector.
Given linearly independent $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$, produce $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ via:
$$ \begin{aligned} \mathbf{u}_1 &= \mathbf{v}_1, &\mathbf{e}_1 &= \frac{\mathbf{u}_1}{\|\mathbf{u}_1\|}, \\ \mathbf{u}_2 &= \mathbf{v}_2 - \langle \mathbf{v}_2, \mathbf{e}_1 \rangle \mathbf{e}_1, &\mathbf{e}_2 &= \frac{\mathbf{u}_2}{\|\mathbf{u}_2\|}, \\ \mathbf{u}_3 &= \mathbf{v}_3 - \langle \mathbf{v}_3, \mathbf{e}_1 \rangle \mathbf{e}_1 - \langle \mathbf{v}_3, \mathbf{e}_2 \rangle \mathbf{e}_2, &\mathbf{e}_3 &= \frac{\mathbf{u}_3}{\|\mathbf{u}_3\|}, \\ &\vdots &&\vdots \end{aligned} $$In words: at each step, take the next basis vector, subtract off its projection onto everything you've already orthonormalized, and normalize what's left. The geometry is exactly the picture below.
Figure: Gram–Schmidt sweeps from left to right. At each step the projection onto already-chosen directions is subtracted off; the perpendicular leftover is normalized to give the next orthonormal vector.
A complete worked 3-vector example
Apply Gram–Schmidt in $\mathbb{R}^3$ (standard dot product) to
$$ \mathbf{v}_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix},\quad \mathbf{v}_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix},\quad \mathbf{v}_3 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}. $$Step 1. Normalize $\mathbf{v}_1$. We have $\|\mathbf{v}_1\| = \sqrt{1+1+0} = \sqrt{2}$, so
$$ \mathbf{e}_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}. $$Step 2. Subtract the component of $\mathbf{v}_2$ along $\mathbf{e}_1$. The inner product is
$$ \langle \mathbf{v}_2, \mathbf{e}_1 \rangle = \tfrac{1}{\sqrt{2}}(1\cdot 1 + 0\cdot 1 + 1\cdot 0) = \tfrac{1}{\sqrt{2}}. $$ $$ \mathbf{u}_2 = \mathbf{v}_2 - \tfrac{1}{\sqrt{2}} \cdot \mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} - \tfrac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1/2 \\ -1/2 \\ 1 \end{pmatrix}. $$Now $\|\mathbf{u}_2\|^2 = \tfrac{1}{4} + \tfrac{1}{4} + 1 = \tfrac{3}{2}$, so $\|\mathbf{u}_2\| = \sqrt{3/2} = \sqrt{6}/2$. Normalizing:
$$ \mathbf{e}_2 = \frac{1}{\sqrt{6}}\begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}. $$Step 3. Subtract from $\mathbf{v}_3$ its components along both $\mathbf{e}_1$ and $\mathbf{e}_2$:
$$ \langle \mathbf{v}_3, \mathbf{e}_1 \rangle = \tfrac{1}{\sqrt{2}}(0 + 1 + 0) = \tfrac{1}{\sqrt{2}}, \quad \langle \mathbf{v}_3, \mathbf{e}_2 \rangle = \tfrac{1}{\sqrt{6}}(0 - 1 + 2) = \tfrac{1}{\sqrt{6}}. $$ $$ \mathbf{u}_3 = \mathbf{v}_3 - \tfrac{1}{\sqrt{2}}\mathbf{e}_1 - \tfrac{1}{\sqrt{6}}\mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} - \tfrac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} - \tfrac{1}{6}\begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}. $$Combine the last two corrections coordinate by coordinate. The $x$-coordinate is $0 - \tfrac{1}{2} - \tfrac{1}{6} = -\tfrac{2}{3}$. The $y$-coordinate is $1 - \tfrac{1}{2} + \tfrac{1}{6} = \tfrac{2}{3}$. The $z$-coordinate is $1 - 0 - \tfrac{1}{3} = \tfrac{2}{3}$. So
$$ \mathbf{u}_3 = \begin{pmatrix} -2/3 \\ 2/3 \\ 2/3 \end{pmatrix}, \quad \|\mathbf{u}_3\| = \tfrac{2}{3}\sqrt{3} = \tfrac{2}{\sqrt{3}}. $$ $$ \mathbf{e}_3 = \frac{1}{\sqrt{3}}\begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}. $$Sanity check. Verify orthogonality: $\langle \mathbf{e}_1, \mathbf{e}_2 \rangle = \tfrac{1}{\sqrt{12}}(1 - 1 + 0) = 0$ ✓, $\langle \mathbf{e}_1, \mathbf{e}_3 \rangle = \tfrac{1}{\sqrt{6}}(-1 + 1 + 0) = 0$ ✓, $\langle \mathbf{e}_2, \mathbf{e}_3 \rangle = \tfrac{1}{\sqrt{18}}(-1 - 1 + 2) = 0$ ✓. Each is a unit vector by construction. We've produced an orthonormal basis of $\mathbb{R}^3$.
Classical Gram–Schmidt as written can lose orthogonality in floating-point arithmetic: rounding errors in early steps are amplified in later ones, and for nearly dependent inputs the computed vectors can be noticeably non-orthogonal. In code, prefer modified Gram–Schmidt (subtract projections one at a time, updating the working vector after each) or, better, a Householder QR. In exact arithmetic the results agree; the floating-point behavior does not.
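A minimal sketch of modified Gram–Schmidt in plain Python follows, run on the three vectors from the worked example. The function name `modified_gram_schmidt` is an assumption, and the code assumes linearly independent input; it is meant to show the update order, not to replace a library QR routine.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def modified_gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors with modified Gram-Schmidt."""
    basis = []
    for v in vectors:
        u = list(v)
        for e in basis:
            # The coefficient is computed from the *updated* u, not the
            # original v; this is the only difference from the classical form.
            c = dot(u, e)
            u = [ui - c * ei for ui, ei in zip(u, e)]
        n = math.sqrt(dot(u, u))
        basis.append([ui / n for ui in u])
    return basis

E = modified_gram_schmidt([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
for e in E:
    print(e)
```

In exact arithmetic this produces the same vectors as the formulas above, so the output should match $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ from the worked example up to rounding.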
8. Beyond $\mathbb{R}^n$: function spaces
The most consequential inner product in applied mathematics doesn't act on tuples of numbers — it acts on functions. On the space of continuous real functions on $[a, b]$, define
$$ \langle f, g \rangle = \int_a^b f(x) g(x)\, dx. $$Check the axioms: symmetry is obvious (multiplication of real numbers commutes), linearity follows from linearity of the integral, and positive-definiteness is $\int_a^b f(x)^2 dx \geq 0$ with equality only when $f \equiv 0$ (for continuous $f$).
Everything we've built carries over. The norm is $\|f\| = \sqrt{\int_a^b f^2 dx}$, which is the root-mean-square value of $f$ up to a factor of $\sqrt{b-a}$. Two functions are orthogonal when $\int_a^b f g\, dx = 0$. Cauchy–Schwarz becomes
$$ \left|\int_a^b f(x) g(x)\, dx\right| \leq \sqrt{\int_a^b f^2\, dx} \cdot \sqrt{\int_a^b g^2\, dx}. $$Projection onto a finite-dimensional subspace of functions is still the best approximation in the RMS sense. And the orthonormal basis you choose dictates which family of approximations you get:
| Orthogonal family | What you build |
|---|---|
| $\{1, \cos nx, \sin nx\}$ on $[-\pi, \pi]$ | Fourier series |
| Apply Gram–Schmidt to $\{1, x, x^2, \ldots\}$ on $[-1, 1]$ | Legendre polynomials (up to scaling) |
| Apply Gram–Schmidt to $\{1, x, x^2, \ldots\}$ with weight $e^{-x^2}$ on $\mathbb{R}$ | Hermite polynomials (up to scaling) |
| Apply Gram–Schmidt to $\{1, x, x^2, \ldots\}$ with weight $e^{-x}$ on $[0, \infty)$ | Laguerre polynomials (up to scaling) |
Each row is just Gram–Schmidt applied to a different starting basis in a different inner product. The mechanics are identical to the worked example above; only the integrals get hairier. This is the gateway from linear algebra to Fourier analysis, approximation theory, and quantum mechanics, where state spaces are infinite-dimensional inner product spaces.
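The Legendre row can be reproduced exactly with a short sketch. Polynomials are stored as coefficient lists (lowest degree first), and the integral over $[-1, 1]$ is evaluated term by term using $\int_{-1}^{1} x^k\,dx = 2/(k+1)$ for even $k$ and $0$ for odd $k$. The names `poly_inner` and `orthogonal_polys` are illustrative, and the normalization step is deliberately skipped, so the output is proportional to the Legendre polynomials rather than equal to any particular standard scaling.

```python
from fractions import Fraction

def poly_inner(p, q):
    """<p, q> = integral over [-1, 1] of p(x) q(x) dx for coefficient lists
    (lowest degree first)."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:   # odd powers integrate to zero on [-1, 1]
                total += Fraction(a) * Fraction(b) * Fraction(2, i + j + 1)
    return total

def orthogonal_polys(n):
    """Gram-Schmidt on 1, x, ..., x^(n-1), without the normalization step."""
    polys = []
    for k in range(n):
        p = [Fraction(0)] * k + [Fraction(1)]          # the monomial x^k
        for q in polys:
            c = poly_inner(p, q) / poly_inner(q, q)
            padded = q + [Fraction(0)] * (len(p) - len(q))
            p = [pi - c * qi for pi, qi in zip(p, padded)]
        polys.append(p)
    return polys

P = orthogonal_polys(4)
for p in P:
    print(p)
```

The third and fourth outputs represent $x^2 - \tfrac13$ and $x^3 - \tfrac35 x$, which are scalar multiples of $\tfrac12(3x^2 - 1)$ and $\tfrac12(5x^3 - 3x)$.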
For $\int_a^b f^2 dx = 0$ to force $f \equiv 0$ we need $f$ continuous. For more general functions (say, Riemann- or Lebesgue-integrable) this can fail: a function that is zero except at finitely many points is not the zero function, yet the integral of its square is zero. The fix is to identify functions that agree "almost everywhere"; the resulting space is $L^2[a, b]$, the proper home of Fourier analysis.
9. Common pitfalls
Over the complex numbers the inner product is linear in the first slot but conjugate-linear in the second. So $\langle \mathbf{u}, c\mathbf{v} \rangle = \bar{c}\langle \mathbf{u}, \mathbf{v} \rangle$, not $c\langle \mathbf{u}, \mathbf{v} \rangle$. Forgetting the conjugate is one of the most common errors in complex inner-product calculations.
Gram–Schmidt produces orthogonal $\mathbf{u}_i$ first, then normalized $\mathbf{e}_i$. If you forget to divide by $\|\mathbf{u}_i\|$, your basis is orthogonal but not orthonormal — and the clean projection formula $\sum \langle \mathbf{v}, \mathbf{e}_i \rangle \mathbf{e}_i$ no longer applies; you need the version with $\langle \mathbf{e}_i, \mathbf{e}_i \rangle$ in the denominator.
A bilinear form can be symmetric and still fail positive-definiteness — the Minkowski form $\langle \mathbf{u}, \mathbf{v} \rangle = -u_0 v_0 + u_1 v_1 + u_2 v_2 + u_3 v_3$ of special relativity is the famous example. Without positive-definiteness, $\|\mathbf{v}\|$ may be imaginary, Cauchy–Schwarz can fail, and orthogonal sets need not be independent. These objects are useful but they are not inner products.
If the input vectors to Gram–Schmidt are linearly dependent, at some step you'll produce $\mathbf{u}_k = \mathbf{0}$, and normalizing it divides by zero. The cure is to detect $\mathbf{u}_k = \mathbf{0}$ (in floating point, $\|\mathbf{u}_k\|$ below a small tolerance), skip that vector, and continue with the rest. This is exactly how Gram–Schmidt detects (and removes) redundancy when given an over-complete set.
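The Minkowski pitfall above can be made concrete. This sketch evaluates the form on a few hand-picked vectors; the function name `minkowski` and the sample vectors are illustrative assumptions.

```python
def minkowski(u, v):
    """Symmetric bilinear form with signature (-, +, +, +).
    It is not an inner product: positive-definiteness fails."""
    return -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))

timelike = (1, 0, 0, 0)
null = (1, 1, 0, 0)
other = (2, 1, 0, 0)

# Negative "length squared" for a nonzero vector.
print(minkowski(timelike, timelike))

# A nonzero vector that is orthogonal to itself.
print(minkowski(null, null))

# The squared form of Cauchy-Schwarz, <u, v>^2 <= <u, u><v, v>, is violated here.
lhs = minkowski(timelike, other) ** 2
rhs = minkowski(timelike, timelike) * minkowski(other, other)
print(lhs, rhs)
```

Each printed line corresponds to one of the failures listed in the pitfall: a negative squared "norm", a nonzero self-orthogonal vector, and a broken Cauchy–Schwarz inequality.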
10. Worked examples
Each one isolates a single move. Resist peeking until you've tried the recipe yourself.
Example 1 · Verify the integral defines an inner product
Show that $\langle f, g \rangle = \int_0^1 f(x) g(x)\, dx$ is an inner product on $C[0,1]$.
Symmetry. $\langle f, g \rangle = \int_0^1 f g\, dx = \int_0^1 g f\, dx = \langle g, f \rangle$. ✓
Linearity. $\langle af + bh,\, g \rangle = \int_0^1 (af + bh)g\, dx = a\int_0^1 f g\, dx + b\int_0^1 h g\, dx = a\langle f, g\rangle + b\langle h, g\rangle$. ✓
Positive-definiteness. $\langle f, f \rangle = \int_0^1 f(x)^2 dx \geq 0$. If $f$ is continuous and $\int_0^1 f^2 dx = 0$, then $f^2 \equiv 0$ (a continuous non-negative function with zero integral is identically zero), so $f \equiv 0$. ✓
Example 2 · Angle between two polynomials
In $C[-1, 1]$ with $\langle f, g \rangle = \int_{-1}^{1} f g\, dx$, find the angle between $f(x) = 1$ and $g(x) = x$.
$\langle f, g \rangle = \int_{-1}^{1} x\, dx = 0$. So $\cos\theta = 0$, meaning $\theta = \pi/2$.
The constant function and the identity function are orthogonal on $[-1, 1]$. That's not visible from the graphs — it's a fact about the inner product.
Example 3 · Project a vector onto a plane
In $\mathbb{R}^3$ with the standard dot product, project $\mathbf{v} = (1, 2, 3)$ onto the plane spanned by $\mathbf{a}_1 = (1, 0, 0)$ and $\mathbf{a}_2 = (0, 1, 0)$.
The two spanning vectors are already orthonormal, so
$$ \operatorname{proj}_W(\mathbf{v}) = \langle \mathbf{v}, \mathbf{a}_1 \rangle \mathbf{a}_1 + \langle \mathbf{v}, \mathbf{a}_2 \rangle \mathbf{a}_2 = 1\cdot(1,0,0) + 2\cdot(0,1,0) = (1, 2, 0). $$The residual $\mathbf{v} - \operatorname{proj}_W(\mathbf{v}) = (0, 0, 3)$ is perpendicular to the plane, as advertised.
Example 4 · Cauchy–Schwarz on integrals
Show that for any continuous $f$ on $[0, 1]$,
$$ \left(\int_0^1 f(x)\, dx\right)^2 \leq \int_0^1 f(x)^2\, dx. $$Apply Cauchy–Schwarz in $C[0, 1]$ with $g(x) = 1$:
$$ \left|\int_0^1 f(x)\cdot 1\, dx\right|^2 \leq \left(\int_0^1 1^2\, dx\right)\left(\int_0^1 f(x)^2\, dx\right) = \int_0^1 f^2\, dx. $$The same algebraic move that bounds dot products bounds integrals.
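Examples 2 and 4 can be checked numerically by approximating the integral inner product with a quadrature rule. The sketch below uses a composite midpoint rule; the function name `inner_fn`, the number of subintervals, and the choice $f = e^x$ for Example 4 are assumptions made for illustration, and the results are approximations rather than exact values.

```python
import math

def inner_fn(f, g, a, b, n=2000):
    """Approximate <f, g> = integral of f(x) g(x) over [a, b]
    with the composite midpoint rule on n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(n))

def one(x):
    return 1.0

def ident(x):
    return x

# Example 2: the functions 1 and x are orthogonal on [-1, 1].
print(inner_fn(one, ident, -1.0, 1.0))

# Example 4 with f = exp on [0, 1]: (integral of f)^2 <= integral of f^2.
lhs = inner_fn(math.exp, one, 0.0, 1.0) ** 2
rhs = inner_fn(math.exp, math.exp, 0.0, 1.0)
print(lhs, rhs)
```

The midpoint sum is itself a weighted dot product of sampled values, so the discrete version satisfies Cauchy–Schwarz for the same reason the integral does.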
Example 5 · Gram–Schmidt detects dependence
Apply Gram–Schmidt to $\mathbf{v}_1 = (1, 1, 0)$, $\mathbf{v}_2 = (2, 2, 0)$, $\mathbf{v}_3 = (0, 0, 1)$ in $\mathbb{R}^3$.
Step 1. $\mathbf{e}_1 = \tfrac{1}{\sqrt{2}}(1, 1, 0)$.
Step 2. $\langle \mathbf{v}_2, \mathbf{e}_1 \rangle = \tfrac{1}{\sqrt{2}}(2 + 2 + 0) = 2\sqrt{2}$.
$$ \mathbf{u}_2 = (2, 2, 0) - 2\sqrt{2} \cdot \tfrac{1}{\sqrt{2}}(1, 1, 0) = (2, 2, 0) - (2, 2, 0) = \mathbf{0}. $$
We get $\mathbf{u}_2 = \mathbf{0}$, signalling that $\mathbf{v}_2 = 2\mathbf{v}_1$ and the set is linearly dependent. Drop $\mathbf{v}_2$ and continue.
Step 3. $\langle \mathbf{v}_3, \mathbf{e}_1 \rangle = 0$, so $\mathbf{u}_3 = (0, 0, 1)$ and $\mathbf{e}_3 = (0, 0, 1)$.
Final orthonormal basis of the span: $\{\mathbf{e}_1, \mathbf{e}_3\}$. The algorithm filtered out the redundant vector without us flagging it ahead of time.
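The skip-on-zero behavior of Example 5 can be sketched as follows. In floating point the leftover vector is rarely exactly zero, so this version compares its norm against a tolerance relative to the size of the input vector; the name `gram_schmidt_skip` and the default tolerance `1e-12` are assumptions, and a suitable threshold depends on the data.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt_skip(vectors, tol=1e-12):
    """Gram-Schmidt that drops vectors already in the span of earlier ones.
    Returns (orthonormal basis, indices of dropped input vectors)."""
    basis, dropped = [], []
    for idx, v in enumerate(vectors):
        u = list(v)
        for e in basis:
            c = dot(u, e)
            u = [ui - c * ei for ui, ei in zip(u, e)]
        n = math.sqrt(dot(u, u))
        scale = math.sqrt(dot(v, v))
        if n <= tol * scale:       # leftover is numerically zero: v is redundant
            dropped.append(idx)
            continue
        basis.append([ui / n for ui in u])
    return basis, dropped

basis, dropped = gram_schmidt_skip([(1, 1, 0), (2, 2, 0), (0, 0, 1)])
print(basis)
print(dropped)
```

The returned index list records which inputs were redundant, matching the hand computation in which $\mathbf{v}_2$ is dropped.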