Trees & Spanning Trees — Discrete Math & Combinatorics

What you'll leave with

Three equivalent characterisations of a tree, and why they all say the same thing.
The vocabulary of rooted trees — parent, child, depth, height — and the special case of binary trees.
What a spanning tree is, why connected graphs always have one, and what makes the minimum one special.
Two algorithms that compute the MST — Kruskal and Prim — and why they always agree on the total weight.
Cayley's formula: the surprisingly clean count of $n^{n-2}$ labelled trees on $n$ vertices.

1. What a tree is

Tree

A tree is an undirected graph that is connected and contains no cycles. That's the whole definition — but it has remarkable consequences.

The two conditions pull in opposite directions and meet at exactly the right point. "Connected" says at least $n - 1$ edges (every fewer and the graph breaks apart). "Acyclic" says at most $n - 1$ edges (every more and a cycle is forced). Together, a tree on $n$ vertices has exactly $n - 1$ edges. No slack, no excess.

Three ways of saying the same thing

For a graph $G$ on $n$ vertices, the following are equivalent — any one implies the other two:

$G$ is connected and acyclic (the textbook definition).
Any two vertices in $G$ are joined by exactly one path.
$G$ is connected and has exactly $n - 1$ edges.

The second is the one to remember when you want to use trees. In a tree there's no choice about how to get from $u$ to $v$ — exactly one route exists, and that route is the answer to almost every question you'll ask. In a general graph, "shortest path" is interesting because there are alternatives. In a tree, the path just is the answer.

A tree on $n = 7$ vertices with $n - 1 = 6$ edges. No cycles, fully connected — every pair of vertices is joined by exactly one path.

Why it matters

The "exactly one path" property is what makes trees usable as data structures. A file system is a tree because there is exactly one path from root to any file — and that path is the file's name. Lose the tree property and "where is this file?" stops being well-defined.

2. Rooted trees: parent, child, leaf

The bare definition of a tree is symmetric — no vertex is special. The moment you pick one vertex and call it the root, the symmetry breaks and a hierarchy emerges. Every other vertex has a unique path back to the root, and that path induces direction: the neighbour one step closer to the root is the vertex's parent, and every neighbour one step further away is a child.

The standard vocabulary:

Root — the distinguished vertex, drawn at the top by convention.
Parent / child — the relationship between a vertex and its neighbours, oriented away from the root.
Leaf — a vertex with no children (degree 1 in the underlying tree, except for the root if $n = 1$).
Internal vertex — any non-leaf vertex.
Depth of $v$ — the length of the unique path from the root to $v$. The root has depth $0$.
Height of the tree — the maximum depth over all vertices. A single vertex has height $0$.
Ancestor / descendant — $u$ is an ancestor of $v$ if $u$ lies on the path from $v$ to the root.

Note

The "root at the top" convention is universal in computer science and pretty silly when you say it out loud — trees grow up in nature and down in our diagrams. The convention won because the root is the most important vertex, and we read top-to-bottom.

3. Binary trees

A binary tree is a rooted tree in which every vertex has at most two children, distinguished as left and right. The left/right distinction matters: a binary tree with a single child on the left is a different tree from one with a single child on the right, even though as unrooted graphs they're identical.

Several special shapes recur often enough to have names:

Shape	Definition	Vertices at depth $d$
Full	Every internal vertex has exactly two children.	Varies
Perfect	Full, and every leaf is at the same depth.	Exactly $2^d$ at depth $d$; total $2^{h+1} - 1$
Complete	Every level is full except possibly the last, which fills from the left.	$2^d$ except possibly the last level
Balanced	The depths of any two leaves differ by at most a constant (definitions vary).	Height $O(\log n)$

The perfect binary tree of height $h$ has $2^{h+1} - 1$ vertices in total, and that doubling-at-each-level structure is the source of the universal $O(\log n)$ promise of tree-based data structures: $n$ items in a balanced binary search tree fit in a tree of height $\lceil \log_2 n \rceil$, so any single root-to-leaf traversal touches only $\log n$ vertices.

Connection

Binary search trees are the algorithmic celebrity here — they add an ordering constraint (left subtree < node < right subtree) that makes lookup, insertion, and deletion run in $O(\log n)$ on average. The structure itself is just a binary tree; the magic is the invariant.

4. Spanning trees

Now we switch from "trees as standalone objects" to "trees living inside a bigger graph". Given a connected graph $G$ on $n$ vertices, a spanning tree is a subgraph of $G$ that:

Contains every vertex of $G$ (it spans).
Is a tree (connected and acyclic).

Equivalently: it's a connected subgraph with exactly $n - 1$ edges that covers all the vertices. You take the original graph, throw away just enough edges to kill every cycle, and stop the moment the result is still connected.

A graph $K_4$ on 4 vertices, and one of its 16 spanning trees ($4^{4-2} = 16$ by Cayley's formula).

How many spanning trees does a graph have?

Usually more than you'd guess. For the complete graph $K_n$ on $n$ vertices, the count is given by a famously clean result:

Cayley's formula

The number of distinct labelled spanning trees of $K_n$ is

$$ \tau(K_n) = n^{n-2}. $$

So $K_3$ has $3^1 = 3$ spanning trees, $K_4$ has $4^2 = 16$, $K_5$ has $5^3 = 125$, and by $K_{10}$ you're already past 100 million.

For an arbitrary connected graph the count is given by the matrix-tree theorem (Kirchhoff's theorem), which expresses $\tau(G)$ as a determinant of the graph's Laplacian. The point worth taking away: spanning trees are not unique. Usually there are many, and the question "which one?" is the question that gives the topic its bite.

5. Minimum spanning trees

Now suppose the edges carry weights — distances between cities, costs of laying cable, latencies between routers. Of all the spanning trees a connected weighted graph admits, the one whose total edge weight is smallest is called its minimum spanning tree, or MST.

Minimum spanning tree

Given a connected, undirected, weighted graph $G = (V, E, w)$ with $w: E \to \mathbb{R}$, a minimum spanning tree is a spanning tree $T \subseteq E$ that minimises

$$ w(T) = \sum_{e \in T} w(e). $$

The phrasing is concrete: "the cheapest set of $n - 1$ edges that keeps the network connected". If you're stringing fibre to $n$ buildings and you've measured the cost of every possible link, the MST tells you which links to actually lay.

Is the MST unique?

Not always — but the answer follows a clean rule:

If every edge has a distinct weight, the MST is unique.

When weights are distinct, there is exactly one cheapest spanning tree. When ties exist, multiple spanning trees can hit the same minimum total weight; you can still find an MST, but the choice of which one isn't forced. The total weight, however, is always the same — every MST of $G$ weighs the same as every other MST of $G$.

6. Kruskal's algorithm

Kruskal's algorithm is the most natural greedy idea you could possibly write down. Forget the graph's structure for a moment and look only at the edges:

Sort all edges in increasing order of weight.
Start with an empty set $T$ of edges.
Walk through the sorted edges. For each edge $e$, add it to $T$ if doing so does not create a cycle. Otherwise, skip it.
Stop when $T$ has $n - 1$ edges.

That's it. The output is an MST. The only non-trivial part of an implementation is step 3 — efficiently asking "would this edge close a cycle?" The standard answer is a union-find (disjoint-set) data structure, which keeps the vertices grouped into connected components and answers "are $u$ and $v$ already in the same component?" in near-constant time. Adding an edge that joins two different components is safe; adding one that joins a component to itself would close a cycle.

Conceptually, Kruskal's algorithm starts with $n$ separate one-vertex pieces of forest and glues them together using the cheapest legal edge each time, until everything has merged into a single tree.

Why it works

At every step Kruskal picks the cheapest edge that doesn't immediately ruin the tree property. The cut property (section 8) shows that the cheapest edge crossing any partition of the vertices must be in some MST, and that's exactly the local condition Kruskal exploits.

7. Prim's algorithm

Prim's algorithm is the same greedy spirit applied differently. Instead of looking at the whole edge list, Prim grows a single tree outward from a starting vertex:

Pick any starting vertex $s$ and put it in the tree $T$.
Repeat until $T$ contains all $n$ vertices: find the cheapest edge with exactly one endpoint already in $T$, and add that edge (and its new vertex) to $T$.

Where Kruskal looks at all edges globally and merges forest components, Prim works locally — at every moment there is one tree that has been built so far, and the next move is the cheapest edge connecting it to a new vertex. The implementation typically uses a priority queue keyed on "cost to reach this not-yet-in-tree vertex from the tree", giving an $O(E \log V)$ running time.

Prim grows a tree from one vertex. At each step the next-cheapest border edge is absorbed; the in-tree set (orange) expands by one vertex.

8. Why both algorithms agree: the cut property

Kruskal and Prim look very different, yet they always return spanning trees of the same total weight. The common theorem that explains both is the cut property:

Cut property

For any partition of the vertices $V$ into two non-empty sets $S$ and $V \setminus S$, consider all edges that have one endpoint in $S$ and the other in $V \setminus S$ (the "cut edges"). The cheapest such edge is contained in some MST.

If that cheapest cut edge is strictly the cheapest (no ties), then it is contained in every MST.

The proof is a one-paragraph exchange argument: suppose an MST $T$ doesn't contain the cheapest cut edge $e$. Add $e$ to $T$. The result has a cycle (because $T$ was a spanning tree). That cycle must cross the cut at least once more, on some edge $e'$. Remove $e'$. You've produced a spanning tree of weight $w(T) + w(e) - w(e')$. Since $w(e) \leq w(e')$ (because $e$ was the cheapest cut edge), the new weight is no larger — so it's also an MST, and it does contain $e$.

Both algorithms are special cases:

Kruskal considers, at each step, the cheapest remaining edge. The cut that vouches for it is the one separating its two components.
Prim considers, at each step, the cheapest edge crossing the cut between "in tree" and "not in tree". That's the cut property applied verbatim.

The two algorithms can produce different spanning trees when there are ties in edge weights — they're picking different cheapest-cut edges. But the total weight is the same, because the cut property pins down what every MST must look like up to ties.

Matroids, briefly

The deep reason greedy works for MST and fails for almost everything else (like maximum-weight Hamiltonian cycle) is that the spanning trees of a graph form a structure called a matroid. The greedy algorithm is provably optimal on matroids and provably not guaranteed optimal off them. MST is the cleanest textbook example of "the greedy you'd write down actually works".

9. Common pitfalls

"Spanning tree" vs "the MST"

Every connected graph has spanning trees — usually many. The MST is the cheapest of those by total weight. Don't claim you've found "the MST" when you've only found "a spanning tree".

Forgetting $n - 1$

A tree on $n$ vertices has exactly $n - 1$ edges. Not $n$. Not $n + 1$. If your supposed spanning tree has the wrong edge count, it's either disconnected (too few) or contains a cycle (too many).

Greedy doesn't always work

"Sort by weight, take greedily" is exactly the right idea for minimum spanning tree. It's exactly the wrong idea for, say, finding the cheapest Hamiltonian cycle (the TSP). The reason MST is special is the matroid structure of its solution space — most combinatorial problems don't have it, and greedy fails on them.

Negative weights are fine

MST algorithms work perfectly with negative edge weights — there is no analogue of Dijkstra's failure mode here. The total weight of the MST is simply minimised whether weights are positive, negative, or mixed. Don't add a "no negatives" precondition that isn't there.

10. Worked examples

Example 1 · Count spanning trees of $K_4$

Setup. $K_4$ is the complete graph on 4 vertices — every pair of vertices is joined by an edge, so 6 edges total.

Apply Cayley's formula.

$$ \tau(K_n) = n^{n-2} \quad\Longrightarrow\quad \tau(K_4) = 4^{4-2} = 4^2 = 16. $$

Sanity check. A spanning tree of $K_4$ has $n - 1 = 3$ edges. There are $\binom{6}{3} = 20$ ways to choose 3 of the 6 edges; of those, the 4 choices that form a triangle plus one isolated vertex are invalid (they leave the fourth vertex disconnected and form a cycle). $20 - 4 = 16$. ✓

Example 2 · Kruskal on a 4-vertex weighted graph

Setup. Vertices $\{1, 2, 3, 4\}$ with edges and weights:

Edge	Weight
(1, 2)	1
(2, 3)	4
(3, 4)	2
(1, 4)	3
(1, 3)	5

Step 1. Sort by weight: $(1,2)_1$, $(3,4)_2$, $(1,4)_3$, $(2,3)_4$, $(1,3)_5$.

Step 2. Walk the list:

$(1,2)$ — add. Components: $\{1,2\}, \{3\}, \{4\}$.
$(3,4)$ — add. Components: $\{1,2\}, \{3,4\}$.
$(1,4)$ — add (joins the two components). Components: $\{1,2,3,4\}$.
Stop — we have $n - 1 = 3$ edges.

MST: $\{(1,2), (3,4), (1,4)\}$. Total weight: $1 + 2 + 3 = 6$.

Example 3 · Prim on the same graph, starting at vertex 1

Step 1. Tree = $\{1\}$. Border edges: $(1,2)_1$, $(1,4)_3$, $(1,3)_5$. Cheapest: $(1,2)$.

Step 2. Tree = $\{1, 2\}$. Border edges (one endpoint in tree, one outside): $(1,4)_3$, $(1,3)_5$, $(2,3)_4$. Cheapest: $(1,4)$.

Step 3. Tree = $\{1, 2, 4\}$. Border edges: $(1,3)_5$, $(2,3)_4$, $(3,4)_2$. Cheapest: $(3,4)$.

Step 4. Tree = $\{1, 2, 3, 4\}$. All vertices in — stop.

MST: $\{(1,2), (1,4), (3,4)\}$. Total weight: $1 + 3 + 2 = 6$. ✓

Same edges as Kruskal, same total. With ties or a different start vertex you might get a different MST — but always the same total weight.

Example 4 · A graph with non-unique MST

Setup. Triangle $\{1, 2, 3\}$ with all three edge weights equal to $1$.

Spanning trees. Three possible: $\{(1,2),(2,3)\}$, $\{(1,2),(1,3)\}$, or $\{(2,3),(1,3)\}$.

Each has total weight $2$. All three are minimum spanning trees. The MST is not unique — but the minimum total weight is.

This is exactly the situation the cut property warns about: when ties exist, multiple cheapest cut edges qualify, and the choice between them is what makes the MST non-unique.

Example 5 · Why greedy fails for maximum-weight Hamiltonian cycle

Setup. Four cities arranged in a square. The two diagonals are cheap (weight 1); the four sides are expensive (weight 10).

Greedy "pick cheapest edge that fits". Picks both diagonals first — but two diagonals share no endpoints (in $K_4$ they share none useful here), and a Hamiltonian cycle needs every vertex to have degree exactly 2 along the cycle. The greedy choice traps you into needing expensive completions.

Lesson. Greedy works for MST because the spanning-tree structure is a matroid. It fails for Hamiltonian-cycle problems because they aren't. This is the gateway to the next topic: the Traveling Salesperson Problem.

Sources & further reading

The content above is synthesized from established discrete-math references. For depth on any single section, the primary sources below are the ground truth.

Trees Textbook OpenStax · Contemporary Mathematics, §12.7

Peer-reviewed, openly licensed chapter on trees and their basic properties at an introductory level. Good starting point with carefully worked small examples.
Spanning Trees Textbook OpenStax · Contemporary Mathematics, §12.8

Companion chapter that introduces spanning trees, minimum spanning trees, and Kruskal's algorithm with worked numerical examples.
Greedy Algorithms, Minimum Spanning Tree Course MIT OCW · 6.046J Design and Analysis of Algorithms, Lecture 12

Lecture video, notes, and slides covering the cut property, Prim's algorithm, and Kruskal's algorithm at full undergraduate rigour. The right next step once the prose intuition is in place.
Spanning Tree Reference Wolfram MathWorld

Formal reference, including the matrix-tree theorem and counting formulas for special families of graphs.
Minimum spanning tree Encyclopedia Wikipedia

Broad overview covering MST history (Borůvka, 1926), classical algorithms, randomized variants, and the still-open question of an optimal deterministic MST algorithm.
Cayley's formula Encyclopedia Wikipedia

Discussion of the $n^{n-2}$ result for labelled trees, with several proofs (Prüfer sequences, the matrix-tree theorem, double-counting).