1. The setup: graphs, weights, conventions
A graph $G = (V, E)$ is a set of vertices $V$ and a set of edges $E \subseteq V \times V$. We write $n = |V|$ for the number of vertices and $m = |E|$ for the number of edges. Edges may be directed (an arrow from $u$ to $v$) or undirected (an unordered pair), and they may carry a weight $w(u, v) \in \mathbb{R}$. In running times the standard shorthand is $O(V + E)$ rather than $O(n + m)$ — read $V$ and $E$ as their sizes when they appear in a complexity expression.
Three quick conventions used throughout:
- Distance to a vertex is written $d[v]$, initialized to $\infty$ for all vertices except the source $s$, where $d[s] = 0$.
- A vertex's neighbors are $N(v) = \{ u : (v, u) \in E \}$.
- A relaxation of edge $(u, v)$ with weight $w$ means: if $d[u] + w < d[v]$, update $d[v] \leftarrow d[u] + w$. Almost every shortest-path algorithm is a particular discipline of edge relaxations.
2. Breadth-first search
BFS explores a graph in waves. Starting at a source $s$, it visits everything at distance $1$, then everything at distance $2$, and so on. The data structure that makes this work is a queue: first in, first out, so that a vertex you enqueue when expanding the wave at depth $k$ is dequeued only after every vertex at depth $k$ already in the queue.
- Initialize $d[s] \leftarrow 0$ and $d[v] \leftarrow \infty$ for every other vertex; create an empty queue $Q$ and enqueue $s$.
- While $Q$ is non-empty, dequeue a vertex $v$.
- For each neighbor $u \in N(v)$ with $d[u] = \infty$ (i.e. not yet discovered): set $d[u] \leftarrow d[v] + 1$, record $v$ as $u$'s parent, and enqueue $u$.
- When the queue empties, $d[v]$ holds the shortest unweighted distance from $s$ for every reachable $v$.
Every vertex is enqueued at most once and every edge is inspected at most twice (once from each endpoint, in an undirected graph), giving the classical $O(V + E)$ bound. The parent pointers, taken together, form the BFS tree — a spanning tree of the connected component containing $s$ in which every root-to-leaf path is a shortest path from $s$.
BFS is the right hammer when:
- You need shortest paths in an unweighted graph — or equivalently, in a graph where every edge has the same weight.
- You want the level structure of a graph: which vertices are exactly $k$ steps from $s$.
- You're checking bipartiteness by 2-colouring the BFS tree.
- You're computing connected components in an undirected graph (run BFS from each undiscovered vertex).
3. Depth-first search
DFS does the opposite: it goes deep before it goes wide. From the current vertex, it picks an unvisited neighbor, walks there, and recurses; only when a vertex has no unvisited neighbors does it back up and try the next branch. Either an explicit stack or — more commonly — the call stack of a recursive routine serves as the data structure.
- Mark $v$ as discovered and record its discovery time.
- For each neighbor $u \in N(v)$ that is not yet discovered, recursively DFS from $u$ (recording $v$ as its DFS parent).
- When all neighbors have been processed, mark $v$ as finished and record its finish time.
Run from a source until done; if vertices remain undiscovered, repeat from any of them. The total work is again $O(V + E)$, since every vertex is entered once and every edge is examined a constant number of times. The discovery and finish times — the pair $(d[v], f[v])$ — turn out to encode an astonishing amount of structure, which is why DFS shows up in so many algorithms.
DFS is the right tool when:
- Cycle detection: in an undirected graph, any edge from $v$ to an already-discovered, non-parent vertex closes a cycle. In a directed graph, a back-edge (to an ancestor still on the recursion stack) is the cycle witness.
- Topological sort on a DAG: reverse the order of finish times.
- Strongly connected components: Tarjan's and Kosaraju's algorithms both run two DFS passes and report SCCs in $O(V + E)$.
- Bridges and articulation points: Tarjan's DFS with low-link values finds them in one pass.
The DFS tree, edges-not-in-the-tree, and the discovery/finish times together let you classify every edge of a directed graph as tree, forward, back, or cross. That classification is what makes DFS so much more than "BFS but stack instead of queue."
4. BFS vs DFS at a glance
BFS — explore in layers
Queue. Wave-by-wave from source. $O(V + E)$ time, $O(V)$ space.
- Shortest paths in unweighted graphs
- Level sets / "everything within $k$ steps"
- Bipartiteness check
- Connected components (undirected)
DFS — explore by diving
Stack or recursion. Deep first, backtrack. $O(V + E)$ time, $O(V)$ space.
- Cycle detection
- Topological sort
- Strongly connected components
- Bridges and articulation points
5. Dijkstra's algorithm
When edges carry non-negative weights, BFS is no longer enough — a short hop of weight $10$ may still be longer than a roundabout path of three weight-$1$ edges. Dijkstra's algorithm generalizes the BFS idea: instead of expanding by integer waves, it expands greedily from the unvisited vertex of smallest tentative distance.
Given a graph with non-negative edge weights and a source $s$, compute $d[v]$, the length of a shortest path from $s$ to every vertex $v$, by repeatedly extracting the unsettled vertex of minimum tentative distance and relaxing its outgoing edges.
- Set $d[s] \leftarrow 0$ and $d[v] \leftarrow \infty$ for every other vertex. Put every vertex into a min-priority queue keyed by $d$.
- While the priority queue is non-empty, extract the vertex $u$ with the smallest $d[u]$ and mark it settled. Its $d[u]$ is now the final shortest-path distance.
- For each edge $(u, v)$ with weight $w$ where $v$ is still unsettled, relax: if $d[u] + w < d[v]$, update $d[v] \leftarrow d[u] + w$ and decrease $v$'s key in the priority queue. Record $u$ as $v$'s shortest-path predecessor.
- When the queue empties, every $d[v]$ holds the shortest distance from $s$. Walk predecessors backward from any vertex to reconstruct the path.
With a binary heap the running time is $O((V + E)\log V)$: each vertex is extracted once ($V$ extractions of $\log V$ each) and each edge can trigger at most one decrease-key ($E$ operations of $\log V$ each). With a Fibonacci heap the bound improves to $O(E + V \log V)$, which is the standard "textbook best" — though in practice the binary-heap version is faster on most real graphs.
Dijkstra's greedy choice — "settle the vertex of smallest tentative distance" — assumes that no future relaxation can shorten its distance further. That assumption breaks the moment any edge weight is negative. The algorithm doesn't error out; it silently returns wrong answers. If your graph has even one negative edge, switch to Bellman–Ford.
6. Worked example: Dijkstra on a small graph
Take the directed graph on five vertices $\{A, B, C, D, E\}$ shown below, with $A$ as the source. The edges and weights are: $A \to B$ (4), $A \to C$ (2), $C \to B$ (1), $B \to D$ (3), $C \to D$ (5), $B \to E$ (10), $D \to E$ (4).
Trace the algorithm step by step. The priority queue starts as $\{A: 0,\ B: \infty,\ C: \infty,\ D: \infty,\ E: \infty\}$, and we settle one vertex per iteration:
| Step | Extract | Relax | $d[A]$ | $d[B]$ | $d[C]$ | $d[D]$ | $d[E]$ |
|---|---|---|---|---|---|---|---|
| 0 | — | initialize | 0 | ∞ | ∞ | ∞ | ∞ |
| 1 | $A$ ($d = 0$) | $A{\to}B$: $4$; $A{\to}C$: $2$ | 0 | 4 | 2 | ∞ | ∞ |
| 2 | $C$ ($d = 2$) | $C{\to}B$: $2{+}1 = 3 < 4$; $C{\to}D$: $2{+}5 = 7$ | 0 | 3 | 2 | 7 | ∞ |
| 3 | $B$ ($d = 3$) | $B{\to}D$: $3{+}3 = 6 < 7$; $B{\to}E$: $3{+}10 = 13$ | 0 | 3 | 2 | 6 | 13 |
| 4 | $D$ ($d = 6$) | $D{\to}E$: $6{+}4 = 10 < 13$ | 0 | 3 | 2 | 6 | 10 |
| 5 | $E$ ($d = 10$) | no outgoing edges | 0 | 3 | 2 | 6 | 10 |
Notice the moment in step 2 where the algorithm improves $d[B]$ from $4$ to $3$: after settling $C$, the path $A \to C \to B$ of length $3$ beats the direct edge $A \to B$ of length $4$. Without that relaxation, the final tree would be wrong. The orange edges in the diagram are exactly the relaxations that survived to the end — they form the shortest-path tree rooted at $A$.
When Dijkstra extracts a vertex $u$ of minimum tentative distance, every other unsettled vertex has distance $\geq d[u]$ and every edge has weight $\geq 0$. So any future path to $u$ — which must leave the already-settled set, hop through some unsettled $v$ with $d[v] \geq d[u]$, and then take more non-negative edges — can only be at least as long. There's nothing to gain by waiting; $d[u]$ is final.
7. Bellman–Ford: when edges go negative
The "non-negative edges" caveat is real. Negative weights show up in places you don't expect — currency arbitrage, signed graph problems, reductions from other algorithms — and Dijkstra has no graceful failure mode for them. Bellman–Ford trades speed for generality.
- Initialize $d[s] \leftarrow 0$, $d[v] \leftarrow \infty$ for all other vertices.
- Repeat $V - 1$ times: for every edge $(u, v)$ with weight $w$, relax it (if $d[u] + w < d[v]$, set $d[v] \leftarrow d[u] + w$).
- Do one more pass over the edges. If any relaxation still improves a distance, the graph contains a negative cycle reachable from $s$, and shortest paths are not well-defined.
The running time is $O(VE)$ — slower than Dijkstra by a factor of $V / \log V$ in the worst case — but it works for any edge weights and, as a bonus, detects negative cycles rather than silently looping. The correctness argument is elegant: after $k$ iterations, every $d[v]$ is the length of the shortest path from $s$ to $v$ using at most $k$ edges. A shortest path uses at most $V - 1$ edges (no vertex repeats), so $V - 1$ iterations suffice — unless a negative cycle keeps reducing the distance forever, which the extra pass catches.
8. Minimum spanning trees: Kruskal and Prim
Switch from shortest paths to a different optimization. Given a connected, undirected graph with edge weights, a spanning tree is a subset of edges that touches every vertex and contains no cycles — exactly $V - 1$ edges forming a tree. A minimum spanning tree (MST) is a spanning tree whose total edge weight is minimized.
MSTs are the canonical answer to "connect all these points using the least total cable / road / wire / time." Two classic greedy algorithms find them, both in time roughly $O(E \log V)$, and on a connected graph both return a tree of the same total weight (the MST itself may not be unique, but its weight is).
Kruskal's algorithm
Kruskal builds the MST one edge at a time, in order of weight, skipping any edge that would create a cycle.
- Sort all edges in non-decreasing order of weight.
- Initialize a union-find (disjoint-set) data structure with each vertex in its own component.
- Process edges in sorted order. For each edge $(u, v)$, check whether $u$ and $v$ are in the same component (a find operation). If not, add $(u, v)$ to the MST and merge their components (a union operation). If they are, skip it — adding would create a cycle.
- Stop when $V - 1$ edges have been added (or, equivalently, when the union-find has a single component).
The dominant cost is the sort ($O(E \log E) = O(E \log V)$ since $E \leq V^2$). The union-find operations contribute a near-linear amount, $O(E\, \alpha(V))$, where $\alpha$ is the inverse Ackermann function — effectively a tiny constant.
Prim's algorithm
Prim grows a single tree outward from an arbitrary starting vertex, always taking the cheapest edge that leaves the current tree.
- Pick any vertex $r$ as the seed. Initialize the in-tree set $T = \{r\}$ and put every other vertex $v$ into a min-priority queue keyed by $\text{key}[v]$, the lightest known edge from $T$ to $v$ (initially $\infty$ for all, except neighbors of $r$).
- While the priority queue is non-empty, extract the vertex $u$ with the smallest key. Add the edge that achieved that key to the MST, and add $u$ to $T$.
- For each neighbor $v$ of $u$ that is still outside $T$: if $w(u, v) < \text{key}[v]$, set $\text{key}[v] \leftarrow w(u, v)$ and remember $u$ as $v$'s connector.
- When the queue empties, the recorded connecting edges form the MST.
With a binary heap, Prim runs in $O((V + E)\log V)$ — the same shape of complexity as Dijkstra, and for essentially the same reason: the algorithm extracts each vertex once and may decrease each edge's key at most once.
The choice between them is mostly practical. Kruskal is natural when edges are easy to enumerate and you have a good union-find. Prim is natural when the graph is dense (use an adjacency matrix and skip the heap to get $O(V^2)$) or when you only need part of an MST near a particular seed.
9. The cut property
Both Kruskal and Prim are special cases of one greedy idea. Partition the vertices into two non-empty sets $S$ and $V \setminus S$. The edges with one endpoint in each set are called a cut.
For any cut of the graph, the lightest edge crossing it belongs to some MST. If that edge is uniquely lightest, it belongs to every MST.
The proof is short and worth holding in your head. Suppose the lightest crossing edge $e$ is not in some MST $T$. Adding $e$ to $T$ creates a cycle, and that cycle must contain another edge $e'$ crossing the cut (because the cycle has to come back to $S$). Swap them: $T - e' + e$ is still a spanning tree, and its weight is no larger because $e$ was the lightest crossing edge. So a tree as good as $T$ contains $e$.
Now look back at the two algorithms:
- Kruskal at each step considers the cut that separates a component from everything outside it; the edge it adds is the lightest one crossing that cut.
- Prim at each step considers the cut between $T$ and $V \setminus T$; the edge it adds is the lightest one crossing that cut.
Different cuts, same principle. Different greedy schedules, same MST weight.
10. Topological sort on a DAG
A directed acyclic graph (DAG) is a directed graph with no directed cycle. On a DAG you can always lay the vertices out in a line so that every edge points forward — that's a topological order. Two clean algorithms produce one.
Kahn's algorithm repeatedly peels off a vertex with no remaining incoming edges:
- Compute the in-degree of every vertex. Put every vertex with in-degree $0$ into a queue.
- While the queue is non-empty, dequeue a vertex $v$ and append it to the output list. For each edge $(v, u)$ from $v$, decrement $u$'s in-degree; if it drops to $0$, enqueue $u$.
- If the output list contains all $V$ vertices, it is a topological order. If not, the remaining vertices form at least one cycle and the graph isn't a DAG.
Runs in $O(V + E)$.
The DFS-based version is even shorter to state:
- Run a full DFS on the graph, recording each vertex's finish time.
- Output the vertices in reverse order of finish time.
- If during the DFS you encounter a back-edge (to an ancestor still on the recursion stack), the graph has a cycle and no topological order exists.
Runs in $O(V + E)$. The intuition: a vertex finishes only after all its descendants have finished, so it appears after them in the finish-time order — reversing puts it before them.
Topological sort is the natural language for scheduling under dependencies: build systems, course prerequisites, package installs, dataflow pipelines. Anywhere you have "must happen before" relationships and need a linear order that respects them.
11. Common pitfalls
BFS counts edges, not weights. On a weighted graph it gives the path with the fewest edges, which is not in general the shortest weighted path. Reach for Dijkstra instead.
This is the textbook trap. The algorithm doesn't crash and doesn't warn — it returns wrong distances. Always confirm $w(u, v) \geq 0$ for every edge before running Dijkstra. If you can't, use Bellman–Ford.
Every topological-sort algorithm has a cycle detection step baked in (Kahn: leftover vertices; DFS: back-edges). Use it. Silently returning a partial order on a cyclic graph is a long-debugging-session waiting to happen.
A recursive DFS on a graph that's effectively a long chain — say, a linked-list-shaped DAG with $10^6$ vertices — will blow the call stack in most languages. Switch to an iterative DFS with an explicit stack when depth could be large.
MSTs minimize total tree weight; shortest-path trees minimize distance from a source to each vertex. They're not the same tree in general — even on the same graph. The MST may force a long detour from the source that no shortest-path tree would tolerate.
12. Worked examples
A few small cases to fix the patterns. Work each out before opening the solution.
Example 1 · BFS distances from $A$ in an unweighted graph
Vertices $\{A, B, C, D, E, F\}$; undirected edges $AB$, $AC$, $BD$, $CD$, $DE$, $EF$. Source $A$.
Wave 0: $\{A\}$, distances $d[A] = 0$.
Wave 1: dequeue $A$, enqueue its neighbors $B, C$. $d[B] = d[C] = 1$.
Wave 2: dequeue $B$, find $D$ undiscovered, $d[D] = 2$; dequeue $C$, $D$ now already discovered (no update).
Wave 3: dequeue $D$, enqueue $E$, $d[E] = 3$.
Wave 4: dequeue $E$, enqueue $F$, $d[F] = 4$.
Final distances: $A{:}0,\ B{:}1,\ C{:}1,\ D{:}2,\ E{:}3,\ F{:}4$.
Example 2 · Why Dijkstra fails on a graph with a single negative edge
Take three vertices $A$, $B$, $C$ with directed edges $A \to B$ (weight $1$), $A \to C$ (weight $5$), $B \to C$ (weight $-10$). Source $A$.
Dijkstra extracts $A$, relaxes to $d[B] = 1$ and $d[C] = 5$. It then extracts $B$ (smaller key), relaxes $B \to C$: $d[C]$ becomes $1 + (-10) = -9$. So far so good — but suppose instead the order had been $A \to C$ (weight $1$), $A \to B$ (weight $5$), $B \to C$ (weight $-10$). Now Dijkstra extracts $C$ at $d[C] = 1$ and settles it — only later does it discover $B \to C$ would have given $5 - 10 = -5$. The final $d[C]$ is wrong: Dijkstra reports $1$ when the true shortest distance is $-5$.
The general moral: once Dijkstra settles a vertex, it never revisits it. A later negative edge can't repair the mistake.
Example 3 · Kruskal on a 5-vertex graph
Vertices $\{A, B, C, D, E\}$; edges $AB(1)$, $AC(3)$, $BC(2)$, $BD(4)$, $CD(5)$, $CE(6)$, $DE(7)$.
Sort by weight: $AB(1), BC(2), AC(3), BD(4), CD(5), CE(6), DE(7)$.
- $AB$: $A$ and $B$ disjoint → add. Components: $\{AB\}, \{C\}, \{D\}, \{E\}$.
- $BC$: $B \in \{AB\}$, $C$ alone → add. Components: $\{ABC\}, \{D\}, \{E\}$.
- $AC$: both in $\{ABC\}$ → skip (would create a cycle).
- $BD$: $B \in \{ABC\}$, $D$ alone → add. Components: $\{ABCD\}, \{E\}$.
- $CD$: both in $\{ABCD\}$ → skip.
- $CE$: $C \in \{ABCD\}$, $E$ alone → add. Components: $\{ABCDE\}$.
MST edges: $\{AB, BC, BD, CE\}$, total weight $1 + 2 + 4 + 6 = 13$.
Example 4 · Topological order of a small DAG
Directed edges on $\{1, 2, 3, 4, 5, 6\}$: $1\to 2$, $1\to 3$, $2\to 4$, $3\to 4$, $4\to 5$, $4\to 6$.
In-degrees: $1{:}0,\ 2{:}1,\ 3{:}1,\ 4{:}2,\ 5{:}1,\ 6{:}1$. Start with $\{1\}$ in Kahn's queue.
Dequeue $1$, output it; decrement $2$ and $3$ to in-degree $0$, enqueue both. Dequeue $2$, output it; decrement $4$ to $1$. Dequeue $3$, output it; decrement $4$ to $0$, enqueue. Dequeue $4$, output it; decrement $5$ and $6$ to $0$, enqueue both. Dequeue $5$ then $6$ (or in the opposite order — both are valid), output them.
One valid topological order: $1, 2, 3, 4, 5, 6$. The order $1, 3, 2, 4, 6, 5$ is equally valid; topological orders are not in general unique.