Graph Algorithms — Discrete Math & Combinatorics

What you'll leave with

How BFS and DFS each traverse a graph in $O(V + E)$ time, and what each one is naturally good at.
Dijkstra's algorithm for single-source shortest paths in non-negatively-weighted graphs, walked through on a small example.
Why Dijkstra fails on negative edges, and what Bellman–Ford gives you in exchange for slower running time.
Two ways to build a minimum spanning tree — Kruskal and Prim — and the cut property that makes both correct.
Topological sort on a DAG, by either Kahn's BFS variant or DFS post-order reversal.

1. The setup: graphs, weights, conventions

A graph $G = (V, E)$ is a set of vertices $V$ and a set of edges $E \subseteq V \times V$. We write $n = |V|$ for the number of vertices and $m = |E|$ for the number of edges. Edges may be directed (an arrow from $u$ to $v$) or undirected (an unordered pair), and they may carry a weight $w(u, v) \in \mathbb{R}$. In running times the standard shorthand is $O(V + E)$ rather than $O(n + m)$ — read $V$ and $E$ as their sizes when they appear in a complexity expression.

Three quick conventions used throughout:

Distance to a vertex is written $d[v]$, initialized to $\infty$ for all vertices except the source $s$, where $d[s] = 0$.
A vertex's neighbors are $N(v) = \{ u : (v, u) \in E \}$.
A relaxation of edge $(u, v)$ with weight $w$ means: if $d[u] + w < d[v]$, update $d[v] \leftarrow d[u] + w$. Almost every shortest-path algorithm is a particular discipline of edge relaxations.

2. Breadth-first search

BFS explores a graph in waves. Starting at a source $s$, it visits everything at distance $1$, then everything at distance $2$, and so on. The data structure that makes this work is a queue: first in, first out, so that a vertex you enqueue when expanding the wave at depth $k$ is dequeued only after every vertex at depth $k$ already in the queue.

BFS from source $s$

Initialize $d[s] \leftarrow 0$ and $d[v] \leftarrow \infty$ for every other vertex; create an empty queue $Q$ and enqueue $s$.
While $Q$ is non-empty, dequeue a vertex $v$.
For each neighbor $u \in N(v)$ with $d[u] = \infty$ (i.e. not yet discovered): set $d[u] \leftarrow d[v] + 1$, record $v$ as $u$'s parent, and enqueue $u$.
When the queue empties, $d[v]$ holds the shortest unweighted distance from $s$ for every reachable $v$.

Every vertex is enqueued at most once and every edge is inspected at most twice (once from each endpoint, in an undirected graph), giving the classical $O(V + E)$ bound. The parent pointers, taken together, form the BFS tree — a spanning tree of the connected component containing $s$ in which every root-to-leaf path is a shortest path from $s$.

BFS is the right hammer when:

You need shortest paths in an unweighted graph — or equivalently, in a graph where every edge has the same weight.
You want the level structure of a graph: which vertices are exactly $k$ steps from $s$.
You're checking bipartiteness by 2-colouring the BFS tree.
You're computing connected components in an undirected graph (run BFS from each undiscovered vertex).

3. Depth-first search

DFS does the opposite: it goes deep before it goes wide. From the current vertex, it picks an unvisited neighbor, walks there, and recurses; only when a vertex has no unvisited neighbors does it back up and try the next branch. Either an explicit stack or — more commonly — the call stack of a recursive routine serves as the data structure.

DFS from vertex $v$

Mark $v$ as discovered and record its discovery time.
For each neighbor $u \in N(v)$ that is not yet discovered, recursively DFS from $u$ (recording $v$ as its DFS parent).
When all neighbors have been processed, mark $v$ as finished and record its finish time.

Run from a source until done; if vertices remain undiscovered, repeat from any of them. The total work is again $O(V + E)$, since every vertex is entered once and every edge is examined a constant number of times. The discovery and finish times — the pair $(d[v], f[v])$ — turn out to encode an astonishing amount of structure, which is why DFS shows up in so many algorithms.

DFS is the right tool when:

Cycle detection: in an undirected graph, any edge from $v$ to an already-discovered, non-parent vertex closes a cycle. In a directed graph, a back-edge (to an ancestor still on the recursion stack) is the cycle witness.
Topological sort on a DAG: reverse the order of finish times.
Strongly connected components: Tarjan's and Kosaraju's algorithms both run two DFS passes and report SCCs in $O(V + E)$.
Bridges and articulation points: Tarjan's DFS with low-link values finds them in one pass.

Note

The DFS tree, edges-not-in-the-tree, and the discovery/finish times together let you classify every edge of a directed graph as tree, forward, back, or cross. That classification is what makes DFS so much more than "BFS but stack instead of queue."

4. BFS vs DFS at a glance

Breadth-first

BFS — explore in layers

Queue. Wave-by-wave from source. $O(V + E)$ time, $O(V)$ space.

Shortest paths in unweighted graphs
Level sets / "everything within $k$ steps"
Bipartiteness check
Connected components (undirected)

Depth-first

DFS — explore by diving

Stack or recursion. Deep first, backtrack. $O(V + E)$ time, $O(V)$ space.

Cycle detection
Topological sort
Strongly connected components
Bridges and articulation points

5. Dijkstra's algorithm

When edges carry non-negative weights, BFS is no longer enough — a short hop of weight $10$ may still be longer than a roundabout path of three weight-$1$ edges. Dijkstra's algorithm generalizes the BFS idea: instead of expanding by integer waves, it expands greedily from the unvisited vertex of smallest tentative distance.

Dijkstra's algorithm

Given a graph with non-negative edge weights and a source $s$, compute $d[v]$, the length of a shortest path from $s$ to every vertex $v$, by repeatedly extracting the unsettled vertex of minimum tentative distance and relaxing its outgoing edges.

Dijkstra from source $s$

Set $d[s] \leftarrow 0$ and $d[v] \leftarrow \infty$ for every other vertex. Put every vertex into a min-priority queue keyed by $d$.
While the priority queue is non-empty, extract the vertex $u$ with the smallest $d[u]$ and mark it settled. Its $d[u]$ is now the final shortest-path distance.
For each edge $(u, v)$ with weight $w$ where $v$ is still unsettled, relax: if $d[u] + w < d[v]$, update $d[v] \leftarrow d[u] + w$ and decrease $v$'s key in the priority queue. Record $u$ as $v$'s shortest-path predecessor.
When the queue empties, every $d[v]$ holds the shortest distance from $s$. Walk predecessors backward from any vertex to reconstruct the path.

With a binary heap the running time is $O((V + E)\log V)$: each vertex is extracted once ($V$ extractions of $\log V$ each) and each edge can trigger at most one decrease-key ($E$ operations of $\log V$ each). With a Fibonacci heap the bound improves to $O(E + V \log V)$, which is the standard "textbook best" — though in practice the binary-heap version is faster on most real graphs.

Pitfall

Dijkstra's greedy choice — "settle the vertex of smallest tentative distance" — assumes that no future relaxation can shorten its distance further. That assumption breaks the moment any edge weight is negative. The algorithm doesn't error out; it silently returns wrong answers. If your graph has even one negative edge, switch to Bellman–Ford.

6. Worked example: Dijkstra on a small graph

Take the directed graph on five vertices $\{A, B, C, D, E\}$ shown below, with $A$ as the source. The edges and weights are: $A \to B$ (4), $A \to C$ (2), $C \to B$ (1), $B \to D$ (3), $C \to D$ (5), $B \to E$ (10), $D \to E$ (4).

Final state: each vertex labelled with its shortest distance from $A$; orange edges form the shortest-path tree.

Trace the algorithm step by step. The priority queue starts as $\{A: 0,\ B: \infty,\ C: \infty,\ D: \infty,\ E: \infty\}$, and we settle one vertex per iteration:

Step	Extract	Relax	$d[B]$	$d[C]$	$d[D]$	$d[E]$
0	—	initialize	∞	∞	∞	∞
1	$A$ ($d = 0$)	$A{\to}B$: $4$; $A{\to}C$: $2$	4	2	∞	∞
2	$C$ ($d = 2$)	$C{\to}B$: $2{+}1 = 3 < 4$; $C{\to}D$: $2{+}5 = 7$	3	2	7	∞
3	$B$ ($d = 3$)	$B{\to}D$: $3{+}3 = 6 < 7$; $B{\to}E$: $3{+}10 = 13$	3	2	6	13
4	$D$ ($d = 6$)	$D{\to}E$: $6{+}4 = 10 < 13$	3	2	6	10
5	$E$ ($d = 10$)	no outgoing edges	3	2	6	10

Notice the moment in step 2 where the algorithm improves $d[B]$ from $4$ to $3$: after settling $C$, the path $A \to C \to B$ of length $3$ beats the direct edge $A \to B$ of length $4$. Without that relaxation, the final tree would be wrong. The orange edges in the diagram are exactly the relaxations that survived to the end — they form the shortest-path tree rooted at $A$.

Why greedy works

When Dijkstra extracts a vertex $u$ of minimum tentative distance, every other unsettled vertex has distance $\geq d[u]$ and every edge has weight $\geq 0$. So any future path to $u$ — which must leave the already-settled set, hop through some unsettled $v$ with $d[v] \geq d[u]$, and then take more non-negative edges — can only be at least as long. There's nothing to gain by waiting; $d[u]$ is final.

7. Bellman–Ford: when edges go negative

The "non-negative edges" caveat is real. Negative weights show up in places you don't expect — currency arbitrage, signed graph problems, reductions from other algorithms — and Dijkstra has no graceful failure mode for them. Bellman–Ford trades speed for generality.

Bellman–Ford from source $s$

Initialize $d[s] \leftarrow 0$, $d[v] \leftarrow \infty$ for all other vertices.
Repeat $V - 1$ times: for every edge $(u, v)$ with weight $w$, relax it (if $d[u] + w < d[v]$, set $d[v] \leftarrow d[u] + w$).
Do one more pass over the edges. If any relaxation still improves a distance, the graph contains a negative cycle reachable from $s$, and shortest paths are not well-defined.

The running time is $O(VE)$ — slower than Dijkstra by a factor of $V / \log V$ in the worst case — but it works for any edge weights and, as a bonus, detects negative cycles rather than silently looping. The correctness argument is elegant: after $k$ iterations, every $d[v]$ is the length of the shortest path from $s$ to $v$ using at most $k$ edges. A shortest path uses at most $V - 1$ edges (no vertex repeats), so $V - 1$ iterations suffice — unless a negative cycle keeps reducing the distance forever, which the extra pass catches.

8. Minimum spanning trees: Kruskal and Prim

Switch from shortest paths to a different optimization. Given a connected, undirected graph with edge weights, a spanning tree is a subset of edges that touches every vertex and contains no cycles — exactly $V - 1$ edges forming a tree. A minimum spanning tree (MST) is a spanning tree whose total edge weight is minimized.

MSTs are the canonical answer to "connect all these points using the least total cable / road / wire / time." Two classic greedy algorithms find them, both in time roughly $O(E \log V)$, and on a connected graph both return a tree of the same total weight (the MST itself may not be unique, but its weight is).

Kruskal's algorithm

Kruskal builds the MST one edge at a time, in order of weight, skipping any edge that would create a cycle.

Kruskal's algorithm

Sort all edges in non-decreasing order of weight.
Initialize a union-find (disjoint-set) data structure with each vertex in its own component.
Process edges in sorted order. For each edge $(u, v)$, check whether $u$ and $v$ are in the same component (a find operation). If not, add $(u, v)$ to the MST and merge their components (a union operation). If they are, skip it — adding would create a cycle.
Stop when $V - 1$ edges have been added (or, equivalently, when the union-find has a single component).

The dominant cost is the sort ($O(E \log E) = O(E \log V)$ since $E \leq V^2$). The union-find operations contribute a near-linear amount, $O(E\, \alpha(V))$, where $\alpha$ is the inverse Ackermann function — effectively a tiny constant.

Prim's algorithm

Prim grows a single tree outward from an arbitrary starting vertex, always taking the cheapest edge that leaves the current tree.

Prim's algorithm

Pick any vertex $r$ as the seed. Initialize the in-tree set $T = \{r\}$ and put every other vertex $v$ into a min-priority queue keyed by $\text{key}[v]$, the lightest known edge from $T$ to $v$ (initially $\infty$ for all, except neighbors of $r$).
While the priority queue is non-empty, extract the vertex $u$ with the smallest key. Add the edge that achieved that key to the MST, and add $u$ to $T$.
For each neighbor $v$ of $u$ that is still outside $T$: if $w(u, v) < \text{key}[v]$, set $\text{key}[v] \leftarrow w(u, v)$ and remember $u$ as $v$'s connector.
When the queue empties, the recorded connecting edges form the MST.

With a binary heap, Prim runs in $O((V + E)\log V)$ — the same shape of complexity as Dijkstra, and for essentially the same reason: the algorithm extracts each vertex once and may decrease each edge's key at most once.

The choice between them is mostly practical. Kruskal is natural when edges are easy to enumerate and you have a good union-find. Prim is natural when the graph is dense (use an adjacency matrix and skip the heap to get $O(V^2)$) or when you only need part of an MST near a particular seed.

9. The cut property

Both Kruskal and Prim are special cases of one greedy idea. Partition the vertices into two non-empty sets $S$ and $V \setminus S$. The edges with one endpoint in each set are called a cut.

Cut property

For any cut of the graph, the lightest edge crossing it belongs to some MST. If that edge is uniquely lightest, it belongs to every MST.

The proof is short and worth holding in your head. Suppose the lightest crossing edge $e$ is not in some MST $T$. Adding $e$ to $T$ creates a cycle, and that cycle must contain another edge $e'$ crossing the cut (because the cycle has to come back to $S$). Swap them: $T - e' + e$ is still a spanning tree, and its weight is no larger because $e$ was the lightest crossing edge. So a tree as good as $T$ contains $e$.

Now look back at the two algorithms:

Kruskal at each step considers the cut that separates a component from everything outside it; the edge it adds is the lightest one crossing that cut.
Prim at each step considers the cut between $T$ and $V \setminus T$; the edge it adds is the lightest one crossing that cut.

Different cuts, same principle. Different greedy schedules, same MST weight.

10. Topological sort on a DAG

A directed acyclic graph (DAG) is a directed graph with no directed cycle. On a DAG you can always lay the vertices out in a line so that every edge points forward — that's a topological order. Two clean algorithms produce one.

Kahn's algorithm repeatedly peels off a vertex with no remaining incoming edges:

Compute the in-degree of every vertex. Put every vertex with in-degree $0$ into a queue.
While the queue is non-empty, dequeue a vertex $v$ and append it to the output list. For each edge $(v, u)$ from $v$, decrement $u$'s in-degree; if it drops to $0$, enqueue $u$.
If the output list contains all $V$ vertices, it is a topological order. If not, the remaining vertices form at least one cycle and the graph isn't a DAG.

Runs in $O(V + E)$.

The DFS-based version is even shorter to state:

Run a full DFS on the graph, recording each vertex's finish time.
Output the vertices in reverse order of finish time.
If during the DFS you encounter a back-edge (to an ancestor still on the recursion stack), the graph has a cycle and no topological order exists.

Runs in $O(V + E)$. The intuition: a vertex finishes only after all its descendants have finished, so it appears after them in the finish-time order — reversing puts it before them.

Topological sort is the natural language for scheduling under dependencies: build systems, course prerequisites, package installs, dataflow pipelines. Anywhere you have "must happen before" relationships and need a linear order that respects them.

11. Common pitfalls

BFS on a weighted graph

BFS counts edges, not weights. On a weighted graph it gives the path with the fewest edges, which is not in general the shortest weighted path. Reach for Dijkstra instead.

Dijkstra with negative edges

This is the textbook trap. The algorithm doesn't crash and doesn't warn — it returns wrong distances. Always confirm $w(u, v) \geq 0$ for every edge before running Dijkstra. If you can't, use Bellman–Ford.

Topological sort on a cyclic graph

Every topological-sort algorithm has a cycle detection step baked in (Kahn: leftover vertices; DFS: back-edges). Use it. Silently returning a partial order on a cyclic graph is a long-debugging-session waiting to happen.

Recursive DFS on deep graphs

A recursive DFS on a graph that's effectively a long chain — say, a linked-list-shaped DAG with $10^6$ vertices — will blow the call stack in most languages. Switch to an iterative DFS with an explicit stack when depth could be large.

Forgetting which problem you're solving

MSTs minimize total tree weight; shortest-path trees minimize distance from a source to each vertex. They're not the same tree in general — even on the same graph. The MST may force a long detour from the source that no shortest-path tree would tolerate.

12. Worked examples

A few small cases to fix the patterns. Work each out before opening the solution.

Example 1 · BFS distances from $A$ in an unweighted graph

Vertices $\{A, B, C, D, E, F\}$; undirected edges $AB$, $AC$, $BD$, $CD$, $DE$, $EF$. Source $A$.

Wave 0: $\{A\}$, distances $d[A] = 0$.

Wave 1: dequeue $A$, enqueue its neighbors $B, C$. $d[B] = d[C] = 1$.

Wave 2: dequeue $B$, find $D$ undiscovered, $d[D] = 2$; dequeue $C$, $D$ now already discovered (no update).

Wave 3: dequeue $D$, enqueue $E$, $d[E] = 3$.

Wave 4: dequeue $E$, enqueue $F$, $d[F] = 4$.

Final distances: $A{:}0,\ B{:}1,\ C{:}1,\ D{:}2,\ E{:}3,\ F{:}4$.

Example 2 · Why Dijkstra fails on a graph with a single negative edge

Take three vertices $A$, $B$, $C$ with directed edges $A \to B$ (weight $1$), $A \to C$ (weight $5$), $B \to C$ (weight $-10$). Source $A$.

Dijkstra extracts $A$, relaxes to $d[B] = 1$ and $d[C] = 5$. It then extracts $B$ (smaller key), relaxes $B \to C$: $d[C]$ becomes $1 + (-10) = -9$. So far so good — but suppose instead the order had been $A \to C$ (weight $1$), $A \to B$ (weight $5$), $B \to C$ (weight $-10$). Now Dijkstra extracts $C$ at $d[C] = 1$ and settles it — only later does it discover $B \to C$ would have given $5 - 10 = -5$. The final $d[C]$ is wrong: Dijkstra reports $1$ when the true shortest distance is $-5$.

The general moral: once Dijkstra settles a vertex, it never revisits it. A later negative edge can't repair the mistake.

Example 3 · Kruskal on a 5-vertex graph

Vertices $\{A, B, C, D, E\}$; edges $AB(1)$, $AC(3)$, $BC(2)$, $BD(4)$, $CD(5)$, $CE(6)$, $DE(7)$.

Sort by weight: $AB(1), BC(2), AC(3), BD(4), CD(5), CE(6), DE(7)$.

$AB$: $A$ and $B$ disjoint → add. Components: $\{AB\}, \{C\}, \{D\}, \{E\}$.
$BC$: $B \in \{AB\}$, $C$ alone → add. Components: $\{ABC\}, \{D\}, \{E\}$.
$AC$: both in $\{ABC\}$ → skip (would create a cycle).
$BD$: $B \in \{ABC\}$, $D$ alone → add. Components: $\{ABCD\}, \{E\}$.
$CD$: both in $\{ABCD\}$ → skip.
$CE$: $C \in \{ABCD\}$, $E$ alone → add. Components: $\{ABCDE\}$.

MST edges: $\{AB, BC, BD, CE\}$, total weight $1 + 2 + 4 + 6 = 13$.

Example 4 · Topological order of a small DAG

Directed edges on $\{1, 2, 3, 4, 5, 6\}$: $1\to 2$, $1\to 3$, $2\to 4$, $3\to 4$, $4\to 5$, $4\to 6$.

In-degrees: $1{:}0,\ 2{:}1,\ 3{:}1,\ 4{:}2,\ 5{:}1,\ 6{:}1$. Start with $\{1\}$ in Kahn's queue.

Dequeue $1$, output it; decrement $2$ and $3$ to in-degree $0$, enqueue both. Dequeue $2$, output it; decrement $4$ to $1$. Dequeue $3$, output it; decrement $4$ to $0$, enqueue. Dequeue $4$, output it; decrement $5$ and $6$ to $0$, enqueue both. Dequeue $5$ then $6$ (or in the opposite order — both are valid), output them.

One valid topological order: $1, 2, 3, 4, 5, 6$. The order $1, 3, 2, 4, 6, 5$ is equally valid; topological orders are not in general unique.

Sources & further reading

The algorithms here are old and well-trodden. The references below are the standard places to deepen and formalize what's here, in roughly increasing order of depth.

Algorithms, 4th Edition — Graphs Textbook Sedgewick & Wayne · Princeton

Free online book chapters with crisp explanations of BFS, DFS, MST (Kruskal and Prim), and shortest paths (Dijkstra and Bellman–Ford). Great companion when you want to see the algorithms run on visual examples.
6.006 Introduction to Algorithms Course MIT OpenCourseWare · Spring 2020

Full undergraduate course with video lectures, problem sets, and recitations. Lectures 11–18 cover BFS, DFS, Dijkstra, Bellman–Ford, and MSTs in depth.
Introduction to Algorithms (CLRS), 4th edition Reference Cormen, Leiserson, Rivest, Stein · MIT Press

The canonical algorithms reference. Chapters 20–22 cover elementary graph algorithms (BFS/DFS, topological sort, SCC); chapter 23 covers MSTs; chapters 24–25 cover shortest paths. Reach for it when you want proofs of correctness and tight running-time analyses.
Dijkstra's Algorithm Reference Wolfram MathWorld

Compact formal statement and pointers into the broader shortest-path literature. Useful when you want the definition stated in the language of professional algorithm analysis.
Kruskal's algorithm Encyclopedia Wikipedia

Broad overview of Kruskal with the cut-property correctness argument spelled out and links to Prim, Borůvka, and the wider MST literature. Good entry point for the surrounding landscape.
Breadth-first search Encyclopedia Wikipedia

Historical context (Moore 1959, Lee 1961), pseudocode, and a survey of applications from web crawling to peer-to-peer networks. Useful for placing BFS in the wider algorithmic ecosystem.

What you'll leave with

1. The setup: graphs, weights, conventions

2. Breadth-first search

3. Depth-first search

4. BFS vs DFS at a glance

BFS — explore in layers

DFS — explore by diving

5. Dijkstra's algorithm

6. Worked example: Dijkstra on a small graph

7. Bellman–Ford: when edges go negative

8. Minimum spanning trees: Kruskal and Prim

Kruskal's algorithm

Prim's algorithm

9. The cut property

10. Topological sort on a DAG

11. Common pitfalls

12. Worked examples

Sources & further reading

Test your understanding