
Algorithms 2

Dr John Fawcett · Cambridge CST Part IA · Lent 2025/26 · CLRS chapters 22–26 + amortized analysis + binomial / Fibonacci heaps (CLRS2 ch.19–20)

Graph Definitions & Representation

A graph G = (V, E) is a set of vertices and a set of edges E ⊆ V × V. In a directed graph edges are ordered pairs (u, v); in an undirected graph E is symmetric. A weighted graph has a weight function w : E → ℝ. A graph is fully connected if E = V × V.

Special Graph Types

DAG
Directed Acyclic Graph — no directed cycles. Admits a topological ordering.
Tree
Connected undirected acyclic graph. Has exactly |V|−1 edges.
Forest
Undirected acyclic graph (disjoint union of trees).
Connected
Undirected graph where every pair of vertices has a path between them.

Representations

Representation   | Storage  | Edge lookup | Iterate neighbours | Best for
Adjacency list   | Θ(V + E) | O(deg(v))   | O(deg(v))          | Sparse graphs
Adjacency matrix | Θ(V²)    | O(1)        | O(V)               | Dense graphs, quick edge queries

For an undirected graph with an adjacency matrix, only the upper (or lower) triangle needs to be stored because the matrix is symmetric. A weighted adjacency matrix stores the weight in each cell (or ∞ for missing edges).
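
As a concrete illustration, here is a minimal runnable sketch (Python) of both representations for a small hypothetical directed weighted graph; the vertex count, edges, and weights are invented for the example.

from math import inf

n = 3
# Adjacency list: one list of (neighbour, weight) pairs per vertex.
adj_list = [[(1, 3), (2, 5)], [], [(1, 1)]]   # edges 0->1 (3), 0->2 (5), 2->1 (1)

# Adjacency matrix: weight in each cell, inf where there is no edge.
adj_matrix = [[inf] * n for _ in range(n)]
for u, nbrs in enumerate(adj_list):
    for v, w in nbrs:
        adj_matrix[u][v] = w

# Edge lookup: O(deg(u)) on the list, O(1) on the matrix.
assert any(v == 1 for v, _ in adj_list[0])
assert adj_matrix[0][1] == 3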

Breadth-First Search (SSAD_HOPCOUNT)

Problem: single-source all-destinations unweighted shortest paths — find v.d = hop-distance δ(s, v) for every vertex.

Course notation: v.d for distance, v.π for predecessor, v.pending for "currently in queue". (Used interchangeably below with seen/come_from.)

def bfs(g, s):
    for v in g.vertices:
        v.d = ∞; v.π = NIL; v.pending = False
    s.d = 0; s.pending = True
    Q = Queue([s])
    while not Q.isempty():
        u = Q.popleft()                          # FIFO
        for v in u.neighbours:
            if not v.pending:
                v.d = u.d + 1;  v.π = u;  v.pending = True
                Q.pushright(v)
Running time: O(V + E) with adjacency lists. Each vertex is enqueued/dequeued at most once (O(V)); across all iterations the inner for-loop touches each edge at most once (O(E)). With an adjacency matrix the inner loop is O(V) per dequeue, giving O(V²).

Correctness — three lemmas

Lemma 1 (edge inequality) If (u, v) ∈ E then δ(s, v) ≤ δ(s, u) + 1.
Lemma 2 (no underestimate) On termination, v.d ≥ δ(s, v) for all v ∈ V. Proof by induction on the number of ENQUEUE operations: a newly enqueued v has v.d = u.d + 1 ≥ δ(s, u) + 1 ≥ δ(s, v) by Lemma 1. □
Lemma 3 (queue invariant ϕ) If Q = ⟨v₁, …, vₓ⟩ then vₓ.d ≤ v₁.d + 1 and vᵢ.d ≤ vᵢ₊₁.d. Corollary: if a is enqueued before b then a.d ≤ b.d on termination.
Theorem — BFS is correct Suppose not: let v be the reachable vertex of minimum δ(s, v) with v.d > δ(s, v). Let u be v's predecessor on a shortest path, so δ(s, u) < δ(s, v) and by minimality u.d = δ(s, u). Now consider v's state when u is dequeued:
  • Not pending: processing u sets v.d = u.d + 1 = δ(s, u) + 1 = δ(s, v), contradicting v.d > δ(s, v).
  • Pending (in queue): some earlier w enqueued v, so v.d = w.d + 1. By the Lemma 3 corollary w.d ≤ u.d, so v.d ≤ u.d + 1 = δ(s, v) — contradiction.
  • Already dequeued: v was enqueued by some earlier w with v.d = w.d + 1 and w.d ≤ u.d, so again v.d ≤ δ(s, v) — contradiction.
So no such v exists. □

Depth-First Search

Recursive Implementation

def dfs(g):
    for v in g.vertices: v.visited = False; v.π = NIL
    time = 0
    for v in g.vertices:
        if not v.visited: dfs_helper(v)

def dfs_helper(u):
    time += 1; u.discover_time = time; u.visited = True
    for v in u.neighbours:
        if not v.visited:
            v.π = u; dfs_helper(v)
    time += 1; u.finish_time = time

Stack-Based Equivalent

def dfs_iter(g, s):
    for v in g.vertices: v.seen = False
    stack = Stack([s]);  s.seen = True
    while not stack.isempty():
        u = stack.popright()
        for v in u.neighbours:
            if not v.seen: stack.pushright(v);  v.seen = True
Running time: Θ(V + E). Each vertex's dfs_helper is called once; the for-loop over a vertex's neighbours is O(deg(v)) per call, summing to O(E). Uses aggregate analysis: we bound total operations, not per-call cost.

Each vertex has a discover_time (when it is first entered — "grey") and a finish_time (when its recursion returns — "black"). Times are integers 1…2|V|.

Edge Classification & DFS Properties

DFS on G produces a depth-first forest. Every edge (u, v) ∈ E falls into exactly one of four classes:

Tree edge
(u, v) was used to discover v (so v.π = u).
Back edge
v is an ancestor of u in the DF-forest. (Undirected graphs: only tree + back edges exist.)
Forward edge
Non-tree edge to a descendant of u.
Cross edge
Any edge between vertices with no ancestor relation (possibly across DF-trees — only possible in directed graphs).
Parenthesis characterisation. Using discover/finish intervals [u.d, u.f]:
  • Tree or forward edge (u, v): u.d < v.d < v.f < u.f (v's interval nested inside u's).
  • Back edge (u, v): v.d ≤ u.d < u.f ≤ v.f (u's interval nested inside v's).
  • Cross edge (u, v): v.d < v.f < u.d < u.f (intervals disjoint; v finishes before u starts).

Useful Consequences

  • An undirected graph has a cycle ⟺ DFS finds a back edge.
  • On an undirected graph, DFS identifies connected components: one per top-level call to dfs_helper.
  • On a directed acyclic graph, sorting vertices by descending finish time gives a valid topological ordering.

Strongly Connected Components

A strongly connected component (SCC) of a directed graph G is a maximal C ⊆ V such that every pair u, v ∈ C is mutually reachable.

STRONGLY-CONNECTED-COMPONENTS(G)
  1. Run DFS on G to populate finish_time for each v ∈ V.
  2. Compute the transpose Gᵀ (reverse every edge).
  3. Run DFS on Gᵀ but process vertices in the outer loop in order of descending finish_time from step 1.
  4. Each tree in the resulting DF-forest is an SCC.

Runs in Θ(V + E). Proof of correctness is in CLRS ch.22. (Not proved in lectures.)
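
Below is a minimal runnable sketch of the two-pass algorithm in Python, using iterative DFS to avoid recursion limits; the dict-of-lists graph format and the function name scc are illustration choices, not course API.

def scc(g):
    order = []                       # vertices in increasing finish time
    seen = set()
    for s in g:                      # pass 1: DFS on G, record finish order
        if s in seen:
            continue
        stack = [(s, iter(g[s]))]
        seen.add(s)
        while stack:
            u, it = stack[-1]
            for v in it:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(g[v])))
                    break
            else:
                order.append(u)      # u finishes: all neighbours explored
                stack.pop()

    gt = {u: [] for u in g}          # pass 2 input: the transpose graph
    for u in g:
        for v in g[u]:
            gt[v].append(u)

    comps, assigned = [], set()
    for s in reversed(order):        # descending finish time
        if s in assigned:
            continue
        comp, stack = [], [s]
        assigned.add(s)
        while stack:                 # each DF-tree in the transpose is one SCC
            u = stack.pop()
            comp.append(u)
            for v in gt[u]:
                if v not in assigned:
                    assigned.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps

# Two 2-cycles joined by a one-way edge give two SCCs.
print(scc({0: [1], 1: [0, 2], 2: [3], 3: [2]}))   # [[0, 1], [2, 3]]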

Bellman-Ford Algorithm

Problem: Single-source shortest paths in a directed graph with arbitrary edge weights. Returns true + valid distances if no negative-weight cycle is reachable from s, else returns false.

Core subroutine — RELAX(u, v, w):

def RELAX(u, v, w):
    if v.d > u.d + w(u, v):
        v.d = u.d + w(u, v);  v.π = u
def BELLMAN-FORD(G, w, s):
    for v in G.V: v.d = ∞;  v.π = NIL
    s.d = 0
    for i = 1 to |V|−1:
        for (u, v) in G.E:
            RELAX(u, v, w)
    for (u, v) in G.E:
        if v.d > u.d + w(u, v): return False   # negative cycle
    return True
Running time: O(V · E). V−1 relaxation passes, each scanning all E edges, plus one negative-cycle check pass.

Correctness Sketch

Case (i) — no negative cycle reachable from s Any shortest path has ≤ V−1 edges (otherwise it contains a cycle of non-negative weight that can be cut out). After iteration k, all shortest paths using ≤ k edges are correct (induction on k, using the convergence property of RELAX: if s ⤳ u → v is a shortest path and u.d = δ(s, u) before RELAX(u, v), then v.d = δ(s, v) after). After V−1 iterations everything is correct, so the negative-cycle test never fires. □
Case (ii) — negative cycle reachable from s If the algorithm returned True then summing v.d ≤ u.d + w(u, v) around the cycle yields 0 ≤ (cycle weight), contradicting negativity. So it must return False. □

Dijkstra vs Bellman-Ford

             | Dijkstra                  | Bellman-Ford
Edge weights | Non-negative only         | Any (detects negative cycle)
Running time | O(E + V lg V) (Fib. heap) | O(V E)
Strategy     | Clever order, relax once  | Any order, relax V−1 times

Single-Source Shortest Paths on a DAG

If G is a DAG, SSSP is solvable in Θ(V + E) — faster than both Bellman-Ford and Dijkstra, and handles arbitrary edge weights.

def DAG-SSSP(G, w, s):
    for v in G.V: v.d = ∞;  v.π = NIL
    s.d = 0
    order = TOPOLOGICAL-SORT(G)              # Θ(V + E)
    for u in order:
        for v in G.adj[u]:
            RELAX(u, v, w)                   # Θ(E) total

Because vertices are relaxed in topological order, any shortest path s ⤳ v is relaxed edge-by-edge in the correct sequence — so when we reach v, v.d is final.

Dijkstra's Algorithm

Problem: Single-source shortest paths in a directed graph with non-negative edge weights. Greedy algorithm exploiting the optimal-substructure property of shortest paths.

def DIJKSTRA(G, w, s):
    for v in G.V: v.d = ∞;  v.π = NIL
    s.d = 0
    S = ∅                                    # processed set (proof bookkeeping)
    Q = new PriorityQueue(G.V, key=v.d)
    while not Q.isempty():
        u = Q.extract-min()
        S = S ∪ {u}
        for v in G.adj[u]:
            RELAX(u, v, w)                   # triggers DECREASE-KEY when v.d drops

Running Time — dominated by priority queue

PQ Implementation  | INSERT   | EXTRACT-MIN | DECREASE-KEY | Total
Array / hash table | O(1)     | O(V)        | O(1)         | O(V² + E)
Binary min-heap    | O(lg V)  | O(lg V)     | O(lg V)      | O((V + E) lg V)
Fibonacci heap     | O(1) am. | O(lg V) am. | O(1) am.     | O(E + V lg V)
Dijkstra requires w(u, v) ≥ 0 for all edges. With a negative edge, a later-popped vertex could provide a shorter path to a vertex already in S, violating the invariant.
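
For reference, a minimal runnable Python sketch. Python's heapq has no DECREASE-KEY, so this uses the common lazy-deletion variant — push a fresh (distance, vertex) pair on every successful RELAX and skip stale pops — which gives the binary-heap bound O((V + E) lg V) rather than the Fibonacci-heap figure. The adjacency-dict format is an assumption for the example.

import heapq
from math import inf

def dijkstra(g, s):                   # g = {u: [(v, w), ...]}
    d = {v: inf for v in g}
    d[s] = 0
    pq = [(0, s)]
    while pq:
        du, u = heapq.heappop(pq)
        if du > d[u]:
            continue                  # stale entry: u already finalised
        for v, w in g[u]:
            if d[u] + w < d[v]:       # RELAX(u, v, w)
                d[v] = d[u] + w
                heapq.heappush(pq, (d[v], v))
    return d

g = {'s': [('a', 1), ('b', 4)], 'a': [('b', 2)], 'b': []}
print(dijkstra(g, 's'))               # {'s': 0, 'a': 1, 'b': 3}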

Dijkstra Correctness

Theorem — loop invariant ϕ At the start of each iteration of the while loop, v.d = δ(s, v) for all v ∈ S. Equivalent statement: when u is extracted from Q, u.d = δ(s, u).

Initialisation. S = ∅, vacuously true. Maintenance (proof by contradiction):

Let u be the first vertex popped with u.d ≠ δ(s, u). Since s.d = 0 = δ(s, s), we have u ≠ s, so some shortest path s ⤳ u exists. Split it at the first vertex y ∉ S: p = s ⤳ x → y ⤳ u, where x ∈ S. By minimality of u, x.d = δ(s, x). Edge (x, y) was relaxed when x entered S, so by the convergence property y.d = δ(s, y).

Since weights are non-negative, δ(s, y) ≤ δ(s, u) ≤ u.d (the second ≤ is Lemma 2 applied to u). And because Q gave us u, not y, we have u.d ≤ y.d. Chain: y.d = δ(s, y) ≤ δ(s, u) ≤ u.d ≤ y.d. Everything collapses to equality, so u.d = δ(s, u). Contradiction. □

Termination. Q is empty ⟹ S = V ⟹ the invariant gives v.d = δ(s, v) for every vertex.

The step δ(s, y) ≤ δ(s, u) is where non-negativity is essential. With a negative edge between y and u, a longer path could be cheaper.

Floyd-Warshall — All-Pairs Shortest Paths via DP

Problem: All-Pairs Shortest Paths on an arbitrarily weighted directed graph with no negative cycles. Output: the |V| × |V| matrix D where D_ij = δ(i, j).

Optimal Substructure

For i, j ∈ V, consider a minimum-weight path p from i to j whose intermediate vertices lie in {1, 2, …, k}. Two cases:

  • k is not an intermediate in p: p is also minimum-weight using intermediates in {1, …, k−1}.
  • k is an intermediate: p decomposes as i ⤳p₁ k ⤳p₂ j, where p₁ and p₂ both have intermediates in {1, …, k−1} (each being a shortest path between its endpoints under that restriction).

Recurrence

Bellman-style DP recurrence d_ij^(k) = min( d_ij^(k−1), d_ik^(k−1) + d_kj^(k−1) )

Base case: D^(0) = W (edge-weight matrix; 0 on diagonal, ∞ where no edge).
def FLOYD-WARSHALL(G, w):
    D^(0) = W                               # copy adjacency-weight matrix
    for k = 1 to |V|:
        let D^(k) = new matrix
        for i = 1 to |V|:
            for j = 1 to |V|:
                D^(k)[i][j] = min(D^(k−1)[i][j],  D^(k−1)[i][k] + D^(k−1)[k][j])
    return D^(|V|)
Running time: Θ(V³) — three nested loops, each over |V|. Space Θ(V²) (only need the previous matrix; the superscripts can be dropped in-place with care).

Extensions

  • Predecessor matrix Π. Maintain in parallel π_ij^(k) = predecessor of j on a minimum i ⤳ j path using {1, …, k}. When the min picks option 1, keep π_ij^(k−1); when it picks option 2, use π_kj^(k−1).
  • Transitive closure. Run Floyd-Warshall with w(i, j) = 1 for edges in E; d_ij < ∞ ⟺ (i, j) is in the transitive closure E*. Or reinterpret (min, +) as (OR, AND) on booleans — sketched below.
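
A minimal sketch of the boolean reinterpretation, assuming adj is a Python matrix of booleans (the function name is invented for the example):

def transitive_closure(adj):
    n = len(adj)
    # reach[i][j]: does some path i ~> j exist using intermediates {0..k-1}?
    reach = [[adj[i][j] or i == j for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach

print(transitive_closure([[False, True], [False, False]]))
# [[True, True], [False, True]]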

APSP via Matrix Multiplication

Redefine matrix multiplication with (+, ·) replaced by (min, +). Then M^(t)_ij = min_k { W_ik + M^(t−1)_kj } represents "min-weight path from i to j using ≤ t edges".

Because any shortest path has ≤ |V|−1 edges, M^(|V|−1) is the APSP matrix. Using repeated squaring we compute this in O(V³ log V) — worse than Floyd-Warshall but a cleaner conceptual derivation.
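
A minimal runnable sketch of the (min, +) squaring idea, assuming W is given with 0 on the diagonal and ∞ for missing edges:

from math import inf

def min_plus(A, B):                   # (min, +) "matrix product"
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apsp(W):
    n, M, t = len(W), W, 1
    while t < n - 1:                  # square until >= n-1 edges are allowed
        M = min_plus(M, M)
        t *= 2
    return M

W = [[0, 3, inf], [inf, 0, 1], [inf, inf, 0]]
print(apsp(W))                        # [[0, 3, 4], [inf, 0, 1], [inf, inf, 0]]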

Johnson's Algorithm

Problem: All-pairs shortest paths on a graph that may have negative edges but no negative cycles. Runs in O(V E + V² lg V) — asymptotically faster than Floyd-Warshall's O(V³) when E ∈ o(V²), i.e. on sparse graphs.

Key Strategies

Translation

Reweight to make all edges non-negative, then run Dijkstra (which is fast but needs non-negative weights). Reweight to preserve which paths are shortest.

Amortization

Pay one up-front Bellman-Ford so that V subsequent Dijkstra runs can replace V Bellman-Ford runs.

Reweighting Trick

Define ŵ(u, v) = w(u, v) + h(u) − h(v) for some h : V → ℝ. Check each property:

  • Cycle invariance. Around any cycle the h-terms telescope to zero, so a cycle has the same weight under ŵ and w. (In particular, ŵ has a negative cycle ⟺ w does.)
  • Path invariance up to endpoints. For any path p = v₀ ⤳ vₖ: ŵ(p) = w(p) + h(v₀) − h(vₖ). Since the h-correction depends only on endpoints, p minimises ŵ ⟺ p minimises w.

Choosing h

Augment G by adding a new source s with zero-weight edges s → v for every v ∈ V. Run Bellman-Ford from s; set h(v) = δ(s, v). Then for every original edge (u, v): h(v) ≤ h(u) + w(u, v) (triangle inequality), so ŵ(u, v) = w(u, v) + h(u) − h(v) ≥ 0. ✓

JOHNSON(G, w) — 4 steps
  1. Form G' by adding s with s → v (weight 0) for all v. Run BELLMAN-FORD(G', s); if it reports a negative cycle, error out. Otherwise set h(v) = v.d.
  2. Compute ŵ(u, v) = w(u, v) + h(u) − h(v) for each edge.
  3. For each u ∈ V run DIJKSTRA(G, ŵ, u); record D̂_uv.
  4. Recover true distances: D_uv = D̂_uv − h(u) + h(v).

Running Time

Bellman-Ford: O(VE). Reweighting: O(E). V Dijkstras with a Fibonacci heap: V · O(E + V lg V) = O(VE + V² lg V). Un-reweighting: O(V²). Total: O(VE + V² lg V).
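
Putting the four steps together, a compact runnable sketch under the same assumptions as the earlier examples (dict-of-lists graph). Note the Bellman-Ford from the added source is folded into initialising every h(v) to 0, which is equivalent to the zero-weight edges from s′.

import heapq
from math import inf

def johnson(g):                       # g = {u: [(v, w), ...]}, w may be < 0
    # Step 1: Bellman-Ford from the implicit source s' (all h start at 0).
    h = {v: 0 for v in g}
    edges = [(u, v, w) for u in g for v, w in g[u]]
    for _ in range(len(g) - 1):
        for u, v, w in edges:
            if h[u] + w < h[v]:
                h[v] = h[u] + w
    for u, v, w in edges:
        if h[u] + w < h[v]:
            raise ValueError("negative cycle")
    # Step 2: reweight so every edge is non-negative.
    gh = {u: [(v, w + h[u] - h[v]) for v, w in g[u]] for u in g}
    # Step 3: one (lazy-deletion) Dijkstra per source.
    D = {}
    for s in g:
        d = {v: inf for v in g}; d[s] = 0; pq = [(0, s)]
        while pq:
            du, u = heapq.heappop(pq)
            if du > d[u]: continue
            for v, w in gh[u]:
                if du + w < d[v]:
                    d[v] = du + w
                    heapq.heappush(pq, (d[v], v))
        # Step 4: undo the reweighting.
        D[s] = {v: d[v] - h[s] + h[v] for v in g}
    return D

g = {0: [(1, -2)], 1: [(2, 3)], 2: [(0, 1)]}
print(johnson(g)[0])                  # {0: 0, 1: -2, 2: 1}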


Flow Networks

A flow network is a directed weighted graph G = (V, E) with two distinguished vertices — a source s and a sink t — and non-negative edge capacities c : E → ℝ≥0. We extend c(u, v) = 0 if (u, v) ∉ E.

Two structural constraints.
  • No self-loops: ∀v. (v, v) ∉ E.
  • No antiparallel edges: ∀u, v. (u, v) ∈ E ⟹ (v, u) ∉ E.
These constraints are assumed by the algorithms below. Both can always be enforced — delete self-loops (they carry no useful flow) and split one edge of each antiparallel pair with a helper vertex — so no generality is lost.

Flows and Values

A flow f : V × V → ℝ satisfies:

  • Capacity constraint: 0 ≤ f(u, v) ≤ c(u, v) for all u, v ∈ V.
  • Flow conservation: for every u ∈ V \ {s, t}, Σv f(v, u) = Σv f(u, v).

The value of a flow: |f| = Σv f(s, v) − Σv f(v, s). (The second term matters when we generalise to multi-source networks, where the residual graph may include edges into s.)

Modelling Tricks

  • Multi-source / multi-sink: add a supersource S with edges S → si (capacity ∞) and a supersink T with edges tj → T (capacity ∞). This reduces to single-source single-sink without loss of generality.
  • Antiparallel pair: replace (u, v) with u → v' → v (new vertex v', same capacity on both new edges).

Ford-Fulkerson Method

Three ingredients: residual networks, augmenting paths, and cuts.

Residual Network Gf

Given a flow f on G, the residual network has residual capacities:

cf(u, v) = c(u, v) − f(u, v) if (u, v) ∈ E ("inc" edge, capacity to add flow)
cf(u, v) = f(v, u) if (v, u) ∈ E ("dec" edge, capacity to cancel flow)
cf(u, v) = 0 otherwise

The no-antiparallel constraint on E ensures exactly one case applies. Note Gf itself may contain antiparallel edges.

Augmenting Paths

An augmenting path is any simple s⤳t path p in Gf. Its residual capacity cf(p) = min over edges on p of cf(u, v). Augmenting flow f by this p increases |f| by cf(p) > 0 while preserving validity.

def FORD-FULKERSON(G, s, t):
    for (u, v) in G.E: f(u, v) = 0
    while ∃ augmenting path p in Gf:
        δ = cf(p)
        for (u, v) in p:
            if (u, v) ∈ G.E: f(u, v) += δ           # inc edge: push more
            else:            f(v, u) −= δ           # dec edge: cancel flow
    return f

Running Time

With integer capacities, each augmenting path increases |f| by ≥ 1, and finding one via BFS/DFS on Gf costs O(V + E) = O(E). Total: O(E · |f*|), where f* is the max flow value.

With irrational capacities, Ford-Fulkerson may not terminate (augmentations can shrink in a non-convergent series).

Edmonds-Karp

Always pick the shortest augmenting path (edge count), by BFS on Gf. Runs in O(V E²) independent of f*; always terminates.
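
A minimal runnable sketch of Edmonds-Karp, assuming capacities arrive as a nested dict cap[u][v] (an illustration format, not course API); the flow is represented implicitly by a matrix of residual capacities.

from collections import deque

def edmonds_karp(cap, s, t):
    nodes = set(cap) | {v for u in cap for v in cap[u]}
    # res[u][v] starts at c(u, v); pushing flow moves capacity onto (v, u)
    res = {u: {v: 0 for v in nodes} for u in nodes}
    for u in cap:
        for v, c in cap[u].items():
            res[u][v] = c
    total = 0
    while True:
        # BFS for the shortest augmenting path in the residual network
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in nodes:
                if v not in parent and res[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return total              # no augmenting path: flow is maximum
        delta, v = float('inf'), t    # bottleneck residual capacity of path
        while parent[v] is not None:
            delta = min(delta, res[parent[v]][v]); v = parent[v]
        v = t
        while parent[v] is not None:  # augment: push delta along the path
            u = parent[v]
            res[u][v] -= delta
            res[v][u] += delta
            v = u
        total += delta

cap = {'s': {'a': 2, 'b': 1}, 'a': {'t': 1, 'b': 1}, 'b': {'t': 2}}
print(edmonds_karp(cap, 's', 't'))    # 3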

Max-Flow Min-Cut Theorem

A cut (S, T) of a flow network is a partition V = S ∪ T with s ∈ S and t ∈ T. The net flow across the cut is f(S, T) = Σu∈S, v∈T f(u, v) − Σu∈S, v∈T f(v, u), and by flow conservation f(S, T) = |f|. The capacity is c(S, T) = Σu∈S, v∈T c(u, v) — only forward edges S→T are counted.

Weak duality (easy): for any flow f and cut (S, T), |f| = f(S, T) ≤ c(S, T).
Max-Flow Min-Cut Theorem (Ford-Fulkerson) For a flow f in G, these three statements are equivalent:
  1. f is a maximum flow.
  2. The residual network Gf contains no augmenting path.
  3. |f| = c(S, T) for some cut (S, T) of G.

Proof Structure: 1 ⇒ 2, 2 ⇒ 3, 3 ⇒ 1

1 ⇒ 2 (by contrapositive) If Gf has an augmenting path p with capacity cf(p) > 0, augmenting yields a flow of strictly greater value — contradiction. □
2 ⇒ 3 Suppose Gf has no augmenting path. Let S = {v : s ⤳ v reachable in Gf}, T = V \ S. Since t ∉ S, (S, T) is a cut.

For any u ∈ S, v ∈ T:
  • If (u, v) ∈ E then f(u, v) = c(u, v) (else the "inc" residual edge would place v ∈ S).
  • If (v, u) ∈ E then f(v, u) = 0 (else the "dec" residual edge (u, v) would place v ∈ S).
So f(S, T) = Σ c(u, v) − 0 = c(S, T). By flow conservation |f| = f(S, T) = c(S, T). □
3 ⇒ 1 By weak duality, |f| ≤ c(S, T) for every cut. If |f| equals some cut capacity, |f| is the largest flow possible. □
Certificate of optimality. The cut S* of vertices reachable in Gf* proves max-flow optimality — verifiable in O(E) without re-running the algorithm.

Matchings in Bipartite Graphs

Given an undirected graph G = (V, E), a matching M ⊆ E is a set of edges with no shared endpoints. A vertex is matched if some edge of M is incident to it. We want a matching of maximum cardinality.

Maximum ≠ Maximal. A maximal matching cannot be extended by adding an edge; a maximum matching has the largest possible size. Poor greedy choices can give a maximal matching that is not maximum.

A bipartite graph has V = V₁ ∪ V₂ with E ⊆ V₁ × V₂ (every edge crosses between the parts). Classic application: pairing students to rooms in the housing ballot.

Approach 1: Translation to Max-Flow

  1. Add source s, edges s → v (capacity 1) for each v ∈ V₁.
  2. Add sink t, edges v → t (capacity 1) for each v ∈ V₂.
  3. Direct each original edge V₁ → V₂, capacity 1.
  4. Run Ford-Fulkerson. By the Integrality Theorem, integer capacities give an integer max flow, so every edge has flow 0 or 1.
  5. Edges with flow 1 form a maximum matching.

Why This Works (both directions)

Flow ⇒ matching with size = |f*|

Integer flow takes values 0 or 1 on each edge. At each v ∈ V₁ at most 1 unit enters (from s, capacity 1), so at most 1 unit leaves — each left vertex matches at most one right vertex. Similarly for V₂.

Matching ⇒ flow with value = |M|

For each matched edge set flow 1 on that edge plus its s→V₁ and V₂→t edges; 0 elsewhere. Flow conservation follows from the matching property.

Therefore max flow = max matching. Running time O(V E) with Ford-Fulkerson (f* ≤ |V|/2 ≤ V).

Approach 2: Direct Augmenting Paths

An alternating path w.r.t. matching M is a path whose edges alternate "∉M, ∈M, ∉M, …". An augmenting path is an alternating path from an unmatched vertex in V₁ to an unmatched vertex in V₂. Flipping its membership (M ⊕ p) increases |M| by 1.

def MAXIMUM-MATCHING(G):
    M = ∅
    while ∃ augmenting path p for M:
        M = M ⊕ p
    return M
Lemma — M has an augmenting path ⟺ M is not maximum Consider the symmetric difference M ⊕ M* with any maximum M*. Maximum degree in the symmetric difference is 2 (else some vertex would be incident to two edges of M or two of M*). So each connected component is an isolated vertex, a cycle of even length, or a chain. Cycles and even chains use equal M/M* edges; if |M*| > |M|, at least one odd chain has one more M*-edge than M-edge — that chain is an augmenting path for M. □

Finding an augmenting path is a modified BFS/DFS costing O(V + E) = O(E), and the algorithm does ≤ V/2 augmentations, giving O(V E).
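
A minimal runnable sketch of this loop (the classical DFS-based augmenting search, often called Kuhn's algorithm), assuming adj maps each V₁ vertex to a list of its V₂ neighbours:

def max_matching(adj):
    match_right = {}                  # right vertex -> matched left vertex

    def augment(u, visited):
        # DFS for an alternating path from unmatched u to an unmatched
        # right vertex; flips matched edges on the way back (M := M XOR p).
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            if v not in match_right or augment(match_right[v], visited):
                match_right[v] = u    # flip: (u, v) joins the matching
                return True
        return False

    size = 0
    for u in adj:                     # one augmentation attempt per V1 vertex
        if augment(u, set()):
            size += 1
    return size, match_right

adj = {0: ['a', 'b'], 1: ['a'], 2: ['b']}
print(max_matching(adj))              # (2, {'a': 1, 'b': 0})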

Hopcroft-Karp

Finds all vertex-disjoint shortest augmenting paths in one pass and augments along all of them simultaneously. Analysis shows ≤ √V iterations suffice, giving O(E √V). (Proof is non-examinable.)

Minimum Spanning Trees — Safe Edges

A spanning tree of a connected undirected graph G is an acyclic subset T ⊆ E that connects all |V| vertices (hence |T| = |V|−1). A minimum spanning tree (MST) has minimum total weight w(T). MSTs are not necessarily unique.

Vocabulary

cut (S, V\S)
Partition of V.
edge crosses cut
One endpoint in S, the other in V\S.
cut respects A
No edge in A crosses the cut.
light edge
An edge crossing the cut with minimum weight among crossing edges (not necessarily unique).
safe edge for A
An edge e such that A ∪ {e} is still contained in some MST.

Both MST algorithms build A ⊆ E greedily, maintaining the invariant "A ⊆ some MST T" and adding safe edges until |A| = |V|−1.

Safe Edge Theorem Let A be contained in some MST, and let (S, V\S) be a cut respecting A. If (u, v) is a light edge crossing the cut, then (u, v) is safe for A.
Proof (exchange argument) Let T be an MST containing A. If (u, v) ∈ T we are done. Otherwise, adding (u, v) to T creates a unique cycle. Some other edge (x, y) on this cycle must also cross the cut. Let T' = T ∪ {(u, v)} \ {(x, y)} — still a spanning tree. Since (u, v) is a light edge, w(u, v) ≤ w(x, y), so w(T') ≤ w(T), and T' is an MST. Because the cut respects A, (x, y) ∉ A, so A ∪ {(u, v)} ⊆ T'. Hence (u, v) is safe. □
Corollary (used by Kruskal). Let C = (VC, EC) be a connected component in the forest (V, A). If (u, v) is a light edge connecting C to any other component, then (u, v) is safe for A.

Kruskal's Algorithm

Grow a forest of trees by repeatedly adding the lightest edge that connects two distinct trees (uses the corollary).

def MST-KRUSKAL(G, w):
    A = ∅
    S = new DisjointSet;  for v in G.V: MAKE-SET(S, v)
    edges = SORT(G.E, by weight, ascending)      # O(E lg V)
    for (u, v) in edges:
        if not IN-SAME-SET(S, u, v):
            A = A ∪ {(u, v)}
            UNION(S, u, v)
    return A

Running time: O(E lg V) — dominated by the sort. Disjoint-set operations (with path compression + union by rank) are effectively O(1) amortised, contributing only O((V + E) α(V)).

Application — Hierarchical Clustering

Kruskal's intermediate forests form a dendrogram: a classification tree with join heights reflecting merge weights. Used in image segmentation, bioinformatics, and exploratory data analysis.

Prim's Algorithm

Grow a single tree from an arbitrary root r, always adding the lightest edge crossing the "tree vs. rest" cut (directly applies the Safe Edge Theorem).

def MST-PRIM(G, w, r):
    Q = new PriorityQueue
    for v in G.V:
        v.key = ∞;  v.π = NIL;  Q.insert(v)
    Q.decrease-key(r, 0)
    while not Q.isempty():
        u = Q.extract-min()
        for v in G.adj[u]:
            if v ∈ Q and w(u, v) < v.key:
                v.π = u;  v.key = w(u, v)
                Q.decrease-key(v, v.key)

Differences from Dijkstra: v.key stores the weight of the lightest edge connecting v to the tree (not a cumulative path distance). The MST is {(v, v.π) : v ≠ r}.

Running time with a Fibonacci heap: |V| INSERTs and |V| EXTRACT-MINs cost O(V lg V); |E| DECREASE-KEYs cost O(E) amortised. Membership in Q is tracked with a bit vector in O(1). Total: O(E + V lg V). With a binary heap: O(E lg V).

Topological Sort

Problem: Given a DAG, return a total ordering of vertices such that every edge v₁ → v₂ has v₁ before v₂.

Why DAGs only? A cycle makes a total order impossible — each vertex on the cycle would need to precede itself.

def TOPOLOGICAL-SORT(G):
    for v in G.V: v.colour = 'white'
    totalorder = []
    for v in G.V:
        if v.colour == 'white': visit(v, totalorder)
    return totalorder

def visit(v, totalorder):
    v.colour = 'grey'                    # currently on the DFS stack
    for w in v.neighbours:
        if w.colour == 'white': visit(w, totalorder)
    v.colour = 'black';  totalorder.prepend(v)

Equivalent to: run DFS, then list vertices in descending finish time. Running time Θ(V + E).

Correctness

Claim: for any edge v₁ → v₂, v₁ appears before v₂ Consider the moment v₁ turns grey. Cases on v₂:
  • v₂ already black ⟹ already in totalorder; v₁ prepended afterwards is earlier. ✓
  • v₂ white ⟹ visit(v₂) is called inside visit(v₁), finishes and prepends v₂ before v₁ is prepended. ✓
  • v₂ grey ⟹ visit(v₂) is on the stack, giving a path v₂ ⤳ v₁. Combined with edge v₁ → v₂ this is a cycle — impossible in a DAG. ✓

The colours are a conceptual device for the proof and are not actually tracked at runtime (the visited flag suffices).


Amortised Analysis

For algorithms, a worst-case per-operation bound is usually tight enough. For data structures, the true worst-case cost of a sequence of operations can be much smaller than (number of operations) × (worst single-operation cost); amortised analysis captures this tighter bound. Three standard techniques:

  1. Aggregate analysis — bound the total cost of a sequence directly.
  2. Accounting method — assign "charges" that may exceed actual cost, with the excess stored as credit for later.
  3. Potential method — track a whole-data-structure potential Φ and let ΔΦ mediate between true and amortised cost.
Caveat. Amortised analyses apply to sequences starting from an initially empty data structure. A pre-loaded structure can violate the fundamental inequality.

Aggregate Analysis

Bound the total cost c₁ + c₂ + ⋯ + cm directly; the amortised cost is the average (total / m).

Example — Dynamic Array (doubling)

Start with capacity 16; on overflow, allocate a 2× larger array and copy everything across. Starting from a full array of 16 items, N = 2^k further inserts cost in total:

(16+1) + 15 + (32+1) + 31 + (64+1) + 63 + ⋯ ∈ O(N)

(each doubling from size m costs m + 1 — copy m items, insert one — and is followed by m − 1 unit-cost inserts).

Dividing by N gives amortised cost O(1) per insert. (This is the result we assumed for Stack push in Algorithms 1.)
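
A quick empirical check of the aggregate bound, starting from an empty array for simplicity; we count one unit per element written, whether freshly inserted or copied during a doubling.

def total_cost(N, capacity=16):
    n = cost = 0
    for _ in range(N):
        if n == capacity:             # overflow: allocate 2x and copy n items
            cost += n
            capacity *= 2
        cost += 1                     # write the new item
        n += 1
    return cost

for N in (2**10, 2**15, 2**20):
    print(N, total_cost(N), total_cost(N) / N)   # ratio stays below 3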

Example — Binomial Heap Insertions

A single worst-case insertion merges up to ⌊lg N⌋ trees, costing Ω(lg N); but after such a costly insert the structure is "clean" and the next few inserts are cheap. Across N inserts starting from empty, total cost is O(N) (not O(N lg N)).

Accounting Method

Declare an amortised cost for each operation — the amount you "charge the customer". When the amortised cost exceeds the actual cost, the surplus goes into a credit account. When the actual cost exceeds the amortised cost, credit pays the shortfall.

Validity The amortised costs are valid iff for every sequence of operations from the initial state, the credit balance is never negative, i.e. Σ (amortised cost) ≥ Σ (actual cost) at all times.

Example — Dynamic Array

Charge 3 units per insert (amortised O(1)):

  • 1 unit pays for inserting the new item.
  • 1 unit is credit for that item — used later to copy it when the array doubles.
  • 1 unit is credit to pay for copying one of the existing items when doubling next happens.

When capacity doubles from n to 2n, copying the n items is paid for by the n/2 inserts that have happened since the previous doubling: each deposited 2 spare credits (one to copy itself, one to copy an item from the older half), giving exactly n credits — so the doubling is fully paid for.

Comparison with Potential Method

The accounting method attaches credit to specific items or operations; the potential method tracks a single global Φ for the whole structure. The two are essentially dual — both give valid amortised costs.

Potential Method

Define Φ : States → ℝ with Φ(empty) = 0 and Φ(S) ≥ 0 for all reachable S. For an operation with true cost c taking S_ante → S_post, set:

Amortised cost c′ = c + Φ(S_post) − Φ(S_ante) = c + ΔΦ
Why this is valid Telescoping: Σ c′ᵢ = Σ cᵢ + Φ(S_k) − Φ(S₀) = Σ cᵢ + Φ(S_k) ≥ Σ cᵢ. □

Example — Dynamic Array with Φ = 2n − c

(where n = items stored, c = current capacity; take an empty array to have c = 0). With the doubling policy the array is always at least half full, so Φ ≥ 0 on all reachable states, and Φ(empty) = 0.

  • Normal append: c unchanged, n grows. True cost 1. ΔΦ = 2. Amortised = 1 + 2 = O(1).
  • Doubling append (n items, cap n → 2n): True cost n + 1. ΔΦ = 2(n+1) − 2n − (2n − n) = 2 − n. Amortised = (n+1) + (2 − n) = 3 = O(1). ✓

Example — Binary Counter

Let Φ = (number of 1s in the counter). An INCREMENT that resets r trailing 1-bits to 0 and sets one bit from 0 to 1 costs r + 1 true work. ΔΦ = 1 − r. Amortised = (r + 1) + (1 − r) = 2 ∈ O(1). So n increments cost O(n) total. ✓
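
A quick empirical check in Python — the running total never exceeds 2n, as the potential argument predicts:

def increment(bits):                      # bits[0] is the least significant
    cost = i = 0
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0; cost += 1; i += 1    # reset the trailing 1s
    if i < len(bits):
        bits[i] = 1
    return cost + 1                       # resets plus the one bit set

bits, total = [0] * 20, 0
for n in range(1, 1001):
    total += increment(bits)
    assert total <= 2 * n                 # amortised cost 2 per increment
print(total)                              # 1994 for n = 1000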

Designing Good Potential Functions

Heuristics
  • Compare the "damaged" state with the ideal state — Φ should measure the repair cost.
  • Φ should build up before an expensive cleanup and drop afterwards (ΔΦ < 0 cancels the cost).
  • Always verify Φ ≥ 0 on all reachable states and Φ(empty) = 0.
  • Validate tightness with concrete lower-bound examples.

Priority Queue Implementations Compared

Implementation     | INSERT   | EXTRACT-MIN | DECREASE-KEY | UNION      | Key idea
Sorted linked list | O(N)     | O(1)        | O(N)         | O(N₁ + N₂) | Always tidy; INSERT slow
Binary heap        | O(lg N)  | O(lg N)     | O(lg N)      | O(N₁ + N₂) | Always tidy; balanced
Binomial heap      | O(lg N)  | O(lg N)     | O(lg N)      | O(lg N)    | "Binary-counter" of trees; fast UNION
Fibonacci heap     | O(1) am. | O(lg N) am. | O(1) am.     | O(1) am.   | Lazy push/decrease; batch cleanup in EXTRACT-MIN

Used with Dijkstra: binary heap gives O(E lg V); Fibonacci heap gives O(E + V lg V), optimal for sparse graphs.

Asymptotically optimal mergeable priority queues: Brodal heaps (1996) and Strict Fibonacci heaps (2012) achieve O(1) actual worst case for INSERT / DECREASE-KEY / UNION and O(lg N) for EXTRACT-MIN. Non-examinable.

Binomial Heap

A Mergeable Priority Queue ADT: supports CREATE, INSERT, PEEK-MIN, EXTRACT-MIN, DECREASE-KEY, DESTRUCTIVE-UNION, COUNT. (DELETE is DECREASE-KEY to −∞ followed by EXTRACT-MIN.)

Binomial Trees Bk

Defined recursively: B0 is a single node; Bk is formed by linking two Bk−1 trees, one root becoming the leftmost child of the other. (These are not binary trees.)

Properties of Bk (provable by induction)
  1. Contains exactly 2k nodes.
  2. Height k.
  3. Exactly C(k, i) nodes at depth i, for i = 0, …, k.
  4. Root has degree k (larger than any other node's).
  5. Children of the root are ordered by descending degree: k−1, k−2, …, 0; the degree-i child is itself a Bᵢ.

The maximum degree of any node is lg N (from properties 1 & 4).

A Binomial Heap

Collection of binomial trees with:

  • Each tree obeys the min-heap property.
  • At most one tree of any given degree k (so the roots correspond exactly to the 1-bits in the binary representation of N — e.g. N = 11 = 1011₂ ⟹ trees B₀, B₁, B₃).

Therefore an N-node binomial heap contains ≤ ⌊lg N⌋ + 1 trees, and the overall minimum is one of the roots.

Node Structure

Six fields per node: key, payload, parent pointer, child pointer (to ONE child), sibling pointer, degree. Duplicate keys are permitted. The root list is held in increasing-degree order, threaded through the sibling pointers of the roots.

Operations

BH-CREATE — O(1)

Return NIL (an empty heap).

BH-PEEK-MIN — O(lg N)

Scan the root list (≤ ⌊lg N⌋ + 1 trees) to find the minimum root.

BH-MERGE(bt₁, bt₂) — O(1)

Merge two trees of the same degree: make bt₂ the first child of bt₁ (smaller-key root on top). Increments bt₁.degree; preserves child-list ordering.

BH-DESTRUCTIVE-UNION(bh₁, bh₂) — O(lg N)

Merge the two degree-sorted root lists in one pass (like adding two binary numbers). When two trees of the same degree meet, BH-MERGE them; if a carry results, propagate. The final heap has at most one tree of each degree.

BH-INSERT(bh, (k, p)) — O(lg N)

Create a singleton node (degree 0) — a one-element binomial heap — and BH-DESTRUCTIVE-UNION it with bh.

BH-EXTRACT-MIN(bh) — O(lg N)

  1. Use BH-PEEK-MIN (or a stored pointer) to find the minimum root.
  2. Cut this tree out of the root list.
  3. Reverse its child list (so it is sorted by increasing degree).
  4. BH-DESTRUCTIVE-UNION this reversed child list with the remaining root list.

BH-DECREASE-KEY(bh, ptr, nk) — O(lg N)

Same as on a binary min-heap: lower the key, then bubble up by swapping with the parent while the heap property is violated. Height is O(lg N).

Why bother with binomial heaps over binary heaps? A binary heap needs Θ(N₁ + N₂) time to union (copy both arrays and heapify). Binomial heaps do it in O(lg N) — sometimes critical.

Insert: aggregate analysis sketch

A single insert can cost Θ(lg N) in the worst case when it causes a cascade of merges across all degrees — analogous to a binary-counter increment rippling through many 1-bits. But two bad inserts can't happen in a row, and N consecutive inserts from empty total Θ(N) work (same reasoning as binary counter increments). So insert is O(1) amortised.

Fibonacci Heap

Implements the same Mergeable Priority Queue ADT as a binomial heap, but with better amortised bounds: CREATE / PEEK-MIN / COUNT in O(1) actual, INSERT / DECREASE-KEY / UNION in O(1) amortised, and EXTRACT-MIN in O(lg N) amortised.

Structure

A collection of heap-ordered trees held in a doubly linked cyclic root list, unordered. Children of each node are also held in a doubly-linked cyclic list. A pointer minroot tracks the overall minimum in O(1).

Each node has 8 fields: key, payload, left/right sibling pointers, parent pointer, child pointer (to ONE child), degree, marked flag. Nodes in the root list are never marked. The heap itself is a 2-tuple (minroot, N).

Design Rationale — "forward planner"

  1. INSERT is lazy: add a singleton to the root list; O(1) true cost, ΔΦ = +1.
  2. DECREASE-KEY is lazy: if heap order is violated, cut the node out and dump it into the root list. Don't consolidate.
  3. EXTRACT-MIN does batch cleanup: delete minroot, promote its children to the root list, then consolidate roots until no two have the same degree. The potential stored by earlier operations pays for the cleanup.

FH-INSERT(fh, (k, p))

def insert(v, k):
    create singleton node (v, k); marked=false; parent=NIL
    splice into root list
    if k < fh.minroot.key: fh.minroot = new node   # O(1) true cost

FH-EXTRACT-MIN(fh)

def extract-min():
    record (minroot.key, minroot.payload)
    promote each child of minroot to the root list (parent=NIL, marked=false)
    remove minroot
    # Consolidation:
    while two roots have the same degree:
        merge them (keep smaller key on top; increase its degree)
    for each remaining root: update fh.minroot to track min
    return recorded entry

Consolidation uses an auxiliary array A of length D(N) + 1 indexed by degree.

FH-DECREASE-KEY — with Loser Marking

A naive "cut-and-dump" on every violation can produce arbitrarily wide, shallow trees, breaking the O(lg N) bound on max degree. The loser rule preserves shape:

Loser (marked) rules
  • First child lost: parent becomes a "loser" (marked).
  • Second child lost: parent is cut out and promoted to the root list; its marking is cleared; its parent is then considered — cascade continues until reaching a non-loser or a root.
def decrease-key(ptr_k, nk):
    if nk > ptr_k.key: error
    ptr_k.key = nk
    if ptr_k.parent == NIL:
        update fh.minroot if necessary;  return
    if ptr_k.key < ptr_k.parent.key:       # heap violated
        while True:
            p = ptr_k.parent
            cut ptr_k out of p's child list; promote ptr_k to root list
            ptr_k.marked = False
            if not p.marked: break             # roots are never marked, so this also stops at a root
            ptr_k = p                          # cascade
        if p.parent != NIL: p.marked = True    # p loses a child; roots stay unmarked
    update fh.minroot if necessary

Fibonacci Heap — Amortised Analysis

Potential function Φ = r + 2m, where r = |root list|, m = number of marked nodes.

Φ(empty) = 0 and Φ ≥ 0 always. Let D(N) denote the maximum degree of any node (we will show D(N) ≤ ⌊log_φ N⌋, where φ = (1+√5)/2).

FH-INSERT — amortised O(1)

True cost O(1). ΔΦ = +1 (one new unmarked root). Amortised = O(1). ✓

FH-DESTRUCTIVE-UNION — amortised O(1)

Splice two root lists and compare minroots. True cost O(1). ΔΦ = 0 (credits carry over). Amortised = O(1). ✓

FH-EXTRACT-MIN — amortised O(D(N)) = O(lg N)

Accounting

Let r be the root-list size before the operation, m the marked count. Children promoted ≤ D(N). After consolidation ≤ D(N) + 1 roots remain.

True cost: O(r + D(N)) — promote children, then scan/merge.

ΔΦ: at most (D(N) + 1) + 2m − (r + 2m) = D(N) + 1 − r (the mark count cannot increase — promoted children are unmarked).

Amortised: O(r + D(N)) + D(N) + 1 − r = O(D(N)). ✓

FH-DECREASE-KEY — amortised O(1)

Case split on ancestor cascade

Suppose the cascade cuts the decreased node and then L loser ancestors before finally marking one non-loser p (or stopping at a root).

True cost: O(L + 1) — L + 1 cuts plus possibly marking p.

ΔΦ: + (L + 1) new roots, − 2L for the loser marks removed, + 2 for marking p if applicable. Net: ΔΦ ≤ (L + 1) − 2L + 2 = 3 − L.

Amortised: O(L + 1) + (3 − L) = O(1). ✓ (Loser marks were "prepaid" when created.)

Why D(N) = O(lg N) — Fibonacci Shape Lemma

Lemma 1 Let x be any node with k children c₁, c₂, …, cₖ, in chronological order of becoming children of x. Then c₁.degree ≥ 0 and cᵢ.degree ≥ i − 2 for i ≥ 2.

Proof. When cᵢ was linked under x, the children c₁, …, cᵢ₋₁ were already present, so x.degree ≥ i − 1; consolidation only links roots of equal degree, so cᵢ.degree ≥ i − 1 at that moment. Since then, cᵢ can have lost at most one child (a second loss would have cut it out under the loser rule). So cᵢ.degree ≥ (i − 1) − 1 = i − 2. □
Lemma 2 For any node x with degree k, size(x) ≥ fib(k + 2) ≥ φᵏ, where fib is the Fibonacci sequence (fib(0) = 0, fib(1) = 1).

Proof. Let sₖ = minimum possible subtree size rooted at a degree-k node. s₀ = 1, s₁ = 2. For k ≥ 2, applying Lemma 1 plus monotonicity (x itself, c₁, and the degree bounds on c₂, …, cₖ):

sₖ ≥ 1 + 1 + s₀ + s₁ + ⋯ + sₖ₋₂ = 2 + Σᵢ₌₀..ₖ₋₂ sᵢ

By induction (using fib(k + 2) = 1 + Σᵢ₌₀..ₖ fib(i)) this gives sₖ ≥ fib(k + 2). And fib(k + 2) ≥ φᵏ by induction using φ² = φ + 1. □

Conclusion. For any node x of degree k in an N-node Fibonacci heap: N ≥ size(x) ≥ φᵏ. Taking log_φ: k ≤ ⌊log_φ N⌋ ∈ O(lg N). So D(N) = O(lg N), and EXTRACT-MIN is amortised O(lg N). ✓
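
A quick numerical sanity check of Lemma 2 (the recurrence in fact holds with equality, sₖ = fib(k + 2)):

phi = (1 + 5 ** 0.5) / 2
fib = [0, 1]
for _ in range(20):
    fib.append(fib[-1] + fib[-2])

s = [1, 2]                            # s_0 = 1, s_1 = 2
for k in range(2, 20):
    s.append(2 + sum(s[:k - 1]))      # s_k = 2 + sum_{i=0}^{k-2} s_i

for k in range(20):
    assert s[k] >= fib[k + 2] >= phi ** k - 1e-9
print(s[:8], [fib[k + 2] for k in range(8)])   # the two lists coincide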

Dijkstra with a Fibonacci Heap

Dijkstra does |V| INSERTs + |V| EXTRACT-MINs + ≤ |E| DECREASE-KEYs. With a Fibonacci heap: V · O(1) + V · O(lg V) + E · O(1) = O(E + V lg V) amortised. Optimal for sparse graphs.

Disjoint Sets (Union-Find)

ADT over n distinct keys, initialised with each key in its own set. Operations: MAKE-SET(k), UNION(s₁, s₂), IN-SAME-SET(k₁, k₂). Used by Kruskal to track MST fragments.

Implementations

Implementation                            | CREATE | UNION      | IN-SAME-SET
Doubly linked list (per set)              | O(n)   | O(n)       | O(n)
Cyclic DLL (unordered)                    | O(n)   | O(1)       | O(n)
Hash table (key → set-id)                 | O(n)   | O(n)       | O(1)
Forest + path compression + union by rank | O(n)   | ≈ O(1) am. | ≈ O(1) am.

The Optimal Implementation — Forest with PC + UbR

CREATE

One tree node per key; each a separate singleton tree. Each node stores a parent pointer (initially NIL — meaning "I am the root") and an integer rank (initially 0), an upper bound on subtree depth.

CHASE(k) — find the root, then compress

Walk parent pointers from k's node up to the root r. On the way back, reset every visited node's parent directly to r (path compression). Next CHASE on any of these nodes jumps directly in O(1).

UNION(a, b) — union by rank

Let ra = CHASE(a), rb = CHASE(b). If rank(ra) > rank(rb), point rb at ra; if less, the reverse; if equal, arbitrarily attach one and increment the root's rank. Shallower tree always hangs off deeper — limits depth growth.

IN-SAME-SET(k₁, k₂)

Return CHASE(k₁) == CHASE(k₂).
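
A minimal runnable sketch of the whole structure, assuming keys 0…n−1 and using self-parent roots in place of NIL:

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))  # parent[i] == i means "i is a root"
        self.rank = [0] * n           # upper bound on tree depth

    def chase(self, k):               # find the root, then compress
        root = k
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[k] != root: # second pass: repoint to the root
            self.parent[k], k = root, self.parent[k]
        return root

    def union(self, a, b):            # union by rank
        ra, rb = self.chase(a), self.chase(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra           # ensure ra is the deeper root
        self.parent[rb] = ra          # shallower tree hangs off deeper
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1        # equal ranks: arbitrary attach, bump rank

    def in_same_set(self, a, b):
        return self.chase(a) == self.chase(b)

ds = DisjointSet(4)
ds.union(0, 1); ds.union(2, 3)
print(ds.in_same_set(0, 1), ds.in_same_set(1, 2))   # True False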

With both optimisations, m operations on n items run in O(m · α(n)) where α is the inverse Ackermann function. α(n) ≤ 4 for all n ≤ 10⁸⁰, so in practice O(m). Proof of the α bound is beyond this course.

Design-Strategy Analogy

Strategy                          | DisjointSet                                  | Priority Queue
Anal retentive (tidy always)      | DLL per set (UNION slow)                     | Binary heap (INSERT slow)
Adrenaline junkie (defer, forget) | Hash table (IN-SAME-SET O(1), UNION rescans) | Unsorted linked list (INSERT O(1), EXTRACT slow)
Forward planner (defer, retain)   | Forest w/ PC + UbR                           | Fibonacci heap
The forward planner strategy — defer expensive work until needed, but keep the results so later operations are cheap — unifies the Fibonacci heap and the PC+UbR forest.

Complete Algorithm Summary

Shortest Path Algorithms

Algorithm                | Running Time           | Edge Weights             | Problem
BFS                      | O(V + E)               | Unweighted               | Single-source hop-count
DAG-SSSP                 | Θ(V + E)               | Any (needs DAG)          | Single-source weighted on a DAG
Dijkstra                 | O(E + V lg V) Fib-heap | ≥ 0                      | Single-source weighted
Bellman-Ford             | O(V E)                 | Any (detects neg. cycle) | Single-source weighted
Floyd-Warshall           | Θ(V³)                  | Any (no neg. cycles)     | All-pairs
Matrix repeated squaring | O(V³ lg V)             | Any (no neg. cycles)     | All-pairs
Johnson's                | O(V E + V² lg V)       | Any (no neg. cycles)     | All-pairs (sparse)

Graph Substructure Algorithms

Algorithm                 | Running Time  | Problem                       | Correctness Device
Ford-Fulkerson            | O(E · f*)     | Max flow (integer cap)        | Max-flow min-cut theorem
Edmonds-Karp              | O(V E²)       | Max flow (any cap)            | Same
Bipartite matching via FF | O(V E)        | Max matching                  | Integrality theorem
Direct augmenting paths   | O(V E)        | Max matching                  | Symmetric-difference lemma
Hopcroft-Karp             | O(E √V)       | Max matching                  | Same
Prim                      | O(E + V lg V) | MST                           | Safe Edge Theorem
Kruskal                   | O(E lg V)     | MST                           | Safe Edge Corollary
Toposort (DFS)            | Θ(V + E)      | Total order on DAG            | Grey-node cycle argument
SCC (2× DFS + transpose)  | Θ(V + E)      | Strongly connected components | CLRS ch. 22

Data Structure Cost Summary

Structure              | INSERT            | EXTRACT-MIN / find | DECREASE-KEY | UNION
Binary heap            | O(lg N)           | O(lg N)            | O(lg N)      | O(N)
Binomial heap          | O(lg N); O(1) am. | O(lg N)            | O(lg N)      | O(lg N)
Fibonacci heap         | O(1) am.          | O(lg N) am.        | O(1) am.     | O(1) am.
DisjointSet (PC + UbR) | O(1)              | α(N) am.           | —            | α(N) am.