Algorithms 1
Course Map
This course is about more than particular algorithms. The central idea is that algorithmic performance comes from structure: the structure of the problem, the recurrence, the data representation, and the invariant that makes correctness provable.
- Sorting and order statistics.
- Divide and conquer, dynamic programming, greedy design.
- Core data structures: trees, heaps, hash tables, queues, lists.
- Prove correctness using loop invariants or data-structure invariants.
- Solve or estimate recurrence relations.
- Choose a data structure that improves the asymptotic cost of an operation set.
- Reason about worst-case, expected-case, space use, and practical tradeoffs.
Correctness and Algorithm Analysis
An algorithm is correct if it terminates and returns the right output for every valid input instance. In this course, correctness is as important as speed.
Asymptotic Notation
- f(n) ∈ O(g(n)): upper bound
- f(n) ∈ Ω(g(n)): lower bound
- f(n) ∈ Θ(g(n)): tight bound
- f(n) ∈ o(g(n)): strictly slower growth
- f(n) ∈ ω(g(n)): strictly faster growth
Common Cost Models
- For comparison sorts, count key comparisons.
- Ignore fixed-cost arithmetic or control overhead when it only changes constant factors.
- Remember practical hidden costs: cache behaviour, pointer chasing, hash computation, equality checks.
Loop Invariants
When an algorithm contains a loop, correctness is often proved by identifying a property preserved by every iteration.
- Initialisation: true before the loop starts.
- Maintenance: one iteration preserves it.
- Termination: when the loop stops, invariant + stop condition imply correctness.
Recurrence Relations
Recurrences are how divide-and-conquer costs are usually expressed.
Main Solving Methods
- Substitute and spot a pattern.
- Recursion tree.
- Guess and verify.
- Master theorem for T(n) = aT(n/b) + f(n).
Master Theorem Shape
T(n) = aT(n/b) + f(n)
Compare f(n) with n^{log_b a}.
- If f(n) is polynomially smaller, the recurrence is dominated by the recursive work.
- If f(n) is the same order, pick up an extra log n factor.
- If f(n) is polynomially larger and the regularity condition holds, the non-recursive work dominates.
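The three cases can be mechanised for the common situation f(n) = Θ(n^d). This is an illustrative sketch, not from the slides; `master_case` is a name chosen here:

```python
import math

def master_case(a: int, b: int, d: float) -> str:
    """Classify T(n) = a*T(n/b) + Theta(n^d) against the critical exponent.

    Assumes f(n) is a plain polynomial, so the regularity condition in the
    third case holds automatically.
    """
    crit = math.log(a, b)                 # critical exponent log_b(a)
    if math.isclose(d, crit):
        return f"Theta(n^{d:g} log n)"    # same order: extra log factor
    if d < crit:
        return f"Theta(n^{crit:g})"       # recursive work dominates
    return f"Theta(n^{d:g})"              # non-recursive work dominates
```

For merge sort (a = 2, b = 2, d = 1) this reports Θ(n log n); for T(n) = 4T(n/2) + Θ(n) it reports Θ(n²).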
Quadratic Sorting Algorithms
Insertion Sort
Incremental algorithm: maintain a sorted prefix and insert the next key into the right place by shifting larger elements right.
- Worst case: Θ(n²).
- Average case: Θ(n²).
- Stable: yes.
- In-place: yes.
- Very good for small arrays because the loop body is tight.
Canonical Invariant
At the start of iteration j, the prefix A[1..j-1] is sorted and contains exactly the original elements from those positions.
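In code, the invariant can be annotated at the top of the loop. A minimal Python sketch (0-indexed, so the sorted prefix is A[0..j-1]):

```python
def insertion_sort(a: list) -> list:
    """In-place insertion sort; returns the same list for convenience."""
    for j in range(1, len(a)):
        # Invariant: a[0..j-1] is sorted and holds the original first j items.
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]   # shift larger elements one slot right
            i -= 1
        a[i + 1] = key        # insert key into its place in the prefix
    return a
```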
Selection Sort
Repeatedly select the smallest remaining item and place it into the next position.
- Cost: Θ(n²).
- Usually not stable unless carefully adapted.
- Can reduce number of writes compared with insertion sort.
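A minimal sketch showing the reduced write count: at most n-1 swaps, regardless of input order:

```python
def selection_sort(a: list) -> list:
    """In-place selection sort: Theta(n^2) comparisons, at most n-1 swaps."""
    n = len(a)
    for i in range(n - 1):
        # Invariant: a[0..i-1] holds the i smallest elements, in sorted order.
        m = i
        for j in range(i + 1, n):
            if a[j] < a[m]:
                m = j                    # index of smallest remaining item
        if m != i:
            a[i], a[m] = a[m], a[i]      # one write per outer iteration at most
    return a
```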
These quadratic sorts are not just “bad algorithms”. They teach invariant-based correctness, and insertion sort appears as a subroutine in stronger algorithms, including median-of-medians and practical hybrids.
Merge Sort and Quicksort
Merge Sort
Classic divide-and-conquer algorithm: split in half, recursively sort both halves, then merge.
T(n) = 2T(n/2) + Θ(n) ⇒ Θ(n log n)
- Stable: yes.
- Guaranteed worst case: Θ(n log n).
- Memory behaviour matters: textbook merge sort often uses extra space; the slides note bottom-up merge sort can be in-place.
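A copy-based sketch (so it uses Θ(n) extra space, matching the textbook form); taking from the left half on ties is what makes it stable:

```python
def merge_sort(a: list) -> list:
    """Top-down merge sort returning a new sorted list; stable."""
    if len(a) <= 1:
        return a[:]
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # Merge step: `<=` prefers the left half on ties, preserving stability.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])   # at most one of these two
    out.extend(right[j:])  # extends is non-empty
    return out
```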
Quicksort
Partition around a pivot, then recursively sort the two sides.
- Best case: balanced split every time, Θ(n log n).
- Any constant-ratio split still gives Θ(n log n).
- Worst case: constant-and-rest split every time, Θ(n²).
- Expected average case: Θ(n log n) with sensible/random pivots.
- In-place: yes in the standard array partition variants.
- Stable: no in the usual in-place form.
Partition Correctness
The slides prove PARTITION using the invariant that elements left of boundary i are ≤ pivot, elements between i+1 and j-1 are > pivot, and the pivot is preserved at the end until the final swap.
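A Lomuto-style partition carrying that invariant as comments (a sketch; the slides' exact pseudocode may differ in indexing):

```python
def partition(a: list, p: int, r: int) -> int:
    """Partition a[p..r] around pivot a[r]; returns the pivot's final index."""
    pivot = a[r]
    i = p - 1                             # a[p..i] <= pivot so far
    for j in range(p, r):
        # Invariant: a[p..i] <= pivot and a[i+1..j-1] > pivot.
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]       # final swap puts the pivot in place
    return i + 1
```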
Pivot Strategy Tradeoffs
- Naive last/first element: vulnerable to sorted or adversarial input.
- Random input shuffling: makes orderings roughly equally likely.
- Random pivot: reduces dependence on input order.
- Median-of-three: lowers probability of extreme pivots.
- Three-way partition: important when many duplicates occur.
- Median-of-medians: deterministic good pivot, stronger worst-case guarantees.
Heapsort and Heaps
A binary heap is a semi-structured array representation of a nearly-complete binary tree.
Heap Properties
- Structural: complete except maybe bottom level, filled left-to-right.
- Ordering: in a max-heap, parent key is at least each child key; symmetrically for min-heaps.
Array Mapping
LEFT(i) = 2i
RIGHT(i) = 2i + 1
PARENT(i) = floor(i/2)
Main Operations
- REHEAPIFY: O(log n).
- FULL-HEAPIFY: Θ(n), not Θ(n log n).
- EXTRACT-MAX or EXTRACT-MIN: O(log n).
- PEEK: O(1).
- INSERT, key increase/decrease: O(log n).
Heapsort
- Build a max-heap from the array in Θ(n).
- Repeatedly swap the root with the final item in the heap region.
- Reduce heap size and reheapify root.
Total cost: O(n log n). It is in-place but not stable.
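Both phases can be sketched with a single sift-down helper. Note this sketch is 0-indexed, so the children of i are 2i+1 and 2i+2 rather than the 1-indexed mapping above:

```python
def heapsort(a: list) -> list:
    """In-place heapsort: build a max-heap in Theta(n), then extract repeatedly."""
    n = len(a)

    def sift_down(i: int, size: int) -> None:
        # Restore the max-heap property at index i within a[0..size-1].
        while True:
            l, r, largest = 2 * i + 1, 2 * i + 2, i
            if l < size and a[l] > a[largest]:
                largest = l
            if r < size and a[r] > a[largest]:
                largest = r
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    for i in range(n // 2 - 1, -1, -1):   # build phase: Theta(n) overall
        sift_down(i, n)
    for end in range(n - 1, 0, -1):       # move root to the back, shrink heap
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a
```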
Sorting in Linear Time
Comparison sorting has an Ω(n log n) lower bound. To beat it, an algorithm must exploit additional structure in the input.
Counting Sort
For keys in a bounded integer range [0..k].
- Time: Θ(n + k).
- Stable: yes, if the output fill runs right-to-left as in CLRS/slides.
- Not comparison-based.
- Usually not in-place because of the output array B and count array C.
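A minimal sketch; the right-to-left output fill is exactly what preserves stability:

```python
def counting_sort(a: list, k: int) -> list:
    """Stable counting sort for integer keys in 0..k; Theta(n + k)."""
    count = [0] * (k + 1)
    for x in a:
        count[x] += 1
    for v in range(1, k + 1):
        count[v] += count[v - 1]          # prefix sums: count[v] = #keys <= v
    out = [None] * len(a)
    for x in reversed(a):                 # right-to-left fill preserves stability
        count[x] -= 1
        out[count[x]] = x
    return out
```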
Radix Sort
Sort by digits from least significant to most significant, using a stable sort at each pass.
Θ(d(n + k))
Where d is number of digits and k is digit range.
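A sketch using a stable counting pass per digit, least significant first:

```python
def radix_sort(a: list, digits: int, base: int = 10) -> list:
    """LSD radix sort of non-negative ints with up to `digits` base-`base` digits."""
    for d in range(digits):               # least significant digit first
        div = base ** d
        count = [0] * base
        for x in a:
            count[(x // div) % base] += 1
        for v in range(1, base):
            count[v] += count[v - 1]
        out = [None] * len(a)
        for x in reversed(a):             # stable pass, as radix sort requires
            digit = (x // div) % base
            count[digit] -= 1
            out[count[digit]] = x
        a = out
    return a
```

If any pass were unstable, earlier (lower-digit) orderings would be destroyed and the final result could be wrong.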
Bucket Sort
For values uniformly distributed over a known interval such as [0,1).
- Average case: Θ(n).
- Worst case: Θ(n²) if buckets become highly uneven.
Stability
A stable sort preserves the relative order of items with equal keys. This matters directly for radix sort and indirectly in many practical pipelines.
Median and Order Statistics
The ith order statistic is the ith smallest element. The selection problem asks for it without fully sorting unless necessary.
Quickselect
Same partition idea as quicksort, but recurse only into the side that contains the target rank.
- Worst case with bad pivots: Θ(n²).
- Much faster in practice than sorting if only one rank is needed.
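An iterative sketch with random pivots (expected Θ(n); the Θ(n²) worst case remains possible but unlikely):

```python
import random

def quickselect(a: list, i: int) -> int:
    """Return the i-th smallest element of a (i = 1 is the minimum)."""
    a = list(a)                           # work on a copy
    lo, hi = 0, len(a) - 1
    k = i - 1                             # target index, 0-based
    while True:
        if lo == hi:
            return a[lo]
        p = random.randint(lo, hi)        # random pivot: input order can't hurt us
        a[p], a[hi] = a[hi], a[p]
        pivot, j = a[hi], lo - 1          # Lomuto partition of a[lo..hi]
        for m in range(lo, hi):
            if a[m] <= pivot:
                j += 1
                a[j], a[m] = a[m], a[j]
        a[j + 1], a[hi] = a[hi], a[j + 1]
        q = j + 1
        if k == q:
            return a[q]
        if k < q:                         # recurse only into the relevant side
            hi = q - 1
        else:
            lo = q + 1
```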
Median-of-Medians
Deterministic pivot selection guaranteeing a useful ratio split.
- Group elements in fives.
- Sort each group and take each group median.
- Recursively select the median of those medians.
- Use that as pivot.
This guarantees at least 3n/10 - 6 elements on each side of the pivot, so the recursive call handles at most 7n/10 + 6 elements, which leads to linear-time selection:
T(n) = T(ceil(n/5)) + T(7n/10 + 6) + Θ(n) ⇒ Θ(n)
Quicksort with Median-of-Medians
If distinct keys allow the true median to be used each time, quicksort gets a guaranteed balanced split and so Θ(n log n) worst-case time. The tradeoff is larger constants.
Algorithm Design Paradigms
| Paradigm | Core Idea | When It Works Best |
|---|---|---|
| Incremental | Grow a partial solution one step at a time. | Local maintenance is cheap. |
| Divide and conquer | Split into independent subproblems. | Subproblems do not overlap much. |
| Dynamic programming | Reuse overlapping subproblem answers. | Optimal substructure + overlap. |
| Greedy | Commit to best-looking local move. | Greedy-choice property + optimal substructure. |
Dynamic Programming
DP is for optimisation problems with optimal substructure and overlapping subproblems.
Canonical Workflow
- Characterise the structure of an optimal solution.
- Write a recurrence for the value of an optimal solution.
- Compute the values efficiently by memoisation or bottom-up filling.
- If needed, reconstruct the actual solution.
Top-Down vs Bottom-Up
| Approach | Advantages | Disadvantages |
|---|---|---|
| Top-down memoisation | Only solves needed subproblems; recurrence is natural. | Uses recursion stack. |
| Bottom-up tabulation | No recursion stack; often simpler iteration order once known. | May solve unnecessary subproblems. |
Example Style
The slides use a rod-cutting / VM-hosting optimisation example. The naive recursive version is exponential because it recomputes the same suffix problems repeatedly; memoisation collapses this to polynomial time.
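The memoisation step can be sketched as follows. The price table here is hypothetical (the slides' exact numbers are not reproduced); `prices[i]` is the revenue for a piece of length i+1:

```python
from functools import lru_cache

# Hypothetical price table: prices[i] = revenue for a rod piece of length i+1.
prices = [1, 5, 8, 9, 10, 17, 17, 20]

@lru_cache(maxsize=None)
def cut_rod(n: int) -> int:
    """Max revenue for a rod of length n (n <= len(prices)).

    Without the cache this recursion is exponential, because every suffix
    length is recomputed many times; with it, each length is solved once.
    """
    if n == 0:
        return 0
    return max(prices[i - 1] + cut_rod(n - i) for i in range(1, n + 1))
```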
Greedy Algorithms
A greedy algorithm picks one locally optimal move and commits to it immediately. This is only valid when local optimality is globally safe.
What Must Be True
- There is a meaningful greedy first move.
- There always exists an optimal solution that starts with that greedy move.
- After making that move, the remaining subproblem is itself optimally solvable.
Good vs Bad Examples
- Fractional bag-filling by value density: greedy works.
- Rod cutting / VM-hosting: greedy fails because the locally best first cut can destroy global optimality.
- Activity selection: choosing earliest finishing activity works because it leaves maximal room for later compatible choices.
Activity Selection
After sorting activities by finishing time, repeatedly choose the earliest-finishing compatible activity. The goal is maximum cardinality, not maximum occupied time.
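The whole algorithm fits in a few lines; a sketch with activities given as (start, finish) pairs:

```python
def select_activities(activities):
    """Greedy activity selection: maximise the NUMBER of compatible activities.

    Sort by finishing time, then take each activity that starts no earlier
    than the last chosen finish.
    """
    chosen, last_finish = [], float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:          # compatible with everything chosen
            chosen.append((start, finish))
            last_finish = finish
    return chosen
```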
Elementary Data Structures
Pointers
Pointers are runtime references to objects by base address. They enable dynamic structures such as linked lists and trees. NIL denotes absence of a target.
Stacks
LIFO structure supporting PUSH and POP. Arrays give bounded-capacity stacks easily; linked lists give unbounded stacks cleanly.
Queues
FIFO structure. Typically implemented with circular arrays or linked lists.
Linked Lists
- Singly linked: cheaper, but backward deletion/search context is awkward.
- Doubly linked: easier deletions and bidirectional traversal at pointer-cost overhead.
- Cyclic variants simplify some end cases by avoiding null at the ends.
Rooted Trees
Nodes may have parent pointers, fixed or variable arity, and child lists. Choice of representation changes operation complexity.
Binary Search Trees
A BST stores key/payload pairs so that every key in the left subtree is strictly smaller and every key in the right subtree is strictly larger.
Supported Operations
- SEARCH, INSERT, DELETE.
- MINIMUM, MAXIMUM.
- PREDECESSOR, SUCCESSOR.
Performance
- Expected or typical random-shape cost: O(log n).
- Worst case: O(n) if the tree degenerates into a chain.
Key Structural Points
- No duplicate keys if you want the classical clean guarantees.
- PREDECESSOR and SUCCESSOR need parent pointers in the simple textbook implementation.
- Deletion is the subtle operation: the zero-child, one-child, and two-child cases differ.
Balanced Trees: B-Trees and Red-Black Trees
B-Trees
Designed especially for disk-based settings, where following a pointer can cost millions of CPU cycles. The goal is to minimise tree height.
B-Tree of Minimum Degree T
- Every non-root node stores between T-1 and 2T-1 keys.
- A node with t keys has t+1 children.
- All leaves are at the same depth.
- Separator keys partition child key ranges.
Height is O(log_T N), hence O(log N) search/insert/delete.
B-Tree Insert/Delete
- Insert via splitting full nodes, ideally pre-emptively on the way down.
- Delete via redistribution from siblings or merging with siblings, while preserving min/max occupancy.
- The root is special: splits increase height; losing its last key can reduce height.
Red-Black Trees
RB trees are BSTs with colour constraints enforcing approximate balance.
Properties
- Each node is red or black.
- Root is black.
- Leaves (NIL sentinels) are black.
- Red nodes have black children.
- Every path from a node to descendant leaves contains the same number of black nodes.
These imply height at most 2 log(n+1), giving O(log n) worst-case operations.
2-3-4 Tree Isomorphism
The slides explicitly connect RB trees to B-trees of minimum degree 2 (2-3-4 trees). This is conceptually important: red nodes encode fused multi-key nodes.
Rotations
Local restructuring preserving global key order. Used inside balancing logic for RB insert/delete.
Priority Queues
A priority queue maintains a dynamic set and supports repeated access to the extreme-priority item.
Operations
- MINIMUM / MAXIMUM
- EXTRACT-MIN / EXTRACT-MAX
- INSERT
- DECREASE-KEY / INCREASE-KEY
Heap Implementation
- MINIMUM: O(1)
- All other core operations: O(log n)
Red-Black Tree Implementation
- MINIMUM: O(log n)
- Other operations: O(log n)
The heap wins on peek-at-extremum; RB trees may be preferable when richer dictionary-like operations are also needed.
Hash Tables
Hash tables implement dictionary operations INSERT, SEARCH, DELETE with expected near-constant-time performance, provided the hash function and load-factor management behave well.
Hashing Issues
- Range reduction: hash values usually need mod table size.
- Performance: real hash functions can be expensive.
- Collisions: unavoidable by pigeonhole principle.
- Entropy/selectivity: similar or low-entropy keys can cluster badly.
Chaining
Table entries point to linked lists. With load factor α = n / T.size, the dictionary operations run in expected time O(1 + α).
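A minimal chaining sketch using a replace-existing-binding policy (one of several chain-handling choices; the class name is chosen here):

```python
class ChainedHashTable:
    """Dictionary via chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, size: int = 8):
        self.slots = [[] for _ in range(size)]

    def _chain(self, key):
        return self.slots[hash(key) % len(self.slots)]   # range reduction by mod

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # replace the existing binding
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self._chain(key):     # expected O(1 + alpha) scan
            if k == key:
                return v
        return None
```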
Variants from the Slides
- Append / replace existing binding.
- Keep lists sorted by key.
- Push on head with tombstone-style logical deletion.
- Push on head and delete from list, giving stack-like behaviour per key.
Open Addressing
Store items directly in table slots and resolve collisions with a probe sequence.
Probe Schemes
- Linear probing: simple, but suffers from primary clustering.
- Quadratic probing: avoids primary clustering, but still suffers secondary clustering.
- Double hashing: best of the three here, since it separates keys even when the primary hash collides.
Deletion in Open Addressing
Cannot simply clear a slot, because that would break future search probe sequences. Need a marker/tombstone.
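A linear-probing sketch showing the marker logic: searches skip over tombstones, while inserts may reuse them (class and sentinel names are chosen here):

```python
class LinearProbingTable:
    """Open addressing with linear probing and tombstone deletion (a sketch)."""
    EMPTY, TOMBSTONE = object(), object()     # sentinel slot markers

    def __init__(self, size: int = 8):
        self.slots = [self.EMPTY] * size

    def _probe(self, key):
        h = hash(key) % len(self.slots)
        for i in range(len(self.slots)):
            yield (h + i) % len(self.slots)   # linear probe sequence

    def insert(self, key, value):
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is self.EMPTY:            # key absent: use earliest free slot
                target = first_free if first_free is not None else i
                self.slots[target] = (key, value)
                return
            if slot is self.TOMBSTONE:
                if first_free is None:
                    first_free = i            # reusable, but keep probing for key
            elif slot[0] == key:
                self.slots[i] = (key, value)  # update in place
                return
        if first_free is None:
            raise RuntimeError("table full")
        self.slots[first_free] = (key, value)

    def search(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is self.EMPTY:            # a tombstone does NOT stop the search
                return None
            if slot is not self.TOMBSTONE and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is self.EMPTY:
                return
            if slot is not self.TOMBSTONE and slot[0] == key:
                self.slots[i] = self.TOMBSTONE   # marker keeps later probes intact
                return
```

Simply writing EMPTY in `delete` would make `search` stop early and miss keys stored past the deleted slot.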
Resizing
Manage both live keys and tombstones. Rehash into larger, same-sized, or smaller table depending on occupancy and marker count.
Core Comparison Table
| Method | Worst Case | Average / Expected | Stable | In-Place |
|---|---|---|---|---|
| Insertion sort | Θ(n²) | Θ(n²) | Yes | Yes |
| Merge sort | Θ(n log n) | Θ(n log n) | Yes | Usually no in textbook form |
| Heapsort | O(n log n) | O(n log n) | No | Yes |
| Quicksort | Θ(n²) | Θ(n log n) expected | No | Yes |
| Quicksort with MoM pivots | Θ(n log n) | Θ(n log n) | No | Yes |
| Counting sort | Θ(n+k) | Θ(n+k) | Yes | No |
| Radix sort | Θ(d(n+k)) | Θ(d(n+k)) | Depends on stable inner sort | No |
| Bucket sort | Θ(n²) | Θ(n) average | Depends on bucket handling | No |
Exam Use
What Strong Answers Usually Include
- The right asymptotic bound, with best/worst/expected case clearly distinguished.
- The invariant or structural property that makes the algorithm correct.
- The recurrence and the method used to solve it.
- The data-structure property that enables the runtime, not just the operation list.
- The tradeoff: memory, stability, in-place behaviour, worst-case robustness, or constant factors.
Common Mistakes
- Claiming BUILD-HEAP is Θ(n log n) instead of Θ(n).
- Forgetting that any constant-ratio quicksort split is still Θ(n log n).
- Using DP when subproblems do not overlap, or greedy without proving the greedy-choice property.
- Treating BST average-case behaviour as if it were a worst-case guarantee.
- Forgetting tombstones/markers in open-addressing deletion.
- Calling a sort stable without checking equal-key behaviour.