
Algorithms 1

Cambridge CST Part IA · based on the 2025/26 lecture slides · focus: foundations, sorting, algorithm design paradigms, and classical data structures

Course Map

This course is about more than particular algorithms. The central idea is that algorithmic performance comes from structure: the structure of the problem, the recurrence, the data representation, and the invariant that makes correctness provable.

Main Themes
  • Sorting and order statistics.
  • Divide and conquer, dynamic programming, greedy design.
  • Core data structures: trees, heaps, hash tables, queues, lists.
What You Need To Be Able To Do
  • Prove correctness using loop invariants or data-structure invariants.
  • Solve or estimate recurrence relations.
  • Choose a data structure that improves the asymptotic cost of an operation set.
  • Reason about worst-case, expected-case, space use, and practical tradeoffs.

Correctness and Algorithm Analysis

An algorithm is correct if it terminates and returns the right output for every valid input instance. In this course, correctness is as important as speed.

Asymptotic Notation

f(n) ∈ O(g(n))   upper bound
f(n) ∈ Ω(g(n))   lower bound
f(n) ∈ Θ(g(n))   tight bound
f(n) ∈ o(g(n))   strictly slower growth
f(n) ∈ ω(g(n))   strictly faster growth

Common Cost Models

  • For comparison sorts, count key comparisons.
  • Ignore fixed-cost arithmetic or control overhead when it only changes constant factors.
  • Remember practical hidden costs: cache behaviour, pointer chasing, hash computation, equality checks.

Loop Invariants

When an algorithm contains a loop, correctness is often proved by identifying a property preserved by every iteration.

  1. Initialisation: true before the loop starts.
  2. Maintenance: one iteration preserves it.
  3. Termination: when the loop stops, invariant + stop condition imply correctness.

In exam answers, explicitly name the invariant and structure the proof as initialisation, maintenance, termination. The slides do this repeatedly for insertion sort and partition.
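
As a concrete illustration, here is a minimal Python sketch of insertion sort (0-indexed, whereas the slides use 1-indexed pseudocode) with the invariant recorded where it holds:

```python
def insertion_sort(A):
    """Sort A in place; illustrative 0-indexed version of the textbook pseudocode."""
    for j in range(1, len(A)):
        # Invariant: A[0..j-1] is sorted and holds exactly the elements
        # originally in positions 0..j-1.
        key = A[j]
        i = j - 1
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]  # shift larger elements one place right
            i -= 1
        A[i + 1] = key  # insert the key into its correct slot
```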

Recurrence Relations

Recurrences are how divide-and-conquer costs are usually expressed.

Main Solving Methods

  • Substitute and spot a pattern.
  • Recursion tree.
  • Guess and verify.
  • Master theorem for T(n)=aT(n/b)+f(n).

Master Theorem Shape

T(n) = aT(n/b) + f(n)

Compare f(n) with n^{log_b a}.

  • If f(n) is polynomially smaller, the recursive work dominates.
  • If f(n) has the same order as n^{log_b a}, the solution picks up an extra log n factor.
  • If f(n) is polynomially larger and the regularity condition holds, the non-recursive work dominates.
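
Written out (the standard CLRS statement, for some ε > 0 and constant c < 1):

f(n) = O(n^{log_b a - ε})                        ⇒  T(n) = Θ(n^{log_b a})
f(n) = Θ(n^{log_b a})                            ⇒  T(n) = Θ(n^{log_b a} log n)
f(n) = Ω(n^{log_b a + ε}) and a·f(n/b) ≤ c·f(n)  ⇒  T(n) = Θ(f(n))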

Quadratic Sorting Algorithms

Insertion Sort

Incremental algorithm: maintain a sorted prefix and insert the next key into the right place by shifting larger elements right.

  • Worst case: Θ(n²).
  • Average case: Θ(n²).
  • Stable: yes.
  • In-place: yes.
  • Very good for small arrays because the loop body is tight.

Canonical Invariant

At the start of iteration j, the prefix A[1..j-1] is sorted and contains exactly the original elements from those positions.

Selection Sort

Repeatedly select the smallest remaining item and place it into the next position.

  • Cost: Θ(n²).
  • Usually not stable unless carefully adapted.
  • Can reduce number of writes compared with insertion sort.

Why These Still Matter

They are not just “bad algorithms”. They teach invariant-based correctness, and insertion sort appears as a subroutine in stronger algorithms, including median-of-medians and practical hybrids.

Merge Sort and Quicksort

Merge Sort

Classic divide-and-conquer algorithm: split in half, recursively sort both halves, then merge.

T(n) = 2T(n/2) + Θ(n)  ⇒  Θ(n log n)
  • Stable: yes.
  • Guaranteed worst case: Θ(n log n).
  • Memory behaviour matters: textbook merge sort often uses extra space; the slides note bottom-up merge sort can be in-place.
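
A minimal out-of-place Python sketch (it returns a new list; the in-place and bottom-up variants the slides mention differ only in memory behaviour):

```python
def merge_sort(a):
    """Stable merge sort; returns a new sorted list."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal keys in input order: stability
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]  # append whichever half remains
```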

Quicksort

Partition around a pivot, then recursively sort the two sides.

  • Best case: balanced split every time, Θ(n log n).
  • Any constant-ratio split still gives Θ(n log n).
  • Worst case: a constant-vs-rest split every time (e.g. the pivot is always the minimum or maximum), Θ(n²).
  • Expected average case: Θ(n log n) with sensible/random pivots.
  • In-place: yes in the standard array partition variants.
  • Stable: no in the usual in-place form.

Partition Correctness

The slides prove PARTITION using the invariant that elements left of the boundary i are ≤ pivot, elements in positions i+1 to j-1 are > pivot, and the pivot stays at the end of the array until the final swap.
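
A Lomuto-style sketch of that invariant in Python (0-indexed and illustrative, not necessarily the slides' exact pseudocode):

```python
def partition(A, p, r):
    """Partition A[p..r] around the pivot A[r]; returns the pivot's final index."""
    pivot = A[r]
    i = p - 1  # boundary: A[p..i] <= pivot
    for j in range(p, r):
        # Invariant: A[p..i] <= pivot and A[i+1..j-1] > pivot;
        # the pivot itself stays at A[r] until the final swap.
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]  # final swap puts the pivot in place
    return i + 1
```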

Pivot Strategy Tradeoffs

  • Naive last/first element: vulnerable to sorted or adversarial input.
  • Random input shuffling: makes orderings roughly equally likely.
  • Random pivot: reduces dependence on input order.
  • Median-of-three: lowers probability of extreme pivots.
  • Three-way partition: important when many duplicates occur.
  • Median-of-medians: deterministic good pivot, stronger worst-case guarantees.

Heapsort and Heaps

A binary heap stores a nearly-complete binary tree in a flat array, with the tree structure implied by index arithmetic.

Heap Properties

  1. Structural: complete except maybe bottom level, filled left-to-right.
  2. Ordering: in a max-heap, parent key is at least each child key; symmetrically for min-heaps.

Array Mapping

LEFT(i)   = 2i
RIGHT(i)  = 2i + 1
PARENT(i) = floor(i/2)

Main Operations

  • REHEAPIFY: O(log n).
  • FULL-HEAPIFY: Θ(n), not Θ(n log n).
  • EXTRACT-MAX or EXTRACT-MIN: O(log n).
  • PEEK: O(1).
  • INSERT, key increase/decrease: O(log n).

Heapsort

  1. Build a max-heap from the array in Θ(n).
  2. Repeatedly swap root with final item in heap region.
  3. Reduce heap size and reheapify root.

Total cost: O(n log n). It is in-place but not stable.

The subtle point students often miss: building the heap is linear because lower levels are cheap to fix and dominate the count of nodes.
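
A 0-indexed Python sketch (so children sit at 2i+1 and 2i+2 rather than the 1-indexed mapping above; the name sift_down is illustrative, standing in for the slides' REHEAPIFY):

```python
def sift_down(a, i, n):
    """Restore the max-heap property at index i within the heap region a[0..n-1]."""
    while True:
        l, r, largest = 2 * i + 1, 2 * i + 2, i
        if l < n and a[l] > a[largest]:
            largest = l
        if r < n and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def build_max_heap(a):
    # Bottom-up over internal nodes: most nodes sit near the leaves, where
    # sift_down is cheap, which is why the total cost is Theta(n).
    for i in range(len(a) // 2 - 1, -1, -1):
        sift_down(a, i, len(a))

def heapsort(a):
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]  # move the current maximum into place
        sift_down(a, 0, end)         # shrink the heap region, re-fix the root
```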

Sorting in Linear Time

Comparison sorting has an Ω(n log n) lower bound. To beat this, you must use more structure about the input.

Counting Sort

For keys in a bounded integer range [0..k].

  • Time: Θ(n + k).
  • Stable: yes, if the output fill runs right-to-left as in CLRS/slides.
  • Not comparison-based.
  • Usually not in-place because of output array B and count array C.
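
A Python sketch in the CLRS style, with the stability-preserving right-to-left output pass:

```python
def counting_sort(a, k):
    """Stable sort of integer keys in 0..k; returns a new list."""
    count = [0] * (k + 1)
    for x in a:
        count[x] += 1
    for i in range(1, k + 1):
        count[i] += count[i - 1]  # prefix sums: count[v] = number of keys <= v
    out = [None] * len(a)
    for x in reversed(a):  # right-to-left keeps equal keys in their input order
        count[x] -= 1
        out[count[x]] = x
    return out
```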

Radix Sort

Sort by digits from least significant to most significant, using a stable sort at each pass.

Θ(d(n + k))

where d is the number of digit positions and k is the range of each digit.
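
An LSD sketch in Python, running a stable counting pass per digit (the base and digit-extraction scheme here are illustrative):

```python
def radix_sort(a, d, base=10):
    """Sort non-negative integers with at most d base-`base` digits."""
    for p in range(d):  # least significant digit first
        div = base ** p
        count = [0] * base
        for x in a:
            count[(x // div) % base] += 1
        for i in range(1, base):
            count[i] += count[i - 1]
        out = [None] * len(a)
        for x in reversed(a):  # each pass must be stable, as with counting sort
            digit = (x // div) % base
            count[digit] -= 1
            out[count[digit]] = x
        a = out
    return a
```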

Bucket Sort

For values uniformly distributed over a known interval such as [0,1).

  • Average case: Θ(n).
  • Worst case: Θ(n²) if buckets become highly uneven.

Stability

A stable sort preserves the relative order of items with equal keys. This matters directly for radix sort and indirectly in many practical pipelines.

Median and Order Statistics

The ith order statistic is the ith smallest element. The selection problem asks for it without fully sorting unless necessary.

Quickselect

Same partition idea as quicksort, but recurse only into the side that contains the target rank.

  • Worst case with bad pivots: Θ(n²).
  • Much faster in practice than sorting if only one rank is needed.
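
A list-based Python sketch (not in-place, unlike the array partition version, but it shows the recurse-into-one-side idea; rank k is 0-based here):

```python
import random

def quickselect(a, k):
    """Return the element of rank k (0-based) in a."""
    pivot = random.choice(a)       # random pivot gives expected linear time
    lo = [x for x in a if x < pivot]
    eq = [x for x in a if x == pivot]
    if k < len(lo):
        return quickselect(lo, k)  # target rank lies in the < side
    if k < len(lo) + len(eq):
        return pivot               # the pivot itself has that rank
    hi = [x for x in a if x > pivot]
    return quickselect(hi, k - len(lo) - len(eq))
```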

Median-of-Medians

Deterministic pivot selection guaranteeing a useful ratio split.

  1. Group elements in fives.
  2. Sort each group and take each group median.
  3. Recursively select the median of those medians.
  4. Use that as pivot.

This guarantees at least 3n/10 - 6 elements on each side of the pivot, so each recursive call acts on at most 7n/10 + 6 elements, which leads to linear-time selection:

T(n) = T(ceil(n/5)) + T(7n/10 + 6) + Θ(n)  ⇒  Θ(n)
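
A compact Python sketch of the same four steps (again list-based rather than in-place; rank k is 0-based):

```python
def mom_select(a, k):
    """Deterministic selection of the rank-k element via median of medians."""
    if len(a) <= 5:
        return sorted(a)[k]
    # Steps 1-3: group in fives, take each group median, recurse on the medians.
    medians = [sorted(a[i:i + 5])[len(a[i:i + 5]) // 2]
               for i in range(0, len(a), 5)]
    pivot = mom_select(medians, len(medians) // 2)
    # Step 4: partition around the guaranteed-good pivot.
    lo = [x for x in a if x < pivot]
    eq = [x for x in a if x == pivot]
    if k < len(lo):
        return mom_select(lo, k)
    if k < len(lo) + len(eq):
        return pivot
    hi = [x for x in a if x > pivot]
    return mom_select(hi, k - len(lo) - len(eq))
```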

Quicksort with Median-of-Medians

If distinct keys allow the true median to be used each time, quicksort gets a guaranteed balanced split and so Θ(n log n) worst-case time. The tradeoff is larger constants.

Algorithm Design Paradigms

Paradigm            | Core Idea                                   | When It Works Best
Incremental         | Grow a partial solution one step at a time. | Local maintenance is cheap.
Divide and conquer  | Split into independent subproblems.         | Subproblems do not overlap much.
Dynamic programming | Reuse overlapping subproblem answers.       | Optimal substructure + overlap.
Greedy              | Commit to best-looking local move.          | Greedy-choice property + optimal substructure.

Dynamic Programming

DP is for optimisation problems with optimal substructure and overlapping subproblems.

Canonical Workflow

  1. Characterise the structure of an optimal solution.
  2. Write a recurrence for the value of an optimal solution.
  3. Compute the values efficiently by memoisation or bottom-up filling.
  4. If needed, reconstruct the actual solution.

Top-Down vs Bottom-Up

Approach             | Advantages                                                     | Disadvantages
Top-down memoisation | Only solves needed subproblems; recurrence is natural.        | Uses recursion stack.
Bottom-up tabulation | No recursion stack; often simpler iteration order once known. | May solve unnecessary subproblems.

Example Style

The slides use a rod-cutting / VM-hosting optimisation example. The naive recursive version is exponential because it recomputes the same suffix problems repeatedly; memoisation collapses this to polynomial time.

If a recursive optimisation solution branches into the same smaller parameter values again and again, that is your strongest signal that DP is the right paradigm.
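
A minimal memoised rod-cutting sketch (assuming prices[i] is the price of a piece of length i, with prices[0] unused; the slides' VM-hosting framing may differ in detail):

```python
def cut_rod(prices, n, memo=None):
    """Maximum revenue obtainable from a rod of length n."""
    if memo is None:
        memo = {}
    if n == 0:
        return 0
    if n in memo:  # overlapping subproblem already solved once
        return memo[n]
    best = max(prices[i] + cut_rod(prices, n - i, memo)
               for i in range(1, n + 1))  # try every first-cut length i
    memo[n] = best
    return best
```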

Greedy Algorithms

A greedy algorithm picks one locally optimal move and commits to it immediately. This is only valid when local optimality is globally safe.

What Must Be True

  1. There is a meaningful greedy first move.
  2. There always exists an optimal solution that starts with that greedy move.
  3. After making that move, the remaining subproblem is itself optimally solvable.

Good vs Bad Examples

  • Fractional bag-filling by value density: greedy works.
  • Rod cutting / VM-hosting: greedy fails because the locally best first cut can destroy global optimality.
  • Activity selection: choosing earliest finishing activity works because it leaves maximal room for later compatible choices.

Activity Selection

After sorting activities by finishing time, repeatedly choose the earliest-finishing compatible activity. The goal is maximum cardinality, not maximum occupied time.
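
A Python sketch, representing each activity as a (start, finish) pair:

```python
def select_activities(activities):
    """Return a maximum-cardinality set of pairwise-compatible activities."""
    chosen, last_finish = [], float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):  # by finish time
        if start >= last_finish:  # compatible: starts after the last chosen one ends
            chosen.append((start, finish))
            last_finish = finish
    return chosen
```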

Elementary Data Structures

Pointers

Pointers are runtime references to objects by base address. They enable dynamic structures such as linked lists and trees. NIL denotes absence of a target.

Stacks

LIFO structure supporting PUSH and POP. Arrays give bounded-capacity stacks easily; linked lists give unbounded stacks cleanly.

Queues

FIFO structure. Typically implemented with circular arrays or linked lists.
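
A bounded circular-array sketch in Python (the class shape is illustrative):

```python
class CircularQueue:
    """FIFO queue over a fixed-size circular array."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0   # index of the front element
        self.size = 0
    def enqueue(self, x):
        if self.size == len(self.buf):
            raise OverflowError("queue full")
        self.buf[(self.head + self.size) % len(self.buf)] = x  # wrap around
        self.size += 1
    def dequeue(self):
        if self.size == 0:
            raise IndexError("queue empty")
        x = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.size -= 1
        return x
```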

Linked Lists

  • Singly linked: cheaper, but deletion and backward traversal are awkward because a node cannot reach its predecessor.
  • Doubly linked: easier deletions and bidirectional traversal at pointer-cost overhead.
  • Cyclic variants simplify some end cases by avoiding null at the ends.

Rooted Trees

Nodes may have parent pointers, fixed or variable arity, and child lists. Choice of representation changes operation complexity.

Binary Search Trees

A BST stores key/payload pairs so that every key in the left subtree is strictly smaller and every key in the right subtree is strictly larger.

Supported Operations

  • SEARCH, INSERT, DELETE.
  • MINIMUM, MAXIMUM.
  • PREDECESSOR, SUCCESSOR.

Performance

  • Expected or typical random-shape cost: O(log n).
  • Worst case: O(n) if the tree degenerates into a chain.

Key Structural Points

  • No duplicate keys if you want the classical clean guarantees.
  • PREDECESSOR and SUCCESSOR need parent pointers in the textbook simple implementation.
  • Deletion is the subtle operation: zero-child, one-child, and two-child cases differ.

A BST’s average speed is not enough if an adversary or unlucky insertion order can force chain-like worst-case behaviour. That is the motivation for balanced trees.
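
A minimal keys-only Python sketch of search and insert (payloads and parent pointers omitted; duplicates ignored, matching the no-duplicates convention above):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def bst_insert(root, key):
    """Insert key and return the (possibly new) subtree root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root  # equal keys are ignored

def bst_search(root, key):
    """Follow the order property down one root-to-leaf path: O(height)."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root
```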

Balanced Trees: B-Trees and Red-Black Trees

B-Trees

Designed especially for disk-based settings, where following a pointer can cost millions of CPU cycles. The goal is to minimise tree height.

B-Tree of Minimum Degree T

  • Every node except the root stores between T-1 and 2T-1 keys.
  • An internal node with k keys has k+1 children.
  • All leaves are at the same depth.
  • Separator keys partition child key ranges.

Height is O(log_T N), hence O(log N) search/insert/delete.

B-Tree Insert/Delete

  • Insert via splitting full nodes, ideally pre-emptively on the way down.
  • Delete via redistribution from siblings or merging with siblings, while preserving min/max occupancy.
  • The root is special: splits increase height; losing its last key can reduce height.

Red-Black Trees

RB trees are BSTs with colour constraints enforcing approximate balance.

Properties

  1. Each node is red or black.
  2. Root is black.
  3. Every leaf is a black NIL sentinel.
  4. Red nodes have black children.
  5. Every path from a node to descendant leaves contains the same number of black nodes.

These imply height at most 2 log(n+1), giving O(log n) worst-case operations.

2-3-4 Tree Isomorphism

The slides explicitly connect RB trees to B-Tree<2> (2-3-4 trees). This is conceptually important: red nodes encode fused multi-key nodes.

Rotations

Local restructuring preserving global key order. Used inside balancing logic for RB insert/delete.
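
A sketch of a left rotation without parent pointers (CLRS's version updates a parent pointer instead of returning the new root):

```python
def left_rotate(x):
    """Rotate the edge x -> x.right; returns the new subtree root y.
    Key order is preserved: y's old left subtree lies between x and y."""
    y = x.right
    x.right = y.left  # the middle subtree moves across
    y.left = x
    return y
```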

Priority Queues

A priority queue maintains a dynamic set and supports repeated access to the extreme-priority item.

Operations

  • MINIMUM / MAXIMUM
  • EXTRACT-MIN / EXTRACT-MAX
  • INSERT
  • DECREASE-KEY / INCREASE-KEY

Heap Implementation

  • MINIMUM: O(1)
  • all other core operations: O(log n)

Red-Black Tree Implementation

  • MINIMUM: O(log n)
  • other operations: O(log n)

The heap wins on peek-at-extremum; RB trees may be preferable when richer dictionary-like operations are also needed.

Hash Tables

Hash tables implement dictionary operations INSERT, SEARCH, DELETE with expected near-constant-time performance, provided the hash function and load-factor management behave well.

Hashing Issues

  • Range reduction: raw hash values must be reduced modulo the table size.
  • Performance: real hash functions can be expensive.
  • Collisions: unavoidable by pigeonhole principle.
  • Entropy/selectivity: similar or low-entropy keys can cluster badly.

Chaining

Table entries point to linked lists. If the load factor is α = n / T.size, searches run in expected time O(1 + α) under simple uniform hashing.

Variants from the Slides

  • Append / replace existing binding.
  • Keep lists sorted by key.
  • Push on head with tombstone-style logical deletion.
  • Push on head and delete from list, giving stack-like behaviour per key.
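
A sketch of the replace-existing-binding variant from the list above (fixed slot count; Python's built-in hash stands in for the course's hash functions):

```python
class ChainedHashTable:
    """Dictionary with chaining: each slot holds a list of (key, value) pairs."""
    def __init__(self, nslots=16):
        self.slots = [[] for _ in range(nslots)]
    def _chain(self, key):
        return self.slots[hash(key) % len(self.slots)]  # range reduction by mod
    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # replace the existing binding
                return
        chain.append((key, value))
    def search(self, key):
        for k, v in self._chain(key):  # expected O(1 + alpha) scan
            if k == key:
                return v
        return None
    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return
```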

Open Addressing

Store items directly in table slots and resolve collisions with a probe sequence.

Probe Schemes

  • Linear probing: simple but suffers from primary clustering.
  • Quadratic probing: avoids primary clustering, still suffers secondary clustering.
  • Double hashing: best of the three; separates keys even when the primary hash collides.

Deletion in Open Addressing

Cannot simply clear a slot, because that would break future search probe sequences. Need a marker/tombstone.
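
A linear-probing sketch with tombstones (keys only, fixed size, and it assumes the table never fills completely; resizing is treated separately below):

```python
_TOMBSTONE = object()  # marker left behind by a deletion

class OpenAddressSet:
    """Linear-probing open addressing; assumes at least one slot stays empty."""
    def __init__(self, nslots=16):
        self.slots = [None] * nslots
    def _probe(self, key):
        i = hash(key) % len(self.slots)
        while True:
            yield i
            i = (i + 1) % len(self.slots)  # linear probe step
    def insert(self, key):
        # Simplified: reuses the first tombstone seen, so a key sitting past an
        # earlier tombstone could be duplicated; real tables check for that.
        for i in self._probe(key):
            s = self.slots[i]
            if s is None or s is _TOMBSTONE or s == key:
                self.slots[i] = key
                return
    def search(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:  # a truly empty slot ends the sequence
                return False
            if self.slots[i] == key:   # tombstones are stepped over, not stopped at
                return True
    def delete(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:
                return
            if self.slots[i] == key:
                self.slots[i] = _TOMBSTONE  # clearing to None would break searches
                return
```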

Resizing

Manage both live keys and tombstones. Rehash into a larger, same-sized, or smaller table depending on occupancy and marker count.

“Expected O(1)” does not mean “always fast”. A poor hash function, costly equality tests, or bad load-factor control can dominate actual performance.

Core Comparison Table

Method                    | Worst Case  | Average / Expected  | Stable                       | In-Place
Insertion sort            | Θ(n²)       | Θ(n²)               | Yes                          | Yes
Merge sort                | Θ(n log n)  | Θ(n log n)          | Yes                          | Usually no in textbook form
Heapsort                  | O(n log n)  | O(n log n)          | No                           | Yes
Quicksort                 | Θ(n²)       | Θ(n log n) expected | No                           | Yes
Quicksort with MoM pivots | Θ(n log n)  | Θ(n log n)          | No                           | Yes
Counting sort             | Θ(n + k)    | Θ(n + k)            | Yes                          | No
Radix sort                | Θ(d(n + k)) | Θ(d(n + k))         | Depends on stable inner sort | No
Bucket sort               | Θ(n²)       | Θ(n) average        | Depends on bucket handling   | No

Exam Use

What Strong Answers Usually Include

  • The right asymptotic bound, with best/worst/expected case clearly distinguished.
  • The invariant or structural property that makes the algorithm correct.
  • The recurrence and the method used to solve it.
  • The data-structure property that enables the runtime, not just the operation list.
  • The tradeoff: memory, stability, in-place behaviour, worst-case robustness, or constant factors.

Common Mistakes

  • Claiming BUILD-HEAP is Θ(n log n) instead of Θ(n).
  • Forgetting that any constant-ratio quicksort split is still Θ(n log n).
  • Using DP when subproblems do not overlap, or greedy without proving the greedy-choice property.
  • Treating BST average-case behaviour as if it were a worst-case guarantee.
  • Forgetting tombstones/markers in open-addressing deletion.
  • Calling a sort stable without checking equal-key behaviour.

The course is foundational: the best exam answers do not merely state the algorithm, they explain why its structure forces the runtime and why its invariant forces correctness.