Minmax Path Problem · Computer Science Notes

The minimax path problem is a classic in graph theory. The minimax path problem (also called the bottleneck shortest path problem) is a wonderful sandbox because it admits solutions ranging from brute force to deeply elegant.

You’re given a weighted, undirected graph (G = (V, E)). For any path between two nodes (s) and (t), define the path cost as the maximum edge weight along that path. The minimax path problem asks:

Find a path from (s) to (t) such that this maximum edge weight is minimized.

So instead of minimizing the sum of edge weights (like Dijkstra), we minimize the largest edge weight encountered. This has a number of interesting solutions that are all useful under different scenarios:

Naive SSSP Approaches

These try to adapt shortest-path algorithms. Here is one possibility:

Modified Dijkstra (max-based relaxation)

Instead of dist[v] = dist[u] + w(u,v), use:

\[dist[v] = \min(dist[v], \max(dist[u], w(u,v)))\]

This works but is inefficient if you need multiple queries, since you’d have to recompute for each source.

Naive Spanning Tree Approaches

These exploit the fact that minimax paths are closely tied to Minimum Spanning Trees (MSTs). This can be expressed in the following property:

The minimax path between any two nodes u and v in a graph lies on the Minimum Spanning Tree (MST).

Why? Kruskal's algorithm greedily adds the smallest edge that connects two components. If a non-MST edge were needed for a minimax path, it would have a weight ≥ the largest MST edge on the same path. So the MST path is always at least as good.

Key insight: Since the minimax path lies on the MST, we only need to store the MST (V−1 edges = O(V) space). For each query we simply traverse the unique tree path.

preprocess(graph):
    run Kruskal's algorithm to build the MST
    store MST as a tree adjacency list: mst_adj[V]
    // Space: O(V)

queryMinimax(s, t):
    // DFS on the MST from s, tracking max edge weight seen so far
    // Returns the max edge on the unique s→t tree path

    stack = [(s, parent=-1, max_so_far=0)]
    while stack not empty:
        (node, par, cur_max) = stack.pop()
        if node == t:
            return cur_max
        for (neighbour, edge_weight) in mst_adj[node]:
            if neighbour != par:
                stack.push((neighbour, node,
                             max(cur_max, edge_weight)))
    // t unreachable (disconnected graph) → return ∞

Complexity:

Preprocess: O(E log E) Kruskal; O(V) MST space.
Query: O(V) — the path on a tree can span the whole tree.

This is extremely space-efficient but each query is linear. Good when queries are rare.

Full Answer Table

Key insight: When Kruskal merges component A with component B via an edge of weight w, every pair (a ∈ A, b ∈ B) has exactly that minimax value w — because this is the first moment they became connected in Kruskal's order. We can fill an answer matrix during the merge.

preprocess(graph):
    sort edges ascending → sorted_edges[0 .. E-1]
    initialise UFDS with V nodes
    initialise ans[V][V] = ∞
    for v in 0..V-1: ans[v][v] = 0

    // Also maintain explicit member lists per component
    members[v] = {v} for each v

    for (u, v, w) in sorted_edges:
        ru = find(u),  rv = find(v)
        if ru == rv: continue

        // Every pair across the two merging components gets answer w
        for each a in members[ru]:
            for each b in members[rv]:
                ans[a][b] = w
                ans[b][a] = w

        // Merge smaller component into larger (union by size)
        if |members[ru]| < |members[rv]|: swap(ru, rv)
        members[ru] = members[ru] ∪ members[rv]
        union(ru, rv)

queryMinimax(s, t):
    return ans[s][t]

Why is the total work O(V²)?

Each unordered pair (a, b) is written into the answer table exactly once — at the precise step in Kruskal's process when a's component and b's component first merge. Since there are exactly V(V−1)/2 unordered pairs, the total number of writes is Θ(V²).

Complexity:

Preprocess: O(V² + E log E) time and O(V²) space.
Query: O(1).

Dynamic Programming Algorithm (Modified Bellman-Ford)

This is a single-source DP that generalises Bellman-Ford to the minimax objective. It is structurally different from Floyd-Warshall (which is all-pairs) and different from the SSSP Dijkstra approach (which uses a priority queue).

The DP Formulation

State: dp[v] = the current best-known minimax cost from source s to node v.

More formally, after pass p, dp[v] = the minimum possible maximum edge weight on any path from s to v using at most p edges.

Base case:

dp[s] = 0
dp[v] = ∞  for all v ≠ s

Transition (applied in each pass over all edges):

For each edge (u, v, w):
    dp[v] = min(dp[v], max(dp[u], w))   // relax v via u
    dp[u] = min(dp[u], max(dp[v], w))   // relax u via v (undirected)

Termination: After at most V−1 passes (since any simple path uses at most V−1 edges). Early termination when a full pass produces no update.

Pseudocode

preprocess(graph):
    sort all edges by weight ascending → sorted_edges[0..E-1]
    // Sorting is optional but enables early termination in practice

queryMinimax(s, t):
    dp[v] = ∞ for all v
    dp[s] = 0

    for pass = 1 to V-1:
        updated = false
        for each edge (u, v, w) in sorted_edges:
            // Attempt to improve the minimax path to v via u
            candidate_v = max(dp[u], w)
            if candidate_v < dp[v]:
                dp[v] = candidate_v
                updated = true
            // Same in reverse (undirected graph)
            candidate_u = max(dp[v], w)
            if candidate_u < dp[u]:
                dp[u] = candidate_u
                updated = true
        if not updated:
            break   // converged early

    return dp[t]

Why Correctness Requires V−1 Passes

Consider a graph with edges processed in the wrong order in a single pass:

Nodes: A, B, C    Source: A
Edges (sorted by weight): (B-C, w=1), (A-B, w=2)

Pass 1, edge (B-C, w=1): dp[C] = max(dp[B], 1) = max(∞, 1) = ∞  (dp[B] not yet set)
Pass 1, edge (A-B, w=2): dp[B] = max(dp[A], 2) = max(0, 2) = 2   ✓
→ After pass 1: dp[B]=2, dp[C]=∞   WRONG (correct dp[C] = 2)

Pass 2, edge (B-C, w=1): dp[C] = max(dp[B], 1) = max(2, 1) = 2   ✓
→ After pass 2: dp[C] = 2   CORRECT

A second pass is required because the minimax path A→B→C is only discovered once B's value is set. In general, a path of k edges requires k passes to propagate correctly from the source. Since simple paths have at most V−1 edges, V−1 passes guarantee convergence.

Complexity Analysis

	Cost
Preprocess	O(E log E) for sorting
Space	O(V + E)
Query	O(VE) per query, O(E) best case with early exit

The query is much slower than modified Dijkstra (O(E log V)) or binary lifting (O(log V)), but this is the natural DP formulation with explicit state-and-transition structure. Its main theoretical value is that it extends cleanly to directed graphs and handles negative weights (neither binary lifting nor MST-based approaches work for directed minimax).

Binary Search on the Answer Space

Key insight: The answer must be one of the E distinct edge weights. So instead of solving a continuous optimisation, we can search over the sorted edge weights for the minimum threshold that still connects s to t.

The goal is to solve the reverse problem. Instead of finding query(u,v) directly, we find valid(u, v, cost) which is a boolean that is true if it is possible to go from u to v while staying under cost cost.

preprocess(graph):
    extract all edges with weights → edge_list
    sort edge_list by weight ascending → sorted_edges[0 .. E-1]
    store the full adjacency list for connectivity checks

queryMinimax(s, t):
    lo = 0
    hi = E - 1

    while lo < hi:
        mid = (lo + hi) / 2

        // Is s reachable from t using only edges
        // with weight ≤ sorted_edges[mid].weight?
        reachable = BFS or DFS on graph,
                    ignoring edges heavier than sorted_edges[mid].weight,
                    starting at s, looking for t

        if reachable:
            hi = mid          // answer is ≤ this threshold; try smaller
        else:
            lo = mid + 1      // threshold too small; try larger

    return sorted_edges[lo].weight

Correctness: We are binary-searching for the leftmost index in sorted_edges where s and t become connected. At that index, the weight of the edge is the minimax bottleneck.

Complexity:

Preprocess: O(E log E) for sorting; O(E) space.
Query: O(log E) iterations × O(V + E) per BFS = O((V + E) log E) per query.

This is the entry point to the "search" family of solutions. The structure is clean but per-query cost is high.

UFDS Snapshot Table

Key insight: Rather than re-running BFS for every binary search step, we can precompute the connectivity structure at every stage of Kruskal's algorithm and store it. Then a query becomes a binary search over those snapshots.

This is just the binary search algorithm but with a better data structure.

preprocess(graph):
    sort edges ascending by weight → sorted_edges[0 .. E-1]
    initialise UFDS with V nodes

    // comp[v][k] = representative of v's component
    //              after the first k edges have been added
    allocate comp[V][E]

    for k = 0 to E-1:
        (u, v, w) = sorted_edges[k]
        union(u, v)
        for each node x in 0 .. V-1:
            comp[x][k] = find(x)       // snapshot the whole UFDS

queryMinimax(s, t):
    if s == t: return 0

    // Binary search for first k where s and t share a component
    lo = 0, hi = E - 1
    while lo < hi:
        mid = (lo + hi) / 2
        if comp[s][mid] == comp[t][mid]:
            hi = mid
        else:
            lo = mid + 1

    return sorted_edges[lo].weight

Complexity:

Preprocess: O(E log E) sort + O(VE) to fill the snapshot table → O(VE) time and O(VE) space.
Query: O(log E) — a simple binary search over two rows of the table.

Note on path compression: For snapshots to be consistent, you should call find with path compression after each union to get stable representatives. In practice you use union-by-rank too.

MST + Binary Lifting

Key insight: The DFS-per-query approach visits redundant nodes repeatedly. Binary lifting lets us skip 2^k ancestors at a time, answering "max edge on a root-to-node path segment" in O(log V) jumps. To handle arbitrary s→t queries, we reduce to an LCA query.

What is an LCA?

Given a rooted tree T, a node a is an ancestor of node b if a lies on the unique path from the root to b. The Lowest Common Ancestor of two nodes u and v, written LCA(u, v), is the deepest node that is simultaneously an ancestor of both.

           1 (root, depth 0)
          / \
         2   3  (depth 1)
        / \    \
       4   5    6  (depth 2)
          / \
         7   8  (depth 3)

LCA(7, 8) = 5 — both share 5 as deepest common ancestor
LCA(7, 6) = 1 — the only common ancestor is the root
LCA(4, 8) = 2 — 2 is the deepest common ancestor
LCA(5, 2) = 2 — because 5 is a descendant of 2, so the LCA is 2 itself

Critical structural fact: In any tree, the unique path from u to v passes through LCA(u,v). Specifically:

path(u, v) = path(u → LCA(u,v)) + path(LCA(u,v) → v)

Naive LCA — O(V) Per Query

The simplest approach: from each node, keep climbing toward the root until the paths meet.

naiveLCA(u, v):
    // Equalise depths
    while depth[u] > depth[v]: u = parent[u]
    while depth[v] > depth[u]: v = parent[v]
    // Climb together
    while u ≠ v:
        u = parent[u]
        v = parent[v]
    return u

This is O(depth of tree) = O(V) in the worst case (a degenerate chain tree).

Euler Tour + RMQ Reduction — O(1) Query

A fundamentally different approach: encode LCA as a Range Minimum Query on the Euler tour.

Perform a DFS, recording every node you visit (including re-visits when backtracking): this produces an array of length 2V−1 called the Euler tour.
Record the depth of each visited node in this tour.
Also record the first occurrence of each node in the tour.

Key lemma: LCA(u, v) = the node with minimum depth in the Euler tour between positions first[u] and first[v].

Intuition: During the DFS, you visit u, eventually recurse into all of u's subtree, then backtrack. The shallowest node you visit between u and v in the tour is their LCA — because you must pass through it going up on the way from u's subtree to v's subtree.

Example Euler tour for the tree above:
  1, 2, 4, 2, 5, 7, 5, 8, 5, 2, 1, 3, 6, 3, 1
  depths:
  0, 1, 2, 1, 2, 3, 2, 3, 2, 1, 0, 1, 2, 1, 0

first[4] = 2,  first[8] = 7
Range depths[2..7] = [2,1,2,3,2,3] → minimum = 1 at index 3 → node = 2
LCA(4, 8) = 2  ✓

With a Sparse Table for static RMQ: O(V log V) preprocessing, O(1) query.

Why LCA Is Central to Minimax on the MST

Once the MST is built and rooted:

The path from u to v in the MST is unique (trees have no cycles).
That path goes exactly through LCA(u, v).
The minimax value = max edge weight on u → LCA → v.

Now, the idea is that when we traverse up a tree to find the LCA, we can use the process of Binary Lifting (see below). Usually, if we wanted to traverse a tree we would do in in \(O(V)\) time but since we are only traversing towards the root when we go towards the LCA, we can add a few shortcut edges on the way to the root.

Binary Lifting

Any integer d can be written uniquely as a sum of powers of two (its binary representation). For example, to climb 13 steps: 13 = 8 + 4 + 1, so we make 3 jumps instead of 13.

Binary lifting precomputes the table:

up[v][k] = the 2^k-th ancestor of v
maxW[v][k] = the maximum edge weight on the path from v to its 2^k-th ancestor

Building the table bottom-up:

// Base case: 1-step (2^0) ancestor
up[v][0] = parent[v]
maxW[v][0] = weight(v, parent[v])

// Inductive step: 2^k ancestors = two 2^(k-1) jumps
up[v][k] = up[ up[v][k-1] ][k-1]
maxW[v][k] = max( maxW[v][k-1], maxW[ up[v][k-1] ][k-1] )

Each entry is computed from two previously computed entries in O(1). Total entries: V × log V → O(V log V) preprocessing.

Why the Query is O(log V)

Phase 1 — Depth equalisation. Let diff = depth[s] − depth[t]. Write diff in binary. For each set bit at position k, jump s up by 2^k and accumulate the max edge weight. This takes exactly popcount(diff) ≤ log V jumps.

Phase 2 — Simultaneous climb to LCA. Now s and t are at the same depth. We scan k from high to low: if up[s][k] ≠ up[t][k], then the LCA is strictly above both up[s][k] and up[t][k], so we safely make both jumps. This binary search property ensures we make exactly O(log V) jumps total.

Analogy: The naive approach is like counting down from 1000 one at a time. Binary lifting is like binary search — you always halve the remaining distance with each operation, so you reach the target in O(log V) steps.

Why is this better than just storing all ancestor distances directly? Storing all O(V²) ancestor distances would give O(1) queries but O(V²) space. Binary lifting achieves O(log V) query with only O(V log V) space — a logarithmic compression in space with only a logarithmic cost in query time.

This allows us to write the following pseudocode:

preprocess(graph):
    run Kruskal's → MST
    root the MST at node 0; compute depth[v] for all v via DFS

    LOG = ⌈log₂(V)⌉

    // Base case: 1-step ancestor and its edge weight
    for each node v (≠ root):
        up[v][0]   = parent of v in rooted MST
        maxW[v][0] = weight of edge(v, parent(v))
    up[root][0]   = root
    maxW[root][0] = 0

    // Fill the lifting table bottom-up
    for k = 1 to LOG:
        for each node v:
            up[v][k]   = up[ up[v][k-1] ][k-1]
            maxW[v][k] = max( maxW[v][k-1],
                              maxW[ up[v][k-1] ][k-1] )

queryMinimax(s, t):
    result = 0

    // Step 1: lift the deeper node up to the same depth as the shallower
    if depth[s] < depth[t]: swap s and t
    diff = depth[s] - depth[t]
    for k = 0 to LOG:
        if (diff >> k) & 1:
            result = max(result, maxW[s][k])
            s = up[s][k]

    // If t was an ancestor of s, we're done
    if s == t: return result

    // Step 2: lift both simultaneously until one step below their LCA
    for k = LOG down to 0:
        if up[s][k] ≠ up[t][k]:
            result = max(result, maxW[s][k], maxW[t][k])
            s = up[s][k]
            t = up[t][k]

    // s and t are now children of the LCA; take one final step
    result = max(result, maxW[s][0], maxW[t][0])
    return result

Complexity:

Preprocess: O(E log E) + O(V log V) → O(E log E) time; O(V log V) space.
Query: O(log V).

This is the "production standard" approach: the preprocessing is affordable, the space is very manageable, and O(log V) queries are fast in practice.

Kruskal's Reconstruction Tree

This is perhaps the most elegant approach, and it's less commonly taught. Instead of building the MST and then doing LCA on top of it, we fold both steps into a single unified data structure during Kruskal's algorithm.

Key idea: Every time Kruskal merges two components via an edge of weight w, instead of simply linking them in the UFDS, we create a new tree node with weight w and make the two component roots its children. The leaf nodes are the original V graph nodes. This creates a complete binary tree of 2V−1 nodes (the Kruskal reconstruction tree), and the minimax query between any two original nodes u and v is simply the weight stored at their LCA in this tree.

Why does this work? The LCA of u and v in the reconstruction tree is the node created when u's and v's components first merged — that merge used the smallest edge that could connect them, which is exactly the minimax bottleneck.

preprocess(graph):
    sort edges ascending → sorted_edges[0 .. E-1]
    initialise UFDS with nodes 0 .. V-1
    next_id = V    // internal nodes will be numbered V, V+1, ...

    // Each node in the reconstruction tree has: weight, parent
    node_weight[0..V-1] = 0   // original nodes are leaves, weight = 0
    ufds_root[v] = v for each v  // which recon-tree node represents component

    for (u, v, w) in sorted_edges:
        ru = ufds_root[find(u)]
        rv = ufds_root[find(v)]
        if ru == rv: continue

        // Create new internal node
        node_weight[next_id] = w
        parent[ru] = next_id
        parent[rv] = next_id
        children[next_id] = [ru, rv]

        // The new component's representative in the recon tree is next_id
        ufds_root[union(u, v)] = next_id
        next_id++

    reconstruction_tree_root = next_id - 1

    // Preprocess standard O(V log V) binary lifting LCA
    //   on the reconstruction tree (which has 2V-1 nodes)
    preprocess_LCA(reconstruction_tree, root=reconstruction_tree_root)

queryMinimax(s, t):
    if s == t: return 0
    lca_node = LCA(s, t)   // on the reconstruction tree
    return node_weight[lca_node]

Concrete example:

Suppose we have edges sorted: (A-B, w=2), (C-D, w=3), (B-C, w=5).

Merge A and B → create node N₁ (weight=2), children={A, B}
Merge C and D → create node N₂ (weight=3), children={C, D}
Merge {A,B} and {C,D} → create node N₃ (weight=5), children={N₁, N₂}

Now: minimax(A, D) = weight(LCA(A,D)) = weight(N₃) = 5 ✓ And: minimax(A, B) = weight(LCA(A,B)) = weight(N₁) = 2 ✓

Complexity:

Preprocess: O(E log E) sort + O(V α(V)) UFDS + O(V log V) LCA → O(E log E) time; O(V log V) space.
Query: O(log V) with binary lifting LCA, or O(1) with Farach-Colton-Bender LCA (O(V) preprocessing).

The reconstruction tree is superior to plain binary lifting on the MST because the query logic is dead simple: just find the LCA and read a field. There is no edge-weight accumulation during the query itself.

Kruskal Reconstruction Tree with Heavy-Light Decomposition

Binary lifting on the MST achieves O(log V) query but requires O(V log V) space for the lifting table. Can we replace binary lifting with something that uses only O(V) space?

With the Kruskal reconstruction tree, the minimax query becomes: find LCA(s, t) in the reconstruction tree and return its stored weight. There is no range-max computation required during the query at all — the weight is already stored at the LCA node. This means we can use any O(log V)-per-query LCA algorithm, including ones that use only O(V) space.

Heavy-Light Decomposition achieves exactly this.

Heavy-Light Decomposition (HLD) for LCA — O(V) Space

Decomposition step:

For each node, define its heavy child as the child with the largest subtree (break ties arbitrarily). A heavy path (or chain) is a maximal path following heavy children downward. Every other child edge is a light edge.

Key property: Any root-to-leaf path crosses at most O(log V) light edges. Proof: Each light edge leads to a child whose subtree is strictly less than half the parent's subtree (by definition of heavy child), so the subtree size at least halves with each light edge. Starting from V nodes, we can halve at most log V times.

Store per node: chain_head[v] (top of v's chain), depth[v], parent[v]. Total: O(V) space.

LCA via HLD:

LCA_HLD(u, v):
    while chain_head[u] ≠ chain_head[v]:
        // Move the node whose chain head is deeper
        if depth[chain_head[u]] < depth[chain_head[v]]:
            swap(u, v)
        // Jump from u to the parent of its chain's head
        // (this crosses exactly one light edge)
        u = parent[chain_head[u]]

    // Now u and v are on the same chain
    // The one with smaller depth is the LCA
    if depth[u] ≤ depth[v]:
        return u
    else:
        return v

Each iteration of the while loop crosses one light edge upward. Since any path has O(log V) light edges, the loop runs O(log V) times. Each iteration is O(1). Total: O(log V) per LCA query.

The Full Algorithm

preprocess(graph):
    // Step 1: Kruskal's process to build the reconstruction tree
    sort all edges ascending → sorted_edges[0..E-1]
    initialise UFDS with nodes 0..V-1
    next_id = V   // internal nodes of reconstruction tree start here
    node_weight[0..V-1] = 0
    comp_root[v] = v for each v  // which recon-tree node represents component

    for (u, v, w) in sorted_edges:
        ru = comp_root[find(u)]
        rv = comp_root[find(v)]
        if ru == rv: continue

        // Create new internal node for this merge
        node_weight[next_id] = w
        recon_parent[ru] = next_id
        recon_parent[rv] = next_id
        recon_children[next_id] = {ru, rv}

        // The merged component is now represented by next_id
        new_root = union(u, v)     // UFDS union; returns new UFDS root
        comp_root[new_root] = next_id
        next_id++

    recon_root = next_id - 1
    // Reconstruction tree has 2V-1 nodes rooted at recon_root

    // Step 2: Heavy-Light Decomposition on the reconstruction tree
    // DFS to compute subtree sizes, depths, parents, heavy children
    DFS_subtree_sizes(recon_root, parent=-1, depth=0)

    for each node v in reconstruction tree (by DFS order):
        heavy_child[v] = child of v with largest subtree_size
                         (null if leaf)

    // Assign chain heads: follow heavy children downward
    assign_chains(recon_root, head=recon_root)
        // chain_head[v] = head of v's heavy chain

    // Space: O(V) for all arrays (reconstruction tree has 2V-1 ≈ O(V) nodes)

queryMinimax(s, t):
    if s == t: return 0
    lca_node = LCA_HLD(s, t)      // O(log V), operates on reconstruction tree
    return node_weight[lca_node]  // O(1) lookup

Complexity Analysis

Step	Time	Space
Sort edges	O(E log E)	O(E)
UFDS + reconstruction tree	O(E α(V))	O(V)
HLD preprocessing	O(V)	O(V)
Total preprocessing	O(E log E)	O(V)
Query	O(log V)	—

For sparse graphs where E = O(V): preprocessing reduces to O(V log V), matching the target exactly. The space is O(V) — better than binary lifting's O(V log V) — because HLD requires only O(1) values per node.

Summary Table

Approach	Preprocess Time	Space	Query Time
Modified SSSP (Dijkstra, per source)	O(V · E log V)	O(V+E)	O(1) per source
Floyd-Warshall ASSP	O(V³)	O(V²)	O(1)
MST + DFS per query	O(E log E)	O(V)	O(V)
Binary search on edges	O(E log E)	O(E)	O((V+E) log E)
UFDS snapshot table	O(VE)	O(VE)	O(log E)
Full answer table (V²)	O(V² + E log E)	O(V²)	O(1)
MST + Binary Lifting	O(E log E)	O(V log V)	O(log V)
Kruskal reconstruction tree + Binary Lifting	O(E log E)	O(V log V)	O(log V)
Modified Bellman-Ford DP (new)	O(E log E)	O(V+E)	O(VE)
Kruskal reconstruction tree + HLD (new)	O(E log E)	O(V)	O(log V)

The bottom row represents the Pareto frontier for the preprocess-space-query tradeoff: it achieves the same O(log V) query as binary lifting but with strictly less space. In settings where memory is the bottleneck (large V, memory-constrained systems, cache performance), this is the superior choice. The reconstruction tree + HLD is also arguably simpler to reason about correctness for, since the query reduces to a pure LCA computation with a single weight lookup rather than an accumulated max over a tree path.

Comparing the Approaches in Practice

The "right" answer depends on the regime you are operating in:

Few queries, sparse graph (competitive programming style): Modified Dijkstra or MST + DFS per query. Dijkstra with a priority queue is extremely fast in practice and needs no preprocessing.

Dense graph, all-pairs needed: Floyd-Warshall or the V² answer table. The V² table via Kruskal is faster in practice than Floyd-Warshall (O(V²) vs O(V³)) and gives the same O(1) query.

Many queries, memory constrained: MST + binary lifting. It gives O(log V) query with only O(V log V) space, which is typically a few megabytes even for V = 10⁶. In competitive programming, this is the go-to for online query problems.

Elegance and extensibility: Kruskal's reconstruction tree. This structure generalises immediately to other "merging" queries (e.g., the maximum flow between two nodes also equals the maximum-weight LCA in the reverse Kruskal tree, where edges are sorted descending). If you are building a library, this is the cleanest abstraction.

Offline query processing: The UFDS snapshot table or binary search approach works well when queries arrive offline and you can afford O(VE) preprocessing for O(log E) online answers — useful when E is small but V is large.

Binary Lifting — Missing Guard for s == t Mid-Equalisation

After equalising depths, if s == t we have already found the LCA and the max edge is whatever was accumulated in the equalisation phase. The original code only checks if s == t: return result at the end of equalisation, which is correct — but the final step result = max(result, maxW[s][0], maxW[t][0]) should only execute if s ≠ t at that point, otherwise we'd add the edge from s (= LCA) to its parent spuriously. The original code handles this because the for k = LOG down to 0 loop only advances when up[s][k] ≠ up[t][k], so if they are already equal, the loop does nothing. However, the final line outside the loop is only reached when they are one step below the LCA. The code is structurally correct but should have an explicit guard comment:

    // After the simultaneous lift, s and t are children of LCA
    // (guaranteed by the invariant of the loop above).
    // If s == t was caught before the loop, we return early.
    result = max(result, maxW[s][0], maxW[t][0])
    return result

No algorithmic change needed — just a clarification that the if s == t: return result guard mid-equalisation correctly short-circuits before the loop.