A minimum spanning tree (MST) is defined as a spanning tree that has the minimum weight among all the possible connected, undirected graphs that include all the vertices of the graph.

Finding the MST is a graph is a good example of an application of greedy algorithms. This is because almost any greedy algorithm is guaranteed to work for this problem. But as is the theme in the design of greedy algorithms, we still need a proof to guarantee correctness.

For proofs, a useful result that can be used to prove that an MST is reached is related to the idea of The Minimum Bottleneck Property (MBP).

An edge \((v, w)\) satisfies the MBP if and only if there is no \(v-w\) path consisting solely of edges with weight less than the weight of edge \((v, w)\).

Now, this is useful because MBP implies MST. If every edge of a spanning tree \(T\) satisfies the minimum bottleneck property, \(T\) is a Minimum Spanning Tree (MST).

For the algorithms to determine the Minimum Spanning Tree, a general outline of their proof involves showing that they produce:

A spanning tree
An approach relying on proof by cases that classifies edge additions into two types (one that creates a cycle and one that does not) can help with this proof.
The tree satisfies the MBP (this can be shown using an exchange argument)

This automatically implies that we have a MST.

Here are a few more properties useful to reason about MSTs:

Cut property. Let S be any subset of nodes, and let e be the min cost edge with exactly one endpoint in S. Then the MST contains e.
Cycle property. Let C be any cycle, and let f be the max cost edge belonging to C. Then the MST does not contain f.

Prim’s algorithm

Prim's grows a single tree from a starting node. The naïve algorithm runs in \(O(mn)\).

Looking at Prim’s algorithm, we notice that the main operation that dominates the runtime is finding the shortest edge that can be added to the tree. Whenever we need to find the element with the smallest key rapidly, we know that we can use a heap.

The only question that remains is whether we want to use a heap of edges or a heap of vertices. These two approaches have their pros and cons.

Heap of Edges

Idea: Maintain a min-heap (priority queue) of all candidate edges that connect the current MST to vertices outside it.

Process:

Initialise an empty BBST/Heap incident_edges.
Initialise an array in_tree[] set to all false. in_tree[s] = true.
for all (n, weight) in Graph.get_neighbours(s):
	incident_edges.insert(weight, (s, n))
while incident_edges is not empty:
	remove the minimum element from incident_edges as (a, b)
	if in_tree[a] and in_tree[b]:
		continue
	in_tree[b] = true; add (a, b) into the MST.
	for all (n, weight) in Graph.get_neighbours(b):
		incident_edges.insert(weight, (b, n))

Heap Size: Can grow up to \(O(E)\), since potentially all edges may be pushed in.
Operations: Each edge may be inserted once and removed once.

Complexity:

Time: \(O(E \log E)\) (since each of the \(E\) edges can be pushed/popped once). For a simple weighted graph, this simplifies to \(O(E \log V)\) because \(\log E = O(\log V^2) = O(2 \log V)\) [based on reasoning about the maximum number of edges that a simple graph can have].
Space: \(O(E)\) for the heap.

Heap of Vertices

Idea: Maintain a min-heap keyed by the minimum edge weight connecting each vertex to the current MST.
Process:
- Start with an arbitrary vertex, set its key to 0, others to \(\infty\).
- Extract the minimum-key vertex from the heap.
- For each neighbour, if the edge weight is smaller than the current key, update the key (decrease-key operation).
Heap Size: Always exactly \(O(V)\), since each vertex is stored once.
Operations: Each vertex is extracted once, and each edge may cause a decrease-key operation.

Complexity:

Time: \(O((V + E) \log V)\).
- \(V\) extract-min operations → \(O(V \log V)\).
- Up to \(E\) decrease-key operations → \(O(E \log V)\).
- Total: \(O((V + E) \log V)\).
Space: (O(V)) for the heap.

Choosing between the two

Edge-based heap is simpler to implement but less space-efficient, especially in dense graphs where \(E \approx V^2\).
Vertex-based heap is more elegant and efficient in terms of space, and it’s the version most textbooks and libraries adopt.
For sparse graphs \((E \approx V)\), both approaches are similar in performance.
For dense graphs \((E \approx V^2)\), the vertex-based heap is significantly better:
- Edge-based: \(O(V^2 \log V)\)
- Vertex-based: \(O(V^2 \log V)\), but often simplified to \(O(V^2)\) with adjacency matrix.

Fibonacci Heaps

With a Fibonacci heap, we can achieve \(O(E + V \log V)\) time complexity (but with a large constant). Here is how we achieve the runtime:

Insert, Delete, and DecreaseKey in a Fibonacci Heap is \(O(1)\) amortized time.

ExtractMin in a Fibonacci Heap is \(O(\log n)\) amortized time.

We can work out the complexity using the following breakdown of Prim’s algorithm:

\(O(V)\) inserts so \(O(V)\) time.

\(O(E)\) decrease key updates so \(O(E)\) time.

\(O(V)\) extractMin operations so \(O(V \log V)\) time

This is the general approach to run Prim as fast as possible on a general graph.

Kruskal’s algorithm

Kruskal's adds the smallest available edge (sorting edges first).

The idea behind this algorithm is to greedily add edges to the solution. To maintain the edges we have seen, the most efficient approach is to use UFDS. Here is what the pseudocode for the basic implementation of Kruskal looks like:

Initialise a UFDS forest for n nodes.
Sort the edge list based on weight.
Create an empty edge set T.
For edge (u, v, w) in the edge list:
	If forest.is_same_set(u, v): // connecting will form a cycle
		continue // don’t add the edge into T
	Else:
		Add (u, v, w) into T, forest.union(u, v)

This runs in \(O(m \log m)\) time because the most of the time is dominated by sorting the edge list.

Note: If there are a lot of parallel edges in the graph, we can run a \(O(m)\) traversal through the graph to eliminate parallel edges so that the cost of sorting the edge list can be brought down.

Example problems

One of the most useful applications of Kruskal’s algorithm comes in clustering.

Clustering is essentially Kruskal’s algorithm stopped early. The reason why we make the connection between Kruskal and Clustering is that we have discussed strategies to turn Kruskal into a blazingly fast algorithm. We can apply the same principles to clustering to achieve a speed-up.

Boruvka’s algorithm

This is an algorithm that is very fast to do by hand.

Borůvka's algorithm is a greedy algorithm for finding a minimum spanning tree in a graph, or a minimum spanning forest in the case of a graph that is not connected. It was first published in 1926 by Otakar Borůvka as a method of constructing an efficient electricity network for Moravia.

Here is how it works:

1) Input is a connected, weighted and un-directed graph.
2) Initialize all vertices as individual components (or sets).
3) Initialize MST as empty.
4) While there are more than one components, do following
   for each component.
      a)  Find the closest weight edge that connects this
          component to any other component.
      b)  Add this closest edge to MST if not already added.
5) Return MST.

Time Complexity of Boruvka's algorithm is \(O(E \log V)\) which is the same as Kruskal's and Prim's algorithms.

Going Faster

Here, we are going to look at approaches that can help us go even faster than this. We do this for both approaches:

Kruskal’s Algorithm

The majority of the cost comes from sorting. However, we know that sorting can be done faster under the assumption that weights are integers and bounded.

Using Counting sort, we can do the sort in \(O(n)\) time.

This makes the algorithm dominated by the loop which runs in \(O(m \alpha (n))\) time. Overall, the time complexity comes up to \(O(c + m \alpha (n))\) time where \(c\) is the maximum integer edge weights.

Prim’s Algorithm

Here, the use of a priority queue to visit edges in some ordering is the main slowdown. Those data structures don’t assume any fixed range of values so we can use an approach to speed it up.

One approach to do this is to use a set of \(c\) stacks/queues/linked-lists which store all edges of weight \(w\) in the \(w^{th}\) bucket.

This is a standard replacement for a priority queue for a set of bounded integers.

We can do retrieval in \(O(c)\) time by simply traversing the buckets.

This is how that changes the cost of the algorithms:

Mark all nodes as not in tree, except for node s.
Create an array fake_pq[] of c empty queues.
For each edge (n, weight) in G.get_neighbours(s)
	Add node n into fake_pq[]
while we still haven’t added n - 1 edges, fake_pq[] has a non-empty queue:
	Get the next minimum node curr_node out of fake_pq[]
	If curr_node is already marked as in tree, continue.
	Mark curr_node as in tree
	For each edge (n, weight) in G.get_neighbours(curr_node)
		Add node n into fake_pq[weight]

Here is how we break down the cost of the algorithm:

Creating the array takes \(O(c)\) time
We run the loop \(n - 1\) times so the extraction of the element from fake_pq takes \(O(nc)\) time overall
The time taken to add all the edges to the fake_pq overall is \(O(m)\)

This leads to an overall time complexity of \(O(cn+m)\)

But can we go even faster?

Example Problems

Minimum Bottleneck Spanning Tree

The minimum bottleneck spanning tree in an undirected graph is a tree whose most expensive edge is as small as possible.

A bottleneck in a spanning tree is the maximum weight edge present in the tree. Now, it turns out that every Minimum Spanning Tree is a Minimum Bottleneck Spanning Tree.

This is useful in minimum bottleneck problems: given weighted undirected graph, for queries ((s,t)) find a path minimizing the maximum edge weight along the path (minimize the bottleneck).

Here is the approach:

Build an MST of the graph. The bottleneck value between any two nodes equals the maximum edge on their path in the MST.

So preprocess the MST for LCA with max edge on path. Each query returns that max value in \(O(\log n)\).

Among all paths between two nodes, the path in the MST minimizes the maximum edge weight; MST encodes minimax connectivity. This converts many bottleneck or reliability queries into simple MST path queries.

Second‑best Minimum Spanning Tree

Given a connected weighted undirected graph, find the weight of the second‑smallest spanning tree (the spanning tree with the next smallest total weight after the MST).

Modification / Trick: Build the MST (Kruskal). Preprocess the MST for max edge on path queries (binary lifting / LCA storing the maximum and second‑maximum edge on each ancestor jump).
Approach: For every non‑MST edge ((u,v,w)), consider replacing the maximum edge on the MST path between (u) and (v) with this edge; compute the new total weight = MST_weight − max_on_path + w. Track the minimum such value strictly greater than MST_weight.
Why MST helps: The MST is the baseline; any other spanning tree differs by swapping in one non‑MST edge and removing an edge on the unique MST path. Preprocessing the MST makes each candidate check (O(\log n)).
Complexity: Sort edges + DSU for MST, then (O(m\log n)) checks with LCA preprocessing (O(n\log n)).
Insight: This is a small, powerful extension: you reuse MST structure and add path‑max queries to enumerate optimal single‑edge swaps.

Max‑spacing k‑Clustering (Kruskal clustering)

Partition vertices into k clusters to maximize the minimum distance between clusters (maximise spacing).

Modification / Trick: Run Kruskal’s MST but stop when there are exactly k connected components (i.e., stop after adding (n-k) edges). The next smallest edge weight that would connect two different components is the spacing.
Approach: Sort edges and union until you have k components. The answer is the smallest edge weight that connects two different components at that point.
Why MST helps: Kruskal greedily builds components in the same order as optimal clustering; stopping early yields the optimal partition for max‑spacing.
Complexity: (O(m\log m)) dominated by sorting.
Insight: No heavy clustering algorithm needed—just stop Kruskal early and read off the spacing.

Powerplants and Houses (Minimum Spanning Forest)

Problem: You have a set of powerplants and houses. You want to connect every house to at least one powerplant with wires at minimum cost.
Modification: Normally MST connects all nodes into one tree. Here, we want multiple trees (a forest), each rooted at a powerplant.
Approach:
- Add a super node connected to all powerplants with edges of weight 0.
- Run MST (e.g., Kruskal or Prim).
- The resulting MST ensures every house is connected to some powerplant, and the zero-weight edges allow multiple components to exist.
Insight: This is essentially a minimum spanning forest problem disguised as MST with an extra node.

There is another way to solve this problem. This is to use an MST algorithm but to add the power-plants all to the same component before starting.

Minimum Spanning Trees

Prim’s algorithm

Heap of Edges

Heap of Vertices

Choosing between the two

Fibonacci Heaps

Kruskal’s algorithm

Example problems

Boruvka’s algorithm

Going Faster

Kruskal’s Algorithm

Prim’s Algorithm

Example Problems

Minimum Bottleneck Spanning Tree

Second‑best Minimum Spanning Tree

Max‑spacing k‑Clustering (Kruskal clustering)

Powerplants and Houses (Minimum Spanning Forest)