Quick Sort

Subpage of Algorithms

Recurrences, asymptotics, and design techniques.

Quicksort is one of the most popular algorithms of all time because. While in running time it performs the same as merge sort, it sorts in place so does not come with significant space overhead.

However, the algorithm is not trivial and requires detailed analysis to understand why it reaches \(O(N \log{N})\) run time.

Quicksort is a divide and conquer algorithm where the divide step is non-trivial. However, this makes the merge sort algorithm simple.

Partitioning

At its core, it's beautifully simple: take some collection of elements and rearrange them so that elements satisfying some property end up on one side, and elements not satisfying it end up on the other.

Think of partitioning as a filtering operation that's done in-place. You're not creating new arrays; you're cleverly rearranging elements using the space you already have.

The Lomuto partition scheme is easiest to understand. You maintain a boundary between "small" and "unprocessed" elements:

partition(A, low, high):
    pivot = A[high]
    i = low - 1  // boundary of small elements
    
    for j from low to high-1:
        if A[j] <= pivot:
            i++
            swap A[i] with A[j]
    
    swap A[i+1] with A[high]
    return i+1

The partitioning algorithm always returns the number of elements smaller than the pivot.

The invariant is key: at every step, everything from low to i is ≤ pivot, everything from i+1 to j-1 is > pivot, and everything from j onward is unprocessed. The partitioning algorithm is one of the best illustration of the idea of invariants.

Note how we maintain two pointers: i and j so that we can rearrange elements within the space we have.

i → Boundary between lower and higher

j → The next element we want to add into the partition

Example Problems

Partitioning is the secret weapon behind quicksort, one of the fastest practical sorting algorithms.

It's central to selection algorithms that find the kth smallest element in linear time.

It appears in computational geometry, string algorithms, database query optimization, and even in how operating systems manage memory.

Linear Time Selection

This is a key application of partitioning. Refer to Quickselect .

Three way partitioning

This is incredibly useful when the data we are working with has a lot of duplicates (for example, sorting a list with 4 keys). This is to make sure the partition is effective.

This approach partitions the array and then partitions one of the two partitions produced.

Quicksort

Quicksort employs a straightforward use of partitioning to choose a point at which to divide the array in the process of implementing divide-and-conquer.

Space Complexity

Since this is a recursive algorithm, the space complexity depends on the depth of the call stack. At each call stack, we use constant space because all we need to do is swap elements.

Thus, the complexity is the depth of the call stack which is in the worst case \(O(n)\).

Time Complexity

This is where things get interesting!

If we make the assumption that we have the ideal scenario where our pivot will always split the array in half, we get the following recurrence:

\[T(n)=2 \times T(n/2)+O(n)\]

The \(O(n)\) comes from the partitioning algorithm which is run on the input array to get the point at which the division is to be done. This is the same as the merge sort recurrence (although in that case, \(O(n)\) comes from merging instead of partitioning) leading to the runtime of \(O(n \log n)\).

However, this is based on an assumption: its hard to expect that this assumption will always be true.

In the worst case, where the pivot is always the smallest/largest element, we are going to have the following recurrence:

\[T(n)=T(n-1)+O(n)\]

This leads to a complexity of \(O(n^2)\) (easiest seen with the help of a recursion tree). This is bad.

So how can we proceed? First, we have to notice that we don’t need to get a perfect 50-50 split around the pivot. Consider a split such that \(1/3\) of the values. In this case, we can write the recurrence as follows:

\[T(n)=T(n/3)+T(2n/3)+O(n)\]

Now, we can solve this using either Akra-Bazzi or a recursion tree to yield a runtime of \(O(n \log n)\).

Our solution to reach this bound is a randomized algorithm. We can’t determine the actual running time in this case but rather we deal with expected running time: the running time we get on most cases, as corrected by the distribution of cases.

Here is how the paranoid partitioning algorithm works: keep randomly selecting an element in the sub-array arr[left, right) to use as a pivot and run i = Partition(arr, left, right) until the pivot placement i is within the 30th and 70th percentile of arr[left, right).

Now, we can show that we can use a pivoting algorithm that runs in \(O(n)\) time (when working with Expected Values) to get a pivot that is in the middle 40 percentile of the dataset.

The proof can be obtained by starting by considering the expected value for the number of trials before a Bernoulli experiment produces a particular value.

Now, we know that this approximately, if this is true (as per the above recurrence), then we have an algorithm with \(O(n \log n)\) runtime.