Friday, March 30, 2012

Quicksort in Python

I have been brushing up on my algorithms. One of the algorithms that I looked at was Quicksort. It is a sorting algorithm developed by Tony Hoare in 1960. If you are using the built in sort function in C, you are using quicksort.

Quicksort is one of those algorithms you study in university and then often forget. However, as I mention, I was brushing up on my algorithms and so decided to revisit quicksort. The way I usually understand algorithms is to first look at the algorithm and then look at an implementation.

Here is a quick description of the algorithm:

Quicksort:
  1. Take an array to be sorted.
  2. If the array has fewer than two values, we will consider it sorted (return).
  3. Choose a value which we will call the pivot value. This can be the first element of the array.
  4. Rearrange the array so that all values less than the pivot value are placed before the pivot value and all other values are placed after the pivot value.
  5. Run Quicksort over each of the two subarrays created in the previous step.
Here is an implemenation in C (from http://rosettacode.org/wiki/Quicksort):

void quick_sort (int *a, int n) {
    if (n < 2)
        return;
    int p = a[n / 2];
    int *l = a;
    int *r = a + n - 1;
    while (l <= r) {
        while (*l < p)
            l++;
        while (*r > p)
            r--;
        if (l <= r) {
            int t = *l;
            *l++ = *r;
            *r-- = t;
        }
    }
    quick_sort(a, r - a + 1);
    quick_sort(l, a + n - l);
}
 
int main () {
    int a[] = {4, 65, 2, -31, 0, 99, 2, 83, 782, 1};
    int n = sizeof a / sizeof a[0];
    quick_sort(a, n);
    return 0;
}

While we can see what the above code is doing, and we can map the variable names to the description from the algorithm, it is not immediately clear how the algorithm works.

In fact, if we replaced the name quick_sort by q, and asked someone what the code was doing, it would be reasonably hard for most people to guess that it was a quicksort.

The next thing I did was to look for a Python implementation and found this one (http://en.literateprograms.org/Quicksort_(Python)?printable=yes):

def qsort1(list):
    """
    Quicksort using list comprehensions
    """
    if list == []: 
        return []
    else:
        pivot = list[0]
        lesser = qsort1([x for x in list[1:] if x < pivot])
        greater = qsort1([x for x in list[1:] if x >= pivot])
        return lesser + [pivot] + greater

Immediately it is obvious what the algorithm is doing. Looking at the algorithm, though, we notice that the first line of the function:

if list == []

Looks for an empty list. This is because if we get an empty list, we can assume it is sorted. However, if we get a list with only one element, we can also assume it is sorted. I changed the lines to:

if len(list) <= 1: 
    return list

I measured the speed of the two algorithms and the second gave me a 10% improvement in speed. Would I have noticed this so easily in the C code? I am not sure. However, having noticed this in the Python code, it seemed fairly obvious that it could be improved and I also felt confident that my improvement was going to work.

So, even though Python may not be as efficient as C or some compiled languages, the clarity of the code certainly allows us to make improvements and optimisations that may otherwise have been very difficult.

No comments:

Post a Comment