Saturday, June 4, 2011

Sorting without sorting

It is Sunday, the children have gone to visit their grandparents and I should be writing an essay this afternoon for my Graduate Certificate in management.

Instead I was browsing the web and came across a Kata by Dave Thomas from Pragmatic
Programmers.

The idea of Katas is that they are exercises that allow programmers to keep
their skills in tune by continually solving small programming (or sometimes
other) problems. In this particular kata, Dave offers the following problem:

Our resident conspiracy expert, Dr. X, is looking for hidden messages in the
collected publications of Hugh Hefner. Dr. X believes the message is hidden in
the individual letters, so rather than get distracted by the words, he’s asked
us to write a program to take a block of text and return the letters it
contains, sorted. Given the text:

When not studying nuclear physics, Bambi likes to play
beach volleyball.


our program would return:

aaaaabbbbcccdeeeeeghhhiiiiklllllllmnnnnooopprsssstttuuvwyyyy


The program ignores punctuation, and maps upper case to lower case.

Are there any ways to perform this sort cheaply, and without using built-in libraries?


I decided to try a solution quickly in Python and came up with the following
solution:

sentence = '''
   When not studying nuclear physics, Bambi likes to play
   beach volleyball."
'''

frequency = [0] * 26

def increment_frequency(letter):
    frequency[ord(letter) - ord('a')] += 1

[increment_frequency(letter) for letter in sentence.lower() if letter >= 'a' 
    and letter <= 'z']

print ''.join([chr(i + ord('a')) * frequency[i] for i in
    range(len(frequency))])


What I have done is to create an array of 26 elements, one for each letter of the alphabet, and each time I see a letter, I increment the count for that letter. In other words, we don't sort the letters, we just keep track of the frequency of each letter. Finally, we print out the contents of the array as a histogram, using the letter to represent the bar.

I did not come up with the solution - I am sure that I have seen it before. Probably more than once. But I think doing the katas is a good way to keep that sort of thing in front of the mind. Also, while doing it, I made the mistake that I often make of using iter.join(str), rather than using str.join(iter).

Okay, now on to my essay...