mudeltasigma: 2010

Monday, December 13, 2010

Python - Batteries included

It has often been said that Python comes with batteries included. This refers to the fact that it has a rich set of utilities included in the standard library. In the following program, the main program is a single line of code. It was made possible partly by the functions included in the random module. If we had made the specifications a bit tighter, we could have reduced the number of lines significantly.

Here is what the program does: We would like to generate passwords (or some other random string), but we want to be able to specify a template. For example, we may want to specify that each password is 7 lowercase characters. In this case we would use the template 'aaaaaaa'. Suppose we wanted a password that looks like a Victorian (Victoria, Australia) number plate, we would specify the template 'AAA-999', in other words, three random letters followed by a hyphen followed by three random digits. Finally, we may want to generate a simple password that is easy to remember by making it pronouncable - so we generate it using the following template: 'Cvcvc' The 'c' will be replaced by a consonant and the 'v' by a vowel, giving us something like, maybe, 'Gamir'.

Here is the program:

import string
import random

vowels_upper = 'AEIOU'
vowels_lower = 'aeiou'
consonants_upper = 'BCDFGHJKLMNPQRSTVWXYZ'
consonants_lower = 'bcdfghjklmnpqrstvwxyz'
alphanumeric = string.letters + string.digits
recognize = {
'A': string.ascii_uppercase, 'a': string.ascii_lowercase,
'C': consonants_upper, 'c': consonants_lower, 
'V': vowels_upper, 'v': vowels_lower, 
'd': string.digits, 'x': alphanumeric}

def create_password(template):
return ''.join([random.choice(recognize[ch]) 
if ch in recognize else ch for ch in template])

if __name__ == '__main__':
for i in range(5):
print create_password('AAA-ddd')

for i in range(5):
print create_password('Cvcvc')

When we run the above program, we get the following output:

MOI-138
KMQ-882
UCH-536
LBK-734
MGA-798
Wewof
Roqug
Colud
Gokix
Berid

That's quite a lot of power for what is essentially one line of code.

Saturday, December 11, 2010

Phononyms in Python

If you use a mobile phone to send SMSs, and you use predictive text, you will probably notice that sometimes you type a word, e.g. 'HOME' and the predictive text thinks you are trying to type 'GOOD' since both HOME and GOOD are represented by 4663.

Recently a friend at work was suggesting a good name for these words that are equivalent on a mobile phone would be Nokianym. I think that I came up with phononym and found that other people on the internet had already thought of the same term.

Anyway, for me, coming up with a Python program to find all such words in a dictionary was a more interesting problem. So, here's my Python script that, given a dictionary and the layout of a keyboard, will find all words that have an equivalent numerical representation.

dictionary = '/usr/share/dict/words'

nokia = {'a':2, 'b':2, 'c':2,
    'd':3, 'e':3, 'f':3,
    'g':4, 'h':4, 'i':4,
    'j':5, 'k':5, 'l':5,
    'm':6, 'n':6, 'o':6,
    'p':7, 'q':7, 'r':7, 's':7,
    't':8, 'u':8, 'v':8,
    'w':9, 'x':9, 'y':9, 'z':9
}

def get_val(word):
    result = 0
    word = word.lower()
    for c in word:
        if nokia.has_key(c):
            result = result * 10 + nokia[c]
    return result        

phononyms = {}
for word in open(dictionary):
    word = word.rstrip('\n\r')
    val = get_val(word)
    if (not phononyms.has_key(val)):
        phononyms[val] = [] 
        phononyms[val].append(word)

print [phononyms[val] for val in phononyms.keys() if len(phononyms[val]) > 1]

UNIX and Python - a marriage of convenience

Recently we were having a discussion about how many Friday 13ths there were in a particular year (or which year had the most, or something).

I thought it should be easy to write a solution in Python, so I started with the following code:



import datetime
from datetime import timedelta

start = datetime.date(2009, 1, 1)
end = datetime.date(2013, 1, 1)

while start < end:
        print start.strftime('%Y-%m-%d %A')
        start = start + timedelta(1)

This little program prints out all the dates from 2009-01-01 to 2012-12-31 in the following format:


2009-01-01 Thursday
2009-01-02 Friday
2009-01-03 Saturday
2009-01-04 Sunday
2009-01-05 Monday
...

I could then have added some more code to determine if a date was a Friday 13th and printed it - but decided not to. I had a general purpose tool that printed a calendar, so all I needed to do was the following:


python printcal | grep '13 Friday'

Which then gives the following:


2009-02-13 Friday
2009-03-13 Friday
2009-11-13 Friday
2010-08-13 Friday
2011-05-13 Friday
2012-01-13 Friday
2012-04-13 Friday
2012-07-13 Friday

So, why did I prefer to do this rather than modify my Python script. Let's look at the Unix Philosophy

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

Doug McIlroy

A second part of the philosophy from Mike Gancarz is "Look for the 90-percent solution."

So, I have created a small tool that prints all the dates in a period (this is the 90-percent solution). It is easy to verify that it is doing what I want to do. I use the output of this program as the input to the standard unix program grep that finds all the output that contains Friday 13. Suppose I now want to find all years that have at 3 months where the first of the month is a Monday, I can do it easily by modifying the grep part of the pipeline. If I want to see how many Monday 5ths we had in 2008, I can pipe the output of grep to wc -l which will print the number of lines output.

Saturday, November 27, 2010

Saving the planet one light at a time

I have just written my latest game and posted it to my site.

This game is my latest in an attempt to teach myself and get practice in using HTML5, JavaScript and Canvas. It is not an original game and is described quite well here. The game is similar to the last one, so did not take much extra effort.

As usual, the things that took the most effort were the small things. In particular getting the bevel correct on the lights and getting the radial gradients working correctly. The documentation that I found on radial gradient was not particularly helpful in helping me understand how it works. I have left all the source code unminified which will hopefully help anyone else trying to get radial gradients to work in JavaScript.

Sunday, November 21, 2010

Finding words with Primes

A long time ago I wrote a simple game. The current version of the game is here: wordgame. The current version was to give me a chance to play with canvas and Ajax.

Anyway, I wrote the original version a long time ago - and what I needed to do was find all words in a dictionary that could be made from the 9-letter grid. The original version was written in C and at the time the only option was to work with arrays (no hashes, STL, sets etc to help).

After a few attempts, I "discovered" the idea of using prime numbers. I have since noticed that this idea is not original. In fact it is similar to Godel numbering. Today I came across this blog entry google-interviewing-story. One of the comments mentioned an idea that should be more efficient than using primes. The idea is to use two arrays. We will assume we have a 'secret' word and a dictionary full of words we want to search. Our first array contains the frequency of letters in our secret word and the second array is the frequency of the letters in the word we are checking. Here is the python code for the prime method:


dictionary = '../scrabble.txt'
secret = 'chocolate'
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31,
     37, 41, 43, 47, 53, 59, 61, 67, 71, 73,
     79, 83, 89, 97, 101]

def prime_val(ch):
     return primes[ord(ch.lower()) - ord('a')]

def get_val(word):
     total = 1
     for ch in word:
             total *= prime_val(ch)
     return total

magic = get_val(secret)
for word in open(dictionary):
     word = word.rstrip('\n')
     if magic % get_val(word) == 0:
             print word

Here is the code using arrays:


#!/usr/bin/python

dictionary = '../scrabble.txt'
secret = 'chocolate'

def xord(ch):
 return ord(ch) - ord('a')

def get_word_array(word):
 word_array = [0] * 26
 for letter in word:
     word_array[xord(letter)] += 1
 return word_array


magic_array = get_word_array('chocolate')

for word in open(dictionary):
 word = word.rstrip('\n')
 word_array = get_word_array(word)
 magic_clone = magic_array[:]

 if min([magic_clone[i] - word_array[i] for i in range(26)]):
     print word

So, which method was faster?

Here are the timings (on my iMac)

First program using primes:


real    0m1.391s
user    0m1.355s
sys    0m0.020s

The more efficient program:


real    0m3.569s
user    0m3.509s
sys    0m0.024s

So, the first program would appear to be noticeably quicker when implemented in Python. There was a good argument why the second program should be quicker.

I guess that it comes down to the language used for the implementation and the specifics of the problem. If this story has a moral, it would probably be this: Never assume that one algorithm will be more efficient than another when applied to a specific problem with a specific language.