Weird. I wrote one yesterday as well, but found it didn't work very
well at all unless the ratio of spam to legit messages in the training set
was relatively close to 1:1. For a corpus, I used just the 'bare'
set in:
http://www.iit.demokritos.gr/~ionandr/lingspam_public.tar.gz.
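The post doesn't say how the filter works. Assuming it is a naive Bayes classifier (an assumption), one plausible source of the ratio sensitivity is the class prior. With empirical priors, a 1:9 spam:ham training set pushes borderline messages toward ham. The sketch below is a minimal multinomial naive Bayes with Laplace smoothing, not the poster's code. The class name `NaiveBayes`, the `equal_priors` flag and the whitespace tokenizer are all hypothetical. Setting `equal_priors=True` uses a fixed 0.5 prior for both classes, which removes that particular dependence on the training ratio:

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Minimal multinomial naive Bayes sketch (illustrative, not the poster's method)."""

    def __init__(self, equal_priors=False):
        # equal_priors=True ignores the spam:ham ratio of the training set
        # and uses P(spam) = P(ham) = 0.5 instead.
        self.equal_priors = equal_priors
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        # Count one document and all of its tokens for the given class.
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def log_score(self, tokens, label):
        # log P(label) + sum of log P(token | label), with add-one smoothing.
        if self.equal_priors:
            prior = 0.5
        else:
            prior = self.doc_counts[label] / sum(self.doc_counts.values())
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        score = math.log(prior)
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (total_words + len(vocab)))
        return score

    def classify(self, text):
        # Pick the class with the higher log posterior (up to a constant).
        tokens = tokenize(text)
        return max(("spam", "ham"), key=lambda lbl: self.log_score(tokens, lbl))
```

For example, train on one spam message ("buy cheap pills now") and nine copies of a ham message ("meeting notes attached"). The message "cheap meeting" is then labelled "ham" with empirical priors and "spam" with equal priors. The word likelihoods favour spam by roughly 1.9:1, but the 1:9 prior outweighs that. Balancing the priors (or the training set) is one way to address this, though a real filter may have other ratio-dependent parts too.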