This also breaks on simple, commonly used words. Like "don't." That gets counted as two words right there. Or "co-worker." Or...
I'd suggest that the simplistic approach would be better off splitting on whitespace first, then simply removing non-word characters from each piece (and I'd use the [:alpha:] character class rather than a-z, so that letters outside plain ASCII aren't thrown away).
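
Here is a minimal sketch of what I mean, in Python purely for illustration (the language and the function name `split_words` are my own assumptions, not anything from the original code). Python's `re` module has no `[:alpha:]` class, so `str.isalpha()` stands in for it here; it is Unicode-aware, which is roughly the point of preferring `[:alpha:]` over `a-z`.

```python
def split_words(text: str) -> list[str]:
    """Split on whitespace first, then strip non-letters from each token.

    Hypothetical helper for illustration only. str.isalpha() is used as a
    stand-in for the POSIX [:alpha:] class.
    """
    words = []
    for token in text.split():  # split() with no arguments splits on any whitespace
        letters = "".join(ch for ch in token if ch.isalpha())
        if letters:  # drop tokens that were pure punctuation, e.g. "--"
            words.append(letters)
    return words


print(split_words("I don't like my co-worker."))
# ['I', 'dont', 'like', 'my', 'coworker']
```

The output mangles the spelling ("dont", "coworker"), but each one stays a single word, so the count comes out right: five words here, where splitting on anything outside a-z would give seven.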