in reply to stripped punctuation

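My basic approach would be to remove all punctuation, or better, all non-letter characters, and then start counting:
s/[^a-zA-Z]/ /g;         # replace every non-letter in $_ with a space (a-z alone would also wipe out capitals)
my @words = split ' ';   # split $_ on runs of whitespace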
Of course this does not take into account:
  1. words that are broken at a line-
    end ;-)
  2. foreign language characters
  3. your definition of a word. Maybe you consider ab4711xy a word; with this approach it counts as 2.
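A minimal runnable sketch of the approach above, wrapped in two helpers whose names (letters_only_words, count_words) are hypothetical. It uses [^a-zA-Z] so capital letters survive, and the sample sentence shows the caveats from the list: "Don't", "ab4711xy" and "co-worker" each come out as two words.

```perl
use strict;
use warnings;

# Hypothetical helper: the "replace non-letters with spaces, then split"
# approach applied to one string.
sub letters_only_words {
    my ($text) = @_;
    $text =~ s/[^a-zA-Z]/ /g;   # digits, apostrophes, hyphens etc. become spaces
    return split ' ', $text;    # split on runs of whitespace, ignoring leading blanks
}

# Hypothetical helper: case-insensitive word frequency table.
sub count_words {
    my ($text) = @_;
    my %count;
    $count{ lc $_ }++ for letters_only_words($text);
    return \%count;
}

my @words = letters_only_words("Don't ask ab4711xy about the co-worker.");
print scalar(@words), " words: @words\n";
# prints "9 words: Don t ask ab xy about the co worker"

my $freq = count_words("The cat saw the other cat.");
print "$_: $freq->{$_}\n" for sort keys %$freq;
```

Whether the split "Don t" and "ab xy" are acceptable depends entirely on point 3, your definition of a word.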

$\=~s;s*.*;q^|D9JYJ^^qq^\//\\\///^;ex;print

Re^2: stripped punctuation
by Nkuvu (Priest) on Oct 06, 2005 at 21:09 UTC
    This also breaks on simple, commonly used words like "don't": that's two words for you right there. Or "co-worker". Or...

    I'd suggest that the simple approach would work better if it split on whitespace first and then removed the non-word characters from each piece (and I'd use the [:alpha:] character class, as in [^[:alpha:]], rather than a-z).
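    One possible reading of this suggestion, as a short sketch: split on whitespace, then strip every non-letter from each token with [^[:alpha:]] (the POSIX class only works inside a bracket expression). The helper name whitespace_first_words is hypothetical, and dropping tokens that end up empty is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: split on whitespace first, then remove non-letters
# from each token, so "don't" and "co-worker" stay single words.
sub whitespace_first_words {
    my ($text) = @_;
    my @words;
    for my $token (split ' ', $text) {
        $token =~ s/[^[:alpha:]]//g;             # "don't" -> "dont", "co-worker" -> "coworker"
        push @words, $token if length $token;    # skip tokens that were only punctuation/digits
    }
    return @words;
}

my @words = whitespace_first_words("Don't ask the co-worker -- 42 times!");
print scalar(@words), " words: @words\n";
# prints "5 words: Dont ask the coworker times"
```

    The trade-off is that letters on either side of a digit are now glued together ("ab4711xy" becomes "abxy"), so this still depends on what you count as a word.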