I don't find splitting on spaces to be very good at picking out "words". If you just want to count the total number of words, then it works pretty well. But for your task, I find the quite simple:
@words= $line =~ /(\w+(?:'\w+)?)/g;
to be much more effective. It isn't perfect. If you have numbers and/or underscores in your text and you want to ignore them, and/or you want to handle non-English letters, then a better version is:
@words= $line =~ /([[:alpha:]]+(?:'[[:alpha:]]+)?)/g;
These match the common contractions (like "don't", "isn't", "aren't", and "I've" that I've used) but aren't bothered by 'quoting'.
- tye
Update: Even better, allow hyphenated-word matching:
@words= $line =~ /([[:alpha:]]+(?:[-'][[:alpha:]]+)*)/g;
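To see how the three patterns differ from a plain whitespace split, here is a small sketch that runs each of them over one line. The sample sentence and the variable names (@split, @w, @alpha, @hyphen) are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample line, chosen to include a contraction, 'quoting',
# a hyphenated word, an underscore, and a number.
my $line = "I don't like 'quoted' text, well-known foo_bar 42.";

# Whitespace split: quotes and punctuation stay glued to the "words".
my @split = split ' ', $line;

# \w version: keeps contractions, but also accepts digits and underscores.
my @w = $line =~ /(\w+(?:'\w+)?)/g;

# [[:alpha:]] version: letters only, still keeps contractions.
my @alpha = $line =~ /([[:alpha:]]+(?:'[[:alpha:]]+)?)/g;

# Hyphen-aware version: letters joined by any number of - or ' separators.
my @hyphen = $line =~ /([[:alpha:]]+(?:[-'][[:alpha:]]+)*)/g;

print join('|', @split),  "\n";  # I|don't|like|'quoted'|text,|well-known|foo_bar|42.
print join('|', @w),      "\n";  # I|don't|like|quoted|text|well|known|foo_bar|42
print join('|', @alpha),  "\n";  # I|don't|like|quoted|text|well|known|foo|bar
print join('|', @hyphen), "\n";  # I|don't|like|quoted|text|well-known|foo|bar
```

Note that the letters-only patterns split "foo_bar" into two words and drop "42" entirely, which may or may not be what you want for your data.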