in reply to Word Counting
I don't find splitting on spaces to be very good at picking out "words". If you just want to count the total number of words, then it works pretty well. But for your task, I find the quite simple:
@words= $line =~ /(\w+(?:'\w+)?)/g;
to be much more effective. It isn't perfect. If you have numbers and/or underscores in your text and you want to ignore them and/or you want to handle non-English letters, then a better version is:
@words= $line =~ /([[:alpha:]]+(?:'[[:alpha:]]+)?)/g;
These match the common contractions (like "don't", "isn't", "aren't", and "I've" that I've used) but aren't bothered by 'quoting'.
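As a rough illustration of the difference, here is a minimal sketch that runs a whitespace split and both patterns over one line. The sample text and the variable names (`$sample`, `@by_space`, `@by_word`, `@by_alpha`) are made up for this example and are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample line, chosen to contain contractions,
# 'quoting', and an identifier with an underscore and digits.
my $sample = "I've said 'don't panic' to user_42 twice";

# Splitting on whitespace leaves the quote marks attached.
my @by_space = split ' ', $sample;

# \w+ also accepts digits and underscores.
my @by_word = $sample =~ /(\w+(?:'\w+)?)/g;

# [[:alpha:]]+ accepts letters only.
my @by_alpha = $sample =~ /([[:alpha:]]+(?:'[[:alpha:]]+)?)/g;

print join('|', @by_space), "\n";  # I've|said|'don't|panic'|to|user_42|twice
print join('|', @by_word),  "\n";  # I've|said|don't|panic|to|user_42|twice
print join('|', @by_alpha), "\n";  # I've|said|don't|panic|to|user|twice
```

An apostrophe only counts as part of a word when letters follow it directly, which is why the closing quote after "panic" is dropped while "don't" stays whole.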
- tye
Update: Even better, allow hyphenated-word matching:
@words= $line =~ /([[:alpha:]]+(?:[-'][[:alpha:]]+)*)/g;
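A quick check of the hyphen-aware pattern, again as a sketch on an invented sample (`$text` and `@hyphenated` are hypothetical names). Changing the trailing `?` to `*` lets a word carry any number of hyphen- or apostrophe-joined parts, while a stray "--" between words is still skipped because a hyphen must sit directly between letters.

```perl
use strict;
use warnings;

# Hypothetical sample line with multi-part hyphenated words,
# a free-standing double hyphen, and a contraction.
my $text = "A well-known state-of-the-art editor -- isn't it?";

my @hyphenated = $text =~ /([[:alpha:]]+(?:[-'][[:alpha:]]+)*)/g;

print join('|', @hyphenated), "\n";
# A|well-known|state-of-the-art|editor|isn't|it
```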
Re: Re: Word Counting (contractions)
by Anonymous Monk on Apr 24, 2003 at 22:06 UTC
by benn (Vicar) on Apr 24, 2003 at 23:40 UTC