in reply to Word Counting

I don't find splitting on spaces to be very good at picking out "words". If you just want the total number of words, it works pretty well. But for your task, I find this quite simple regex:

@words= $line =~ /(\w+(?:'\w+)?)/g;
to be much more effective. It isn't perfect. If your text contains digits and/or underscores that you want to ignore (\w matches both), and/or you want to handle non-English letters, then a better version is:
@words= $line =~ /([[:alpha:]]+(?:'[[:alpha:]]+)?)/g;

These match the common contractions (like "don't", "isn't", "aren't", and "I've" that I've used) but aren't bothered by 'quoting'.
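
To make the difference between the two patterns concrete, here is a minimal sketch that runs both against one line. The sample text and the variable names @w and @alpha are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample line (an assumption, not from the original post).
my $line = "I've said 'don't' 2 times, isn't_it odd?";

# \w matches letters, digits, and underscore.
my @w     = $line =~ /(\w+(?:'\w+)?)/g;

# [[:alpha:]] matches letters only, so digits and underscores are skipped.
my @alpha = $line =~ /([[:alpha:]]+(?:'[[:alpha:]]+)?)/g;

print join('|', @w), "\n";      # I've|said|don't|2|times|isn't_it|odd
print join('|', @alpha), "\n";  # I've|said|don't|times|isn't|it|odd
```

In both cases the single quotes around 'don't' are left out, while the apostrophe inside the contraction is kept.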

                - tye

Update: Even better, allow hyphenated-word matching:

@words= $line =~ /([[:alpha:]]+(?:[-'][[:alpha:]]+)*)/g;
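
A minimal sketch of how that last pattern could feed a per-word tally, assuming the lines are already in memory. The sample lines, the %count hash, and the case folding with lc are my assumptions for illustration; drop lc if "Word" and "word" should be counted separately.

```perl
use strict;
use warnings;

# Hypothetical input lines (an assumption); in practice these would be read from a file.
my @lines = (
    "A well-known word isn't a 'well-known' word -- or is it?",
    "Word counts: 3 of them don't count.",
);

my %count;    # hypothetical name: maps each word to its number of occurrences
for my $line (@lines) {
    # In list context, /g returns every captured word on the line.
    my @words = $line =~ /([[:alpha:]]+(?:[-'][[:alpha:]]+)*)/g;

    # Fold case (an assumption) and bump the tally for each word.
    $count{ lc $_ }++ for @words;
}

# Print the words alphabetically with their counts.
printf "%-12s %d\n", $_, $count{$_} for sort keys %count;
```

With these sample lines, "word" is counted three times and "well-known" twice; the "3" and the bare "--" are not counted as words.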

Re: Re: Word Counting (contractions)
by Anonymous Monk on Apr 24, 2003 at 22:06 UTC
    So far I can get the total word count using the code below. From there I don't know how to put each word into a hash and count per word rather than counting the total number. I knew from the beginning I had to use a regex for this to be more accurate, and I think I'll go with the last one you posted, but using it confuses me. Would I have to change my for loop to a foreach (keys @words) { @words =~ ...} ?
    my $file = "test.txt";
    my $count = "0";
    open (FILE, $file) or die "Error $!";
    my $words = <FILE>;
    $count++ for split /\s+/, $words;
    print "Count: $count\n";
    print "Words: $words\n";
    close FILE;
      I suspect a few people may be giving up in head-banging-on-the-desk frustration by now..., so I'll give it a shot. :)

      You're so nearly there... you just need to replace a single line there with the line that perlplexer already gave you. That line loops over the list from 'split' (using 'for' or 'foreach'; they're synonymous, so use whichever one you think looks nicer. Don't you love a language that takes aesthetics into account?) and increments the value in a corresponding 'words' hash. I'm not going to tell you which line that is though, I'm afraid, 'cos methinks the student protesteth overmuch about this not being homework... I prescribe a healthy dose of Camel :)

      Cheers,
      Ben