How do I just count any words from a file?

Cazbo has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: How do I just count any words from a file? by Limbic~Region (Chancellor) on Jun 04, 2007 at 12:37 UTC
Cazbo, if you are familiar with the Unix command wc, it sounds like you are trying to re-implement it. If you are not doing this as a "learning" exercise, I would recommend you use an existing wheel (see PPT for instance). If you are just trying to do this yourself, here is something that should get you started:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file = $ARGV[0] or die "Usage: $0 <input_file>";
open(my $fh, '<', $file) or die "Unable to open '$file' for reading: $!";

my ($line_cnt, $word_cnt, $char_cnt);
while (<$fh>) {
    chomp;
    $line_cnt++;
    $char_cnt += length($_);    # characters, not counting the newline removed by chomp
    my @word = split " ", $_;   # split on runs of whitespace
    $word_cnt += @word;         # an array in numeric context gives its element count
}
```

Cheers - L~R
Re: How do I just count any words from a file? by jettero (Monsignor) on Jun 04, 2007 at 12:38 UTC
The regular expression isn't an array you can `foreach` through. It returns something like a true false value while the iterator is true instead. So choose `while` instead of `foreach` here. See if something like this works better: `$wordcount ++ while $buffer =~ m/\b\w+\b/g` -Paul
Re^2: How do I just count any words from a file? by Cazbo (Initiate) on Jun 04, 2007 at 12:51 UTC
In reply to both of you, thank you very very very much :) Basically it's a project connected with my course. I had to create a simple .txt file containing the verse "Hey diddle diddle...", request input of and test the filename format, test the file's existence and size, then count the characters, words, paragraphs etc. in it, then print them to STDOUT. I've managed the characters and lines parts, but hit a mental block when it came to the words etc. Anyway, that part now works, and I can move on to the paragraphs and sentences to complete it. Although I've come to the end of learning Perl through my course, I'm a bit of a stubborn wench and refuse to let it go. I struggled with the implementation of it, despite getting high marks for the theory, and I'm not satisfied with my performance to date. Therefore, I'd like very much to stay with Perl Monks and use the exercises/examples here to further develop in this field. Respect bows Caz :)
Re^3: How do I just count any words from a file? by jettero (Monsignor) on Jun 04, 2007 at 13:01 UTC
Spend as much time on learning regular expressions as is necessary to really understand and enjoy them. You won't regret it. All the modern high level languages (and even some low level ones) seem to support them now, so it's worth spending the effort on it no matter whether it's for a class or not. -Paul
Re^2: How do I just count any words from a file? by bart (Canon) on Jun 04, 2007 at 14:37 UTC
"The regular expression isn't an array you can foreach through"

It returns a list of matches. So actually, he can.

"It returns something like a true false value while the iterator is true instead."

Not with /g in list context. Even when there are no capturing parens, because then you'll still get a list of matches for the entire pattern.

"So choose while instead of foreach here. See if something like this works better"

It'll use less memory, but otherwise, both should have pretty much the same end result.
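To make the context point above concrete, here is a minimal sketch, not anyone's posted solution. The sub names `count_with_foreach` and `count_with_while` and the sample text are made up for illustration, and `\w+` is just one possible definition of a word.

```perl
use strict;
use warnings;

# Sample text; the name $buffer follows the earlier reply.
my $buffer = "Hey diddle diddle, the cat and the fiddle";

# List context: m//g returns every match at once, so foreach can walk them.
sub count_with_foreach {
    my ($text) = @_;
    my $count = 0;
    $count++ foreach $text =~ /\w+/g;
    return $count;
}

# Scalar context: m//g returns true once per match, resuming after the previous one.
sub count_with_while {
    my ($text) = @_;
    my $count = 0;
    $count++ while $text =~ /\w+/g;
    return $count;
}

print count_with_foreach($buffer), "\n";
print count_with_while($buffer), "\n";
```

Both subs should report the same count for the same text. The `foreach` form builds the full list of matches first, which is where the extra memory goes.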
Re^2: How do I just count any words from a file? by varian (Chaplain) on Jun 04, 2007 at 14:45 UTC
"The regular expression isn't an array you can foreach through. It returns something like a true false value while the iterator is true instead."

Not true: the OP used the 'g' flag, which makes the regexp return a list of matched groupings when in list context (and foreach expects a list). So combining foreach with a regexp is just fine; in fact, Cazbo, you are close to the solution! Hint: to debug, execute a `print "$wordcount, $_\n";` within your foreach loop immediately after the autoincrement and see what it prints. Your regexp needs some fine-tuning, and what about the counter itself?
Re^2: How do I just count any words from a file? by princepawn (Parson) on Jun 04, 2007 at 13:25 UTC
I suspected your `\b` was unnecessary, and the following test file implies that it is...

```perl
use strict;
use warnings;

my $line = 'hi there my name is bob';

# List context: collect every match, then count them
sub m1 {
    my $line = shift;
    my $re = '(\w+)';
    my @match = ( $line =~ /$re/g );
    return scalar @match;
}

# Scalar context, with word-boundary anchors
sub m2 {
    my $line = shift;
    my $re = '\b\w+\b';
    my $wordcount;
    $wordcount++ while $line =~ /$re/g;
    return $wordcount;
}

# Scalar context, without the anchors
sub m3 {
    my $line = shift;
    my $re = '\w+';
    my $wordcount;
    $wordcount++ while $line =~ /$re/g;
    return $wordcount;
}

print m1($line), "\n";
print m2($line), "\n";
print m3($line), "\n";
```

Carter's compass: I know I'm on the right track when by deleting something, I'm adding functionality
Re: How do I just count any words from a file? by Moron (Curate) on Jun 04, 2007 at 13:39 UTC
You need a very clear description of what it should do. A word can mean a whole heap of different things: two bytes, a string delimited by whitespace and containing at least one alphabetic or numeric character, etc. If it's the common or garden "delimited by CR or space", then I'd use split on a regexp meaning "one or more of (space or carriage return)" (I'll follow the above trend of letting you read the docs rather than just giving the answer). You can find more about regexps in perlre.

One other point though: when reading from a file with the <> operator, what is read is determined by scalar or list context (scalar context reads one record into a scalar; list context reads the whole file into a list or array, one record per element) together with whatever is in the $/ variable (the record delimiter). Thus, to count the number of fields in STDIN, delimited by a regexp ...

```perl
use strict;
use warnings;

$/ = undef;    # no record delimiter: a scalar-context read returns the whole file
my @words = split /the regexp/, <>;    # "the regexp" is a placeholder for your delimiter pattern
# $#words (the last index of the array) is now one less than the word count
```

Free your mind!
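As a rough sketch of the slurp-then-split idea above, under some assumptions: the helper name `count_words_in_handle` is invented here, an in-memory filehandle stands in for a real file, and `split ' '` (the whitespace special case already used earlier in the thread) stands in for whatever delimiter pattern you settle on.

```perl
use strict;
use warnings;

# Hypothetical helper: count whitespace-delimited words on an open filehandle.
sub count_words_in_handle {
    my ($fh) = @_;
    local $/;                          # undef record separator, restored when the sub returns
    my $content = <$fh>;               # scalar context: one read returns the rest of the file
    return 0 unless defined $content;
    my @words = split ' ', $content;   # ' ' splits on runs of whitespace and skips leading whitespace
    return scalar @words;              # element count; $#words would be one less
}

# In-memory filehandle standing in for a real file (an assumption for the demo).
my $text = "Hey diddle diddle,\nthe cat and the fiddle\n";
open(my $fh, '<', \$text) or die "Unable to open in-memory file: $!";
print count_words_in_handle($fh), "\n";
close($fh);
```

Using `local $/` rather than assigning to `$/` globally keeps the change from leaking into other reads in the same program. Note that splitting on `/\s+/` instead of `' '` produces an empty leading field when the text starts with whitespace, which would throw the count off by one.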
Re: How do I just count any words from a file? by Herkum (Parson) on Jun 04, 2007 at 13:38 UTC
You could always just break up each line by using blank spaces to separate it into words. If you want to do this, use the split function. I won't go any further because, as you said, this is a project for you to learn from. The best way to learn is to do, not copy! :)
Re: How do I just count any words from a file? by FunkyMonk (Bishop) on Jun 04, 2007 at 16:01 UTC
I'd just like to add a rule I learnt from my early days of Perl that can be used to choose between split and a regexp: Use split when you know what you want to throw away, and a regexp when you know what you want to keep. There are times when it doesn't apply, but it nearly always does (as it does in this case).
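A small illustration of that rule, offered as a sketch only. The sample line and the variable names `@by_split` and `@by_match` are made up, and which result is "right" depends on how you define a word.

```perl
use strict;
use warnings;

my $line = "Hey diddle diddle, the cat and the fiddle";

# split: describe what to throw away (the whitespace between words).
my @by_split = split ' ', $line;

# match: describe what to keep (runs of word characters).
my @by_match = $line =~ /\w+/g;

print scalar(@by_split), " words via split\n";
print scalar(@by_match), " words via match\n";
print "third word: '$by_split[2]' vs '$by_match[2]'\n";
```

On this line the two counts agree, but the pieces differ: split keeps the trailing comma on the third word, while the match drops it. The counts can diverge too. For example, "don't" is one field to `split ' '` but two matches for `/\w+/g`, because `\w` does not match an apostrophe.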