Counting words

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Counting words by Limbic~Region (Chancellor) on Sep 13, 2004 at 14:10 UTC
Anonymous Monk, I think what you are asking is: how do I get a list of the words in a file, defined by white space, and the number of times each one appears? I realize that words like "book keeper", which can be spelled with or without a space, differences in case, and words that wrap across lines are going to be an issue, but I want a 99% solution. `#!/usr/bin/perl use strict; use warnings; my $file = $ARGV[0] || 'foo.txt'; open (INPUT, '<', $file) or die "Unable to open $file for reading : $!"; my %word; while ( <INPUT> ) { chomp; $word{$_}++ for split " "; } print "$_ : $word{$_}\n" for sort { $word{$b} <=> $word{$a} } keys %word;` It isn't perfect (99% solution), and the sort routine is not the most efficient, but you get the idea. Cheers - L~R
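A minimal sketch of how the case and punctuation caveats mentioned above might be narrowed, assuming it is acceptable to fold words to lower case and trim punctuation from both ends. The sub name `count_words` and the sample text are made up for illustration and do not come from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count whitespace-separated
# words in a string, ignoring case and surrounding punctuation.
sub count_words {
    my ($text) = @_;
    my %count;
    for my $word ( split ' ', $text ) {
        $word = lc $word;
        $word =~ s/^\W+|\W+$//g;     # trim punctuation at both ends
        next unless length $word;    # skip tokens that were all punctuation
        $count{$word}++;
    }
    return \%count;
}

my $count = count_words("The cat saw the other cat. THE END!");

# Most frequent first; ties are broken alphabetically for stable output.
for my $word ( sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count ) {
    print "$word : $count->{$word}\n";
}
```

Words split across lines or spelled with an optional space ("book keeper") are still counted separately, so this remains a 99% solution.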
Re: Counting words by zejames (Hermit) on Sep 13, 2004 at 14:15 UTC
Use the split function. `my $max_words = 8; open F, "< d:\\temp\\perl\\test.txt" or die "Unable to open file\n"; undef $/; $data = <F>; @words = split /\W+/, $data; print join ':', @words[0..$max_words - 1];` -- zejames
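One thing to watch with the `split /\W+/` approach above: if the data starts with a non-word character, `split` returns an empty leading field, which then takes up one of the word slots. A small sketch comparing it with `split ' '`, using a made-up sample string:

```perl
use strict;
use warnings;

# Sample data made up for illustration.
my $data = "  Hello, world! Counting words.";

my @by_nonword = split /\W+/, $data;    # keeps an empty leading field
my @by_space   = split ' ',   $data;    # skips leading whitespace

print scalar(@by_nonword), " fields: ", join( ':', @by_nonword ), "\n";
print scalar(@by_space),   " fields: ", join( ':', @by_space ),   "\n";
```

The trade-off is that `split ' '` leaves punctuation attached to the words, while `split /\W+/` strips it but also breaks words such as "don't" in two.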
Re: Counting words by jbware (Chaplain) on Sep 13, 2004 at 14:40 UTC
Here is a regex way to grab the first x words for each line (in my example x=4). `use strict; open (IN,"<in.txt") or die $!; while (<IN>) { print "$1\n" if (/^\s*((?:[^\s]+\s+){4})/); } close(IN);` -jbWare
Re^2: Counting words by zdog (Priest) on Sep 13, 2004 at 15:26 UTC
I modified your code a little to fit the problem description slightly better: `use strict; my $num_words = 4; my @words = (); open FILE, "<in.txt" or die $!; while ( <FILE> ) { last if ( push ( @words, m/([^\s]+)/g ) >= $num_words ); } close FILE; splice @words, $num_words if @words > $num_words; print join( " ", @words ) . "\n";` As a side note, is there a reason to use `[^\s]` rather than `\S`, or is it just a matter of preference? Thanks. Zenon Zabinski | zdog | zdog@perlmonk.org
Re^3: Counting words by jbware (Chaplain) on Sep 13, 2004 at 15:49 UTC
Yeah, "last if" is a good call, I wasn't thinking. The OP wasn't clear exactly how they wanted the results back (string or array of words), so I chose string. Nice conversion to an array though. As far as `[^\s]` versus `\S`, force of habit. If my laziness virtue would kick in like it's supposed to, I'd have switched to `\S` by now and saved some keystrokes :) -jbWare
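For anyone following along, a quick check that the two spellings match the same thing. The sample line is made up for illustration.

```perl
use strict;
use warnings;

# Sample line made up for illustration.
my $line = "  There was\tan old   lady ";

my @negated_class = $line =~ /([^\s]+)/g;    # negated character class
my @shorthand     = $line =~ /(\S+)/g;       # shorthand escape

print "same result\n" if "@negated_class" eq "@shorthand";
print join( "|", @shorthand ), "\n";
```

Both forms capture runs of non-whitespace characters, so the choice really is one of habit and keystrokes.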
Re: Counting words by tachyon (Chancellor) on Sep 14, 2004 at 02:31 UTC
As you have a stream you probably want to use stream parsing logic. Here is an example: `my ( $space, $wc ); my $get_words = 5; while( read(DATA,$_,1) ) { if ( m/\s/ ) { $space = 1; } else { if ( $space ) { print "\n"; $space = 0; $wc++; last if $wc >= $get_words; } print; } } __DATA__ There was an old lady who lived in a shoe` cheers tachyon
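The same character-at-a-time idea can be sketched as a reusable sub that collects the words instead of printing them. This is only an illustration: the name `first_words` is hypothetical, and an in-memory filehandle stands in for whatever stream the OP actually has.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read one character at a time
# from a filehandle and return the first $limit whitespace-separated words.
sub first_words {
    my ( $fh, $limit ) = @_;
    return if $limit < 1;
    my @words;
    my $current = '';
    while ( read( $fh, my $char, 1 ) ) {
        if ( $char =~ /\s/ ) {
            next unless length $current;    # collapse runs of whitespace
            push @words, $current;
            $current = '';
            last if @words >= $limit;       # stop reading early
        }
        else {
            $current .= $char;
        }
    }
    # Keep a final word that was not followed by whitespace.
    push @words, $current if length $current && @words < $limit;
    return @words;
}

# An in-memory filehandle stands in for the real stream.
my $text = "There was an old lady who lived in a shoe";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
print join( " ", first_words( $fh, 5 ) ), "\n";
close $fh;
```

Because the loop stops as soon as the limit is reached, nothing past the last wanted word is read from the stream.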