in reply to Counting frequency of strings in files

Others have already pointed you toward the usual Perl way to solve the problem (use a hash) and have shown you some good coding habits along the way. However, it's worth being a little more explicit about some of those habits. Consider:

    use strict;
    use warnings;

    my $filename = 'delme.txt';

    # Create a sample file
    open my $fOut, '>', $filename or die "Can't create $filename: $!\n";
    print $fOut <<SAMPLE;
    the the the and me ok big dog me
    SAMPLE
    close $fOut;

    my %words = (' word ' => 'count');

    open my $fIn, '<', $filename or die "Can't open $filename: $!\n";
    while (<$fIn>) {
        $words{$_}++ for split;
    }
    close $fIn;

    printf "%-10s %3s\n", $_, $words{$_} for sort keys %words;

Prints:

     word      count
    and          1
    big          1
    dog          1
    me           2
    ok           1
    the          3

First note the use strict: always use strictures (use strict; use warnings; see The strictures, according to Seuss).

Then note the use of the three-parameter version of open. In particular, note the '<' that makes the file open mode explicit. That makes the code clearer and safer. Also note the use of lexical file handles (declared using my), which also makes the code safer.
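To see why the explicit mode is safer, here is a minimal sketch. The temporary directory, the file name data.txt and the variable names are made up for illustration; the point is only that three-parameter open never looks inside the file name for a mode character.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a throwaway directory so nothing real is at risk.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/data.txt";

# Create a small file to protect.
open my $out, '>', $path or die "Can't create $path: $!\n";
print $out "precious data\n";
close $out;

# A hostile or accidental "file name" that starts with a mode character.
my $name = ">$path";

# Three-parameter open treats $name purely as a name, so this simply fails.
my $opened = open my $in, '<', $name;
print "three-arg open: ", ($opened ? "opened\n" : "failed: $!\n");

# The original file is untouched. A two-parameter open($fh, $name) would
# instead have honoured the leading '>' and truncated the file.
my $size = -s $path;
print "file size is still $size bytes\n";
```

The lexical handles ($out, $in) are also closed automatically when they go out of scope, which is one less thing to get wrong.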

The split looks like absolute magic, but it is simply using defaults for all its parameters. Read the split documentation until you understand how it works. Note that while (<$fIn>) sets the default variable ($_) and does a little other magic, so you may want to read the while documentation too to understand what's going on there.
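To take some of the magic out of it, here is roughly what the compact loop looks like written out long-hand. This is only a sketch: the sample text, the in-memory file handle and the names $text, $fh and %count are mine, not part of the code above.

```perl
use strict;
use warnings;

# Sample text read through an in-memory file handle (illustrative only).
my $text = "  the quick\tbrown   fox\nthe end\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!\n";

my %count;

# Long-hand form of:  while (<$fh>) { $count{$_}++ for split; }
# The while (<$fh>) form assigns each line to $_ and tests it with defined.
while (defined($_ = <$fh>)) {
    # Bare split means split ' ', $_: split $_ on runs of whitespace,
    # ignoring any leading whitespace.
    for my $word (split ' ', $_) {
        $count{$word}++;
    }
}
close $fh;

print "$_: $count{$_}\n" for sort keys %count;
```

The defined test is what keeps the loop going for a line that happens to be "0" or empty, which a plain truth test would treat as end of file.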

Most of the built-in functions don't need () and I tend to skip them to reduce clutter, but that means you need to use the low-precedence or instead of || so the die does the right thing.
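A small sketch of the difference, assuming a file name that does not exist on your system (the name and the variables are invented for the demonstration):

```perl
use strict;
use warnings;

my $missing = 'no-such-file-for-this-demo.txt';   # assumed not to exist

# With ||, the test binds to $missing (a true string), not to the result
# of open, so the die can never fire and the failure goes unnoticed.
my $ok = open my $fh1, '<', $missing || die "never reached\n";
print "|| version: open returned ", ($ok ? 'true' : 'false'), "\n";

# With the low-precedence or, the test applies to open's return value.
my $opened = eval {
    open my $fh2, '<', $missing or die "Can't open $missing: $!\n";
    1;
};
my $error = $@;
my $died  = !$opened;
print "or version: ", ($died ? "died: $error" : "opened\n");
```

If you prefer ||, you have to add the parentheses back: open(my $fh, '<', $missing) || die ... works too.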

Notice that the die message gives both the file name and the system error (that's what the $! special variable is about) to make it easier to diagnose file errors.

The last line uses for as a statement modifier to compactly print out the contents of the words hash. Note that the header is generated by priming the words hash in a sneaky fashion: the spaces in the key guarantee there are no conflicts with words from the text and that the header line sorts first and thus gets printed first (that's just a trick, not a "coding habit" of course).
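If the priming trick feels too sneaky for real code, a plainer sketch is to print the header on its own and keep the hash for words only. The widths and the @lines array here are my own choices for illustration.

```perl
use strict;
use warnings;

# Count the same sample words without priming the hash.
my %words;
$words{$_}++ for split ' ', "the the the and me ok big dog me";

# Build the header explicitly, then one formatted line per word.
my @lines = (sprintf("%-10s %5s\n", 'word', 'count'));
push @lines, sprintf("%-10s %5d\n", $_, $words{$_}) for sort keys %words;

print @lines;
```

This also lets the count column use %d, since the header no longer has to pass through the same format as the numbers.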

True laziness is hard work

Re^2: Counting frequency of strings in files
by Cian (Initiate) on Apr 25, 2012 at 14:49 UTC
    You people are awesome! Such good replies, I got it working thanks to you!