Counting frequency of strings in files

Cian has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Counting frequency of strings in files by toolic (Bishop) on Apr 23, 2012 at 17:17 UTC
You could accumulate words into a hash:

```perl
use warnings;
use strict;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;

my %words;
while (<DATA>) {
    $words{$_}++ for split;
}
print Dumper(\%words);

__DATA__
the the the and me ok big dog me
```

Prints:

```
$VAR1 = {
          'and' => 1,
          'big' => 1,
          'dog' => 1,
          'me' => 2,
          'ok' => 1,
          'the' => 3
        };
```

See also: split, perlintro, Data::Dumper
Re: Counting frequency of strings in files by GrandFather (Saint) on Apr 24, 2012 at 05:27 UTC
Others have already pointed you in the direction of the usual Perl way to solve the problem - use a hash - and have actually shown you some good coding habits along the way. However, it's worth being a little more explicit about some of those habits. Consider:

```perl
use strict;
use warnings;

my $filename = 'delme.txt';

# Create a sample file
open my $fOut, '>', $filename or die "Can't create $filename: $!\n";
print $fOut <<SAMPLE;
the the the and me ok big dog me
SAMPLE
close $fOut;

my %words = (' word ' => 'count');

open my $fIn, '<', $filename or die "Can't open $filename: $!\n";
while (<$fIn>) {
    $words{$_}++ for split;
}
close $fIn;

printf "%-10s %3s\n", $_, $words{$_} for sort keys %words;
```

Prints:

```
 word      count
and          1
big          1
dog          1
me           2
ok           1
the          3
```

First note the use of strict: always use strictures (use strict; use warnings; - see The strictures, according to Seuss).

Then note the use of the three parameter version of open. In particular note the '<' to make the file open mode explicit. That makes the code clearer and safer. Also note the use of lexical file handles (declared using my), which also makes the code safer.

The split looks like absolute magic, but it is simply using defaults for all its parameters. Read the split documentation until you understand how it works. Note that `while (<$fIn>)` sets the default variable ($_) and does a little other magic, so you may want to read the while documentation too to understand what's going on there.

Most of the built-in functions don't need () and I tend to skip them to reduce clutter, but that means you need to use the low priority `or` instead of `||` so the die does the right thing. Notice that the die message gives both the file name and the system error (that's what the $! special variable is about) to make it easier to diagnose file errors.

The last line uses for as a statement modifier to compactly print out the contents of the words hash.
Note that the header is generated by priming the words hash in a sneaky fashion: the spaces in the key guarantee there are no conflicts with words from the text and that the header line sorts first and thus gets printed first (that's just a trick, not a "coding habit" of course).

True laziness is hard work
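The points above about `split` defaults and about `or` versus `||` can be checked with a small sketch. The sub names `default_split`, `single_space_split` and `try_open`, and the file name passed to `try_open`, are made up for illustration and do not come from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: a bare split is shorthand for split ' ', $_.
# The special pattern ' ' skips leading whitespace and splits on runs of it.
sub default_split {
    local $_ = shift;
    return split;
}

# For contrast: a regex matching exactly one whitespace character yields
# empty fields wherever whitespace is leading or doubled.
sub single_space_split {
    my ($line) = @_;
    return split /\s/, $line;
}

my $line = "  the  big dog\n";
print join('|', default_split($line)), "\n";       # the|big|dog
print join('|', single_space_split($line)), "\n";  # ||the||big|dog

# Without parentheses, || binds to the last argument of open, so
#   open my $fh, '<', $name || die ...
# parses as open(my $fh, '<', ($name || die ...)) and the die never fires
# for a missing file. The low-precedence "or" applies to the whole open call.
sub try_open {
    my ($name) = @_;
    open my $fh, '<', $name or return "Can't open $name: $!";
    close $fh;
    return 'ok';
}

# Assumes no file of this name exists in the current directory.
print try_open('no-such-file-for-this-demo.txt'), "\n";
```

With the bare `split`, a stray double space or leading indent in the input never creates an empty-string key in the hash, which is why the default form suits word counting.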
Re^2: Counting frequency of strings in files by Cian (Initiate) on Apr 25, 2012 at 14:49 UTC
You people are awesome! Such good replies, I got it working thanks to you!
Re: Counting frequency of strings in files by 2teez (Vicar) on Apr 23, 2012 at 22:06 UTC
Welcome to Perl and to programming! To solve this problem, a hash comes to mind. The code below opens a new file to write to ("freq_file.txt") and a file to read from, then prints each value found and its frequency to the new file. See below:

```perl
#!/usr/bin/perl
use warnings;
use strict;

print "Enter the name of your file, ie myfile.txt: ";
chomp( my $file = <STDIN> );

my %found;

open my $fh,  '>', 'freq_file.txt' or die "can't open this file: $!";
open my $fh2, '<', $file           or die "can't open this file: $!";

while ( my $line = <$fh2> ) {
    # split ' ' skips leading whitespace and treats a run of whitespace
    # as a single separator, so no empty strings get counted
    $found{$_}++ foreach split ' ', $line;
}

# Header columns follow the order of the rows: value first, then count
print $fh "Value Found\tFrequency", $/;
print $fh $_, "\t\t", $found{$_}, $/ foreach sort keys %found;

close $fh2 or die "can't close file: $!";
close $fh  or die "can't close file: $!";
```

I hope this helps
Re: Counting frequency of strings in files by Anonymous Monk on Apr 23, 2012 at 17:28 UTC
```perl
#!/usr/bin/perl -w
use strict;

print "Enter the name of your file, ie myfile.txt:\n";
chomp(my $val = <STDIN>);

my %seen;
open my $fh, '<', $val or die "wrong filename: $!";
while (defined(my $line = readline $fh)) {
    my @list = split "\t", $line;
    @seen{@list} = map { $seen{$_}||0+1 } @list;
}

#while(my($string, $count) = each %seen){
foreach my $key (sort { $seen{$b} <=> $seen{$a} } keys %seen) {
    my $string = $key;
    my $count  = $seen{$key};
    print "Number of instances of '${string}' found: $count\n";
}
```
Re^2: Counting frequency of strings in files by Cian (Initiate) on Apr 23, 2012 at 18:03 UTC
Your script is close all right, but it seems to give a count of 1 for everything, even though I know there is more than one of most... I also want the script to create an output text file and put the results there.
Re^3: Counting frequency of strings in files by Anonymous Monk on Apr 23, 2012 at 21:36 UTC
Sorry, it should be this:

```perl
@seen{@list} = map { ($seen{$_}||0)+1 } @list;
```

and to print the output into a file, just say:

```perl
open my $output_fh, '>', "filename.txt" or die $!;
#while(my($string, $count) = each %seen){
foreach my $string (sort { $seen{$b} <=> $seen{$a} } keys %seen) {
    my $count = $seen{$string};
    printf $output_fh "%-70s%d\n", $string, $count;
}
close $output_fh;
```
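One thing to watch with the hash-slice form: `map` computes every new value before the slice is assigned, so a string that occurs several times on one line adds only 1 to its count for that line. The sketch below compares it with a per-field increment, assuming tab-separated fields as in the script above. The sub names `count_slice` and `count_each` are hypothetical, and the `chomp` is an addition: without it the last field on each line keeps its newline and is counted as a different string.

```perl
use strict;
use warnings;

# Hash-slice form: duplicates on a single line collapse into one increment.
sub count_slice {
    my %seen;
    for my $line (@_) {
        chomp(my $copy = $line);
        my @list = split /\t/, $copy;
        @seen{@list} = map { ($seen{$_} || 0) + 1 } @list;
    }
    return %seen;
}

# Incrementing once per field counts every occurrence.
sub count_each {
    my %seen;
    for my $line (@_) {
        chomp(my $copy = $line);
        $seen{$_}++ for split /\t/, $copy;
    }
    return %seen;
}

my @lines = ("the\tthe\tdog\n", "the\tcat\n");
my %slice = count_slice(@lines);
my %each  = count_each(@lines);
print "slice: the=$slice{the}\n";   # slice: the=2
print "each:  the=$each{the}\n";    # each:  the=3
```

If each string can appear at most once per line, the two forms agree; otherwise the `++` form gives the true frequency.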