in reply to 'Simple' comparing a hash with an array

Let's start by eliminating unnecessary steps and by not storing more values into memory than necessary. By getting rid of the fluff, you might be able to see the problem and solution more clearly.

#!/usr/bin/perl
use strict;
use warnings;

my %histogram = map { $_ => 0 } qw( a am an and are as at be been but by can co de do due each );
# hash of the words to find so we can do an O(1) lookup for them

while ( <> ) {
    chomp;
    for ( split ) {    # split returns a list we can use directly
        $histogram{ $_ }++ if exists $histogram{ $_ };    # only store counts for words that matter
    }
}

foreach my $word ( keys %histogram ) {
    # keys() will list the keys, and we've already taken care
    # of making sure we don't have extra words stored.
    # Now there's no need to do two loops and check an array
    # against a hash.
    print "Found $word, $histogram{$word} times.\n";
}
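To see what the exists guard in the code above buys you, here is a minimal sketch that counts the same words with and without it. The word list and the sample words are made up for illustration; they are not the original poster's data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my @wanted = qw( a an and at be );
my @words  = qw( a cat sat at a table and purred );

# Guarded: only words already present as keys are counted.
my %guarded = map { $_ => 0 } @wanted;
for my $w (@words) {
    $guarded{$w}++ if exists $guarded{$w};
}

# Unguarded: every word seen becomes a new key.
my %unguarded = map { $_ => 0 } @wanted;
$unguarded{$_}++ for @words;

printf "guarded keys:   %d\n", scalar keys %guarded;
printf "unguarded keys: %d\n", scalar keys %unguarded;
```

With this sample the guarded hash keeps its 5 keys, while the unguarded one grows to 9, so the final loop over keys would have to filter the extras out again.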

Re^2: 'Simple' comparing a hash with an array
by Anonymous Monk on Apr 17, 2008 at 14:16 UTC
    Thank you, monks. I've learnt a lot from this thread :-) I had originally used a hash after a perlfaq4 suggestion about comparing arrays, which would avoid some iteration. However, I ended up iterating over the hash anyway, as I didn't know any other way :-) As some have asked, sample input to this problem is:
    WE regret that a press of matter prevents our noticing
    I want to count frequencies of certain words, as listed in mr_mischief's %histogram. As for output, something along the lines of the following is what I'm trying to obtain:
    Found for, 1 times.
    Found such, 1 times.
    Found up, 1 times.
    Found at, 2 times.
    Found had, 1 times.
    Found was, 1 times.
    Cumulative total of all words found: 50
    To obtain this, and because I don't need to be concerned about the case of the input, I have added to mr_mischief's elegant code very slightly:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my %histogram = map { $_ => 0 } qw( a am an and are as at be been but by can co de do due each );
    # hash of the words to find so we can do an O(1) lookup for them

    while ( <> ) {
        chomp;
        for ( split ) {    # split returns a list we can use directly
            tr/A-Z/a-z/;   # lowercase all input
            print "$_\n";
            $histogram{ $_ }++ if exists $histogram{ $_ };    # only store counts for words that matter
        }
    }

    my $count = 0;
    foreach my $word ( keys %histogram ) {
        # keys() will list the keys, and we've already taken care
        # of making sure we don't have extra words stored.
        # Now there's no need to do two loops and check an array
        # against a hash.
        if ( $histogram{$word} > 0 ) {
            print "Found $word, $histogram{$word} times.\n";
            $count = $count + $histogram{$word};
        }
    }
    print "Cumulative total of all words found: $count\n";
    Thanks again
      If you're at all worried about locale and language issues, or if you're just concerned about doing things the canonical way, you can use $_ = lc $_; instead of tr/A-Z/a-z/; to get a lowercase version. lc and uc are built in. Unlike tr/A-Z/a-z/, which only touches the 26 ASCII letters, they can lowercase or uppercase letters outside A-Z, and they honor the current locale settings when use locale is in effect.
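      A small sketch of the difference, assuming a word that starts with a non-ASCII capital letter. The sample word is made up for illustration, and the character is written as an escape so the script does not depend on the source file's encoding:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Encode output as UTF-8 so the wide characters print cleanly.
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical input: U+0160 (S with caron) followed by "KODA".
my $word = "\x{160}KODA";

# tr/A-Z/a-z/ only maps the 26 ASCII letters.
( my $with_tr = $word ) =~ tr/A-Z/a-z/;

# lc applies Unicode case rules to this character string.
my $with_lc = lc $word;

print "tr: $with_tr\n";
print "lc: $with_lc\n";
```

      Here tr leaves the first letter in upper case, while lc maps it to U+0161, so only the lc version would match an all-lowercase key in %histogram.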
Re^2: 'Simple' comparing a hash with an array
by wade (Pilgrim) on Apr 17, 2008 at 16:23 UTC
    ++mr_mischief, nice design!
    --
    Wade