
hash comparison

by sarvan (Sexton)
on Jul 25, 2011 at 05:52 UTC (#916487=perlquestion)

sarvan has asked for the wisdom of the Perl Monks concerning the following question:

Hi there, I have the following script.
    use strict;
    use Data::Dumper;

    my $candidate = 'the the the the the the the';
    my @candidate_words = split (/\W/, $candidate);
    my $candidate_count = @candidate_words;
    my %candidate = ();
    map { $candidate{$_}++ } @candidate_words;

    my $reference = 'the cat is the on the mat';
    my @reference_words = split (/\W/, $reference);
    my %reference = ();
    map { $reference{$_}++ } @reference_words;

    while ((my $key, my $val) = each(%candidate)) {
        print $key."->".$val."\n";
    }
    print "-------------------------------------\n";
    while ((my $key, my $val) = each(%reference)) {
        print $key."->".$val."\n";
    }

I am writing this script to find the similarity between two sentences, which are stored in the $candidate and $reference variables.

The current script counts the occurrences of each word and stores them in a hash, so each hash now holds the words of its sentence (candidate or reference) together with their counts.

What I need help with: I want to take each word in the candidate and look it up in both hashes to find the maximum reference count. For example, if the word "the" is in the candidate, I want its count in the %candidate hash and in the %reference hash, and then the minimum of those two counts (i.e. if the counts of "the" in the two hashes are 2 and 5, I want 2). Likewise for every word in the candidate. Please help me with this. Thanks.

Re: hash comparison
by GrandFather (Saint) on Jul 25, 2011 at 08:33 UTC

    First off, a couple of style issues:

    • If you find yourself writing the same code over and over again, put it in a sub
    • Don't use map (in void context) in place of for used as a statement modifier

    With that somewhat in mind consider the following:

    use strict;
    use warnings;
    use List::Util;

    my %candidate = CountWords ('the the the the the the the');
    my %reference = CountWords ('the cat is the on the mat');
    my %counts = map {$_ => List::Util::min ($candidate{$_} || 0, $reference{$_} || 0)}
        keys %candidate, keys %reference;

    print "$_: $counts{$_}\n" for sort keys %counts;

    sub CountWords {
        my ($sentence) = @_;
        my @words = split (/\W/, $sentence);
        my %wordCount;

        ++$wordCount{$_} for @words;
        return %wordCount;
    }


    cat: 0
    is: 0
    mat: 0
    on: 0
    the: 3
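GrandFather's output lists every word from both sentences, while the question asks about the candidate's words alone. One possible tweak (a sketch, not code from the thread) is to map over `keys %candidate` only. Because every key then comes from %candidate, only the reference lookup needs the `|| 0` default:

```perl
use strict;
use warnings;
use List::Util;

my %candidate = CountWords('the the the the the the the');
my %reference = CountWords('the cat is the on the mat');

# Iterate over the candidate's words only, so words that occur solely in
# the reference (cat, is, mat, on) are no longer listed.
# A word missing from the reference gets a reference count of 0.
my %counts = map { $_ => List::Util::min($candidate{$_}, $reference{$_} || 0) }
    keys %candidate;

print "$_: $counts{$_}\n" for sort keys %counts;

# Same helper as in GrandFather's post: word => number of occurrences.
sub CountWords {
    my ($sentence) = @_;
    my @words = split(/\W/, $sentence);
    my %wordCount;

    ++$wordCount{$_} for @words;
    return %wordCount;
}
```

With these two sentences it prints the single line `the: 3`.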
    True laziness is hard work
Re: hash comparison
by choroba (Cardinal) on Jul 25, 2011 at 07:12 UTC
    If you want minimum, just replace the last while loop with this:
    for my $key (keys %candidate) {
        print "$key -> ";
        my $cand = $candidate{$key};
        my $ref  = $reference{$key};
        if ($ref and $ref < $cand) {
            print $ref;
        } else {
            print $cand;
        }
        print "\n";
    }
    For maximum, invert the < sign.
      Hi, Thanks for the reply

      Here is my question: suppose a word in the candidate does not appear in the reference at all. In that case the code gives me a count of 1 (since it appeared once in $cand), but I expect 0, because it did not appear in $ref.

      Example sentences:
      $candidate = "it is not probable that it is the end";
      $reference = "it is unlikely that it is the end";

        It is trivial to change the code I provided to give the expected result (in fact, you did not specify what to do if the term does not occur in the $reference, so I assumed you wanted to see the value from $candidate). Just remove 9 characters and add 5 somewhere :) (YMMV)
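One reading of choroba's hint (an assumption; the thread does not spell it out) is to delete `$ref and ` (9 characters) from the condition and add ` || 0` (5 characters) to the reference lookup, so that a word missing from the reference counts as 0. The sketch below applies that change to the example sentences from the follow-up. The `sort` and the hypothetical `%min_count` hash are additions so the result comes out in a stable order and can be inspected afterwards.

```perl
use strict;
use warnings;

# Example sentences from the follow-up question.
my $candidate = "it is not probable that it is the end";
my $reference = "it is unlikely that it is the end";

my %candidate;
my %reference;
$candidate{$_}++ for split /\W+/, $candidate;
$reference{$_}++ for split /\W+/, $reference;

my %min_count;    # hypothetical: word => min(candidate count, reference count)
for my $key (sort keys %candidate) {
    my $cand = $candidate{$key};
    my $ref  = $reference{$key} || 0;    # " || 0" added: absent word => 0
    my $min;
    if ($ref < $cand) {                  # "$ref and " removed from the test
        $min = $ref;
    } else {
        $min = $cand;
    }
    $min_count{$key} = $min;
    print "$key -> $min\n";
}
```

This prints `end -> 1`, `is -> 2`, `it -> 2`, `not -> 0`, `probable -> 0`, `that -> 1` and `the -> 1`, so "not" and "probable" now get 0 as the follow-up expected.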
Re: hash comparison
by FunkyMonk (Chancellor) on Jul 25, 2011 at 12:02 UTC
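    You would be better off splitting on /\W+/ rather than just /\W/: with /\W/, every extra non-word character between two words produces an empty field, and that empty string then gets counted as a word.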


    use 5.012;    # enables say (and strict); scalar split returns the field count

    say scalar split /\W/,  "the cat is the on the mat";        # 7
    say scalar split /\W/,  "the  cat  is  the  on  the  mat";  # 13
    say scalar split /\W+/, "the cat is the on the mat";        # 7
    say scalar split /\W+/, "the  cat  is  the  on  the  mat";  # 7
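To see why the extra fields matter for the word counts, here is a small sketch (the hash names `%by_single` and `%by_run` are hypothetical) that counts the double-spaced sentence both ways. With /\W/ the empty string turns up as a "word" six times; with /\W+/ it does not appear at all.

```perl
use strict;
use warnings;

# Two spaces between each word, as in the second and fourth lines above.
my $sentence = "the  cat  is  the  on  the  mat";

my (%by_single, %by_run);
$by_single{$_}++ for split /\W/,  $sentence;    # splits on each single non-word char
$by_run{$_}++    for split /\W+/, $sentence;    # splits on a whole run of them

# The empty-string key shows up here with a count of 6.
for my $word (sort keys %by_single) {
    printf "%-6s %d\n", "'$word'", $by_single{$word};
}
print "empty-string key with /\\W+/: ",
    (exists $by_run{''} ? "yes" : "no"), "\n";
```

Note that even /\W+/ yields one leading empty field when the sentence starts with a non-word character; matching the words directly, e.g. `my @words = $sentence =~ /\w+/g;`, sidesteps that.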
Re: hash comparison
by Marshall (Canon) on Jul 25, 2011 at 08:15 UTC
    Perhaps this helps. Adjust the printout as you want.
    I prefer "foreach" over "map" when there is no left hand value for the map.

    #!/usr/bin/perl -w
    use strict;

    my $candidate = "the the the the the the the";
    my $reference = "the cat is the on the mat";

    my %cand_histogram;
    my %ref_histogram;

    $cand_histogram{$_}++ foreach (split(/\s+/,$candidate));
    $ref_histogram{$_}++  foreach (split(/\s+/,$reference));

    my %seen;
    printf "%-6s %-10s %-10s\n", 'Key','Candidate','Reference';
    foreach my $key ( sort { $seen{$b} <=> $seen{$a}   #descending word cnt
                             or $a cmp $b              #alphabetic otherwise
                           }
                      grep {!$seen{$_}++}  # each key just once,
                                           # but count 'em also!,
                                           # 2 means => in both hashes
                      (keys %cand_histogram, keys %ref_histogram)
                    )
    {
        printf "%-6s %-10s %-10s\n", $key, $cand_histogram{$key}||='0',
                                           $ref_histogram{$key} ||='0';
    }

    __END__
    OUTPUT:
    Key    Candidate  Reference
    the    7          3
    cat    0          1
    is     0          1
    mat    0          1
    on     0          1
      Hi Marshall, thanks for the code.

      There is one small modification I want to make. Right now it gives me all the words in both the candidate and the reference, with their counts.

      But the output I am looking for is only the minimum count of each candidate word across the candidate and the reference.

      For example, if the word "the" appears 7 times in the candidate and 2 times in the reference, it should give 2 as the minimum of the two counts, and likewise for every word in the candidate alone.

      Please give me an idea of how to do this, and I will try it.

        Hi sarvan,
        I think that if you study the code, you will find that you have all that you need. The last "foreach" loop is on the fancy side of things, but it just loops over all of the unique keys in a special sort order. $cand_histogram{$key}||='0' uses 0 as the value in the case that there is no value for $cand_histogram{$key}. The print statement prints the 3 things that you need in order to calculate what you want. Why don't you give some code a try? Post your effort back here after you study it a bit.

Node Type: perlquestion [id://916487]
Approved by davido