monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Revered Monks,
This is a working code that generalizes the problem here. It attemps to count the support of a string in a hash, allowing mismatch. The code below already give correct answers. But the problem is that it alters the HoA (with the 'shift' function - see code below). How can I obtain the same result without having have to alter the HoA? I've tried adding label and use "next Label" variation to replace "shift" function, but it doesn't give the desired result.
#!/usr/bin/perl -w use strict; use Data::Dumper; # Explanation for this hash is for $d = 1 my $hash_mismatch = { # no match because hd($query[0],$hash_mismatch->{'S1'}[0]) > $d # even though hd($query[1],$hash_mismatch->{'S2'}[1]) < $d # i.e in this case only 1 element of query satisfy, # which doesn't count. To be valid, both of query elem must satisf +y "hd<=$d". 'S1' => [ 'GGAA', 'GGGG', 'TTTT' ], # match because hd($query[0],$hash_mismatch->{'S2'}[0]) = $d # and hd($query[1],$hash_mismatch->{'S2'}[2]) < $d 'S2' => [ 'GGGA', 'GTTT', 'GGGG' ], # match because hd($query[0],$hash_mismatch->{'S3'}[0]) = $d # and hd($query[1],$hash_mismatch->{'S3'}[1]) = $d 'S3' => [ 'GGGA', 'GCGG', 'GTTT' ], # match because hd($query[0],$hash_mismatch->{'S4'}[0]) < $d 'S4' => [ 'GGGG', 'AAAA', 'GGGG' ] }; my $d = 1; # string mismatch/hamming distance; # With $d = 1; the answer is : Support = 3 (from 'S2,S3,S4); # With $d = 0; the answer is : Support = 1 (from 'S4') my @query = ('GGGG','GGGG'); my $sup = count_support_mismatch($hash_mismatch,\@query,$d); print "Support = $sup\n"; #------------------Subs----------- sub count_support_mismatch { my ($hashref,$arref,$d) = @_; my $counter = 0; foreach my $key ( keys %{$hashref} ) { my $ar = $hashref->{$key}; my @match_list; foreach ( @{$ar} ) { foreach my $q ( @$arref ) { my $dist = hd($_,$q); if ( $dist <= $d ) { #print "$q - $_\n"; push @match_list, $_; } shift @{$ar}; # Part that alter hash } } if ( @match_list ) { $counter++; } } return $counter; } sub hd { # Compute hamming distance between two strings # String length is assumed to be equal my ($a,$b) = @_; my $len = length ($a); my $num_mismatch = 0; for (my $i=0; $i<$len; $i++) { ++$num_mismatch if substr($a, $i, 1) ne substr($b, $i, 1); } return $num_mismatch; }
Regards,
Edward

Replies are listed 'Best First'.
Re: How Not to Alter Hash in HoA while Counting Array
by BrowserUk (Patriarch) on May 24, 2005 at 05:33 UTC

    I don't think your code is doing what you think it is doing?

    I added a couple of print statements to a sllightly reworked version of your count_support_mismatch() routine:

    sub count_support_mismatch { my( $hashref, $arref, $d ) = @_; my $counter = 0; foreach my $key ( keys %{ $hashref } ) { my $ar = $hashref->{$key}; my @match_list; print join( '|', @{ $ar } ) . "\n" . join( '|', @$arref ) . "\ +n"; foreach my $elem ( @{ $ar } ) { foreach my $q ( @$arref ) { print "Hamming '$elem' <=> '$q'\n"; my $dist = hd( $elem, $q ); if( $dist <= $d ) { #print "$q - $_\n"; push @match_list, $elem; } shift @{ $ar }; # Part that alter hash } } $counter++ if @match_list; } return $counter; }

    And got this output:

    P:\test>459758 GGAA|GGGG|TTTT GGGG|GGGG Hamming 'GGAA' <=> 'GGGG' Hamming 'GGAA' <=> 'GGGG' GGGA|GTTT|GGGG GGGG|GGGG Hamming 'GGGA' <=> 'GGGG' Hamming 'GGGA' <=> 'GGGG' GGGG|AAAA|GGGG GGGG|GGGG Hamming 'GGGG' <=> 'GGGG' Hamming 'GGGG' <=> 'GGGG' GGGA|GCGG|GTTT GGGG|GGGG Hamming 'GGGA' <=> 'GGGG' Hamming 'GGGA' <=> 'GGGG' Support = 3

    which indicates that you are only comparing the first element of each hash value array against each of the elements from your query array. Is this your intent?

    If so, then there is no purpose in the middle of your 3 nested loops and the shifting. You need only hardcode the first element into the call to hd() and omit the middle loop to achieve the same goal and avoid modifying the arrays:

    sub count_support_mismatch { my( $hashref, $arref, $d ) = @_; my $counter = 0; foreach my $key ( keys %{ $hashref } ) { my $ar = $hashref->{$key}; my @match_list; print join( '|', @{ $ar } ) . "\n" . join( '|', @$arref ) . "\ +n"; foreach my $q ( @$arref ) { print "Hamming '$ar->[ 0 ]' <=> '$q'\n"; my $dist = hd( $ar->[ 0 ], $q ); if( $dist <= $d ) { #print "$q - $_\n"; push @match_list, $ar->[ 0 ]; } } $counter++ if @match_list; } return $counter; }

    which produces the same results:

    P:\test>459758 GGAA|GGGG|TTTT GGGG|GGGG Hamming 'GGAA' <=> 'GGGG' Hamming 'GGAA' <=> 'GGGG' GGGA|GTTT|GGGG GGGG|GGGG Hamming 'GGGA' <=> 'GGGG' Hamming 'GGGA' <=> 'GGGG' GGGG|AAAA|GGGG GGGG|GGGG Hamming 'GGGG' <=> 'GGGG' Hamming 'GGGG' <=> 'GGGG' GGGA|GCGG|GTTT GGGG|GGGG Hamming 'GGGA' <=> 'GGGG' Hamming 'GGGA' <=> 'GGGG' Support = 3

    However, if your intent is to compare each element of the hash element arrays against eachelement of the query array, then you need only omit the shift @{ $ar }; to produce this output:

    P:\test>459758 GGAA|GGGG|TTTT GGGG|GGGG Hamming 'GGAA' <=> 'GGGG' Hamming 'GGAA' <=> 'GGGG' Hamming 'GGGG' <=> 'GGGG' Hamming 'GGGG' <=> 'GGGG' Hamming 'TTTT' <=> 'GGGG' Hamming 'TTTT' <=> 'GGGG' GGGA|GTTT|GGGG GGGG|GGGG Hamming 'GGGA' <=> 'GGGG' Hamming 'GGGA' <=> 'GGGG' Hamming 'GTTT' <=> 'GGGG' Hamming 'GTTT' <=> 'GGGG' Hamming 'GGGG' <=> 'GGGG' Hamming 'GGGG' <=> 'GGGG' GGGG|AAAA|GGGG GGGG|GGGG Hamming 'GGGG' <=> 'GGGG' Hamming 'GGGG' <=> 'GGGG' Hamming 'AAAA' <=> 'GGGG' Hamming 'AAAA' <=> 'GGGG' Hamming 'GGGG' <=> 'GGGG' Hamming 'GGGG' <=> 'GGGG' GGGA|GCGG|GTTT GGGG|GGGG Hamming 'GGGA' <=> 'GGGG' Hamming 'GGGA' <=> 'GGGG' Hamming 'GCGG' <=> 'GGGG' Hamming 'GCGG' <=> 'GGGG' Hamming 'GTTT' <=> 'GGGG' Hamming 'GTTT' <=> 'GGGG' Support = 4

    Which avoids modifying the hash element arrays, but that gives a different results?

    Another possibility is that you want to compare the 2 elements of the query array against the first two elements (0 & 1) of the hash element arrays, then against (1 & 2) and then (2 & 3) (if 3 existed) etc.

    That's a little more complex to code and would require using indices into the hash element array and shifting the start position after the execution of the inner loop. That seems to be what you are trying to do, but if you are, you have misplaced the shift. It should be outside the inner loop not inside.

    That's my best guess. It's now over to you to indicate whether any of my guesses are correct :)


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
Re: How Not to Alter Hash in HoA while Counting Array
by Forsaken (Friar) on May 24, 2005 at 05:28 UTC
    I've looked at that code, and then looked at it again, and then yet another time, and I can't even think of a reason why that shift is there in the first place, it doesn't even seem to do anything besides taking your precious HoA apart.


    Remember rule one...
Re: How Not to Alter Hash in HoA while Counting Array
by djohnston (Monk) on May 24, 2005 at 05:28 UTC
    Simply making a new reference to a copy of the array may solve your problem:

    my $ar = [ @{$hashref->{$key}} ];

Re: How Not to Alter Hash in HoA while Counting Array
by tlm (Prior) on May 24, 2005 at 13:35 UTC

    no match because hd($query[0],$hash_mismatch->{'S1'}[0]) > $d even though hd($query[1],$hash_mismatch->{'S2'}[1]) < $d i.e in this case only 1 element of query satisfy, which doesn't count. To be valid, both of query elem must satisfy "hd<=$d".

    I concur with BrowserUk that you need to straighten out your specs before you can get any sensible feedback on this question. The quoted passage makes no sense at all. The elements of @query are identical; i.e. one of them satisfies hd<=$d if and only if the other one does.

    the lowliest monk