monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
Given an array and a maximum string mismatch as parameters, I want to count the array's elements into a hash of hashes (HoH). I thought I had it right, but it's not quite there. How can I modify the "read_trans_array_mismatch" subroutine in the code here:
#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my @to_proc = (
    'I1 TTAT',
    'I1 TTTT',
    'I1 TAGT',
    'I2 TTAT',
    'I3 TAGT',
);
my $d = 1;
my %transaction_map_mismatch;
read_trans_array_mismatch(\%transaction_map_mismatch, \@to_proc,$d);
print Dumper \%transaction_map_mismatch;

#--------Sub------------
sub read_trans_array_mismatch {
    my $transaction_map_ref = shift;
    my $transaction_array   = shift;
    my $d                   = shift;

    for ( my $i = 0; $i< @{$transaction_array}; $i++ ) {
        my @data = split(/\s/,$transaction_array->[$i]);
        my ($tid, $item) = @data;

        for ( my $j = 0 ; $j < @{$transaction_array} ;$j++ ) {
            my ($tid2, $item2) = split(/\s/,$transaction_array->[$j]);
            if ( hd($item,$item2) <= $d ) {
                $$transaction_map_ref{$item}{$tid}++;
                last;
            }
        }
    }
}

sub hd {
    #Hamming Distance of two strings
    #String length is assumed to be equal
    # Following djohntson advice, changed var declaration
    # from: my ($a, $b)
    my ($k,$l) = @_;
    my $len = length ($k);
    my $num_mismatch = 0;
    for (my $i=0; $i<$len; $i++) {
        ++$num_mismatch if substr($k, $i, 1) ne substr($l, $i, 1);
    }
    return $num_mismatch;
}
Such that with $d = 1 it gives:
__END__
$VAR1 = {
          'TAGT' => {
                      'I1' => 1,
                      'I3' => 1
                    },
          'TTTT' => {
                      'I1' => 2,  #From TTAT and TTT in I1,
                                  #because their HD <= $d
                      'I2' => 1,
                    },
          'TTAT' => {
                      'I2' => 1,
                      'I1' => 2   # also From TTAT and TTT in I1
                    }
        };
Currently, with $d = 1, it wrongly gives this result:
# Which is the same as $d = 0
$VAR1 = {
          'TAGT' => {
                      'I3' => 1,
                      'I1' => 1
                    },
          'TTTT' => {
                      'I1' => 1
                    },
          'TTAT' => {
                      'I2' => 1,
                      'I1' => 1
                    }
        };
Regards,
Edward

Replies are listed 'Best First'.
Re: Element Count from Array to HoH
by ivancho (Hermit) on May 26, 2005 at 06:29 UTC
    I think that last; bit is messing up your results. Do you really want to break out of the inner loop the moment you find a match? As it stands, for half of the array the first match you find is the element itself, and for the other half it is a neighbour, which sounds confusing to me.

    Also, from your example it's not clear what you are counting - if we had a couple of rows like "I1 TTTT", and "I2 TTAT", currently you'd increase {TTTT}{I1}, when you see "I2 TTAT" - is that what you want? Perhaps you ought to be increasing {TTAT}{I1}, ie {$item2}{$tid}, or possibly {TTTT}{I2}?

    Last question, can you have repeated elements in your array, and if so, what should be the behaviour?

    Ok, now I'll suggest a rewrite, based on my assumptions about what this should be doing, but don't be too disappointed if they turn out to be wrong. I prefer returning a new ref out of the sub, rather than passing one in:

    my $transaction_map_mismatch = get_trans_array_mismatch(\@to_proc,$d);
    print Dumper $transaction_map_mismatch;

    #--------Sub------------
    sub get_trans_array_mismatch {
        my $transaction_array = shift;
        my $d                 = shift;
        my $trans_map_ref = {map {(split)[1]=>{}} @$transaction_array};

        foreach (@{$transaction_array}) {
            my ($tid, $item) = split;
            foreach (grep {hd($item,$_) <= $d} keys %$trans_map_ref) {
                $trans_map_ref->{$_}{$tid}++;
            }
        }
        return $trans_map_ref;
    }
    Not efficient, in the slightest, but I think it does the job, or at least matches your desired output
      Hi Ivancho,
      Q: Perhaps you ought to be increasing {TTAT}{I1}, ie {$item2}{$tid}
      A: I can't do that, cause it will miss counting "TTTT". if what you mean is changing them to this:
      $$transaction_map_ref{$item2}{$tid}++;
      Q:or possibly {TTTT}{I2}?
      A:I don't get what you mean by that. What's the difference with the above statement.
      Sorry..I'm rather slow here..
      Q:can you have repeated elements in your array, and if so, what should be the behaviour?
      A: Yes. In that case, as long as the condition "hd <= $d" holds, they should be counted.
      I hope that clarifies the problem.
      Regards,
      Edward
        Yeah, my bad, I only meant {TTTT}{I2} - which is what my code does, I shouldn't have asked about {TTAT}{I1}..

        My understanding is as follows. You get a pair "TTAT", "I2", and you say to yourself 'But this TTAT might actually be TTTT - so let's increase {TTTT}{I2} for good measure.. We'll increase {TTAT}{I2} too, when we get there...'

        If this is what you want to do, then my snippet should work fine.. The {map {(split)... line saves the unique 4 letter codes we have. Then for each line '$tid $item',
        grep {hd($item,$_) <= $d} keys %$trans_map_ref
        gives us which of our 4 letter codes $item might be. For each of them, we increase their {$tid} field.

        I think the problem here is that your function tries to do everything at once. Maybe you can split it into 2 pieces: first find all pairs of neighbours between the 4 letter codes, then go through your array and, for every item with a tid, also increase this tid for its neighbours.

        hopefully this isn't just more mud...
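
The two-piece split suggested above could be sketched roughly as follows. This is only an illustration under the same assumptions as the rest of the thread (whitespace-separated "tid item" lines, equal-length items); the names find_neighbours and count_with_neighbours are made up for this sketch and are not from the original code.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hamming distance, as in the original post (equal-length strings assumed).
sub hd {
    my ($k, $l) = @_;
    my $num_mismatch = 0;
    for my $i (0 .. length($k) - 1) {
        $num_mismatch++ if substr($k, $i, 1) ne substr($l, $i, 1);
    }
    return $num_mismatch;
}

# Piece 1 (hypothetical helper): for each distinct item, list the distinct
# items within distance $d of it. An item is always its own neighbour.
sub find_neighbours {
    my ($items, $d) = @_;
    my %neighbours;
    for my $x (@$items) {
        $neighbours{$x} = [ grep { hd($x, $_) <= $d } @$items ];
    }
    return \%neighbours;
}

# Piece 2 (hypothetical helper): walk the transactions and bump the tid
# count for the item's neighbours (which include the item itself).
sub count_with_neighbours {
    my ($transactions, $d) = @_;
    my %seen;
    my @items = grep { !$seen{$_}++ }
                map  { (split ' ', $_)[1] } @$transactions;
    my $neighbours = find_neighbours(\@items, $d);

    my %count;
    for my $line (@$transactions) {
        my ($tid, $item) = split ' ', $line;
        $count{$_}{$tid}++ for @{ $neighbours->{$item} };
    }
    return \%count;
}

my @to_proc = ('I1 TTAT', 'I1 TTTT', 'I1 TAGT', 'I2 TTAT', 'I3 TAGT');
print Dumper( count_with_neighbours(\@to_proc, 1) );
```

On the sample data with $d = 1 this produces the counts asked for in the original post (for instance {TTTT}{I1} is 2 and {TTTT}{I2} is 1). Repeated "tid item" lines are simply counted again, which seems to match the answer given about duplicates.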

Re: Element Count from Array to HoH
by djohnston (Monk) on May 26, 2005 at 07:07 UTC
    I believe part of the problem is that you only want to iterate through each distinct item once within that outer for loop. There are only three distinct items: TAGT, TTTT, and TTAT, yet the loop will iterate 5 times thus giving you incorrect results. Something like next if exists $transaction_map_ref->{$item}; inserted just before the second for loop ought to solve that problem.

    Secondly, $$transaction_map_ref{$item}{$tid}++; increments the transaction id of the outer loop item as opposed to the inner loop item, which is what we care about. That needs to be changed to $transaction_map_ref->{$item}{$tid2}++; instead.

    Lastly, ditch last.

    Also, I just happened to be reading about using the variable names $a and $b before I read this post. You ought to rename those variables within your hd routine, just for good measure. (see (re:x5 use strict....)$a and $b should be in perlvar)
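
For reference, here is a sketch of the original subroutine with the three changes above applied (skip items already handled, count $tid2 instead of $tid, no last). It assumes the hash passed in starts out empty, since the exists check would otherwise skip items that were already present; the loops are also written foreach-style, which is a cosmetic choice of this sketch rather than part of the advice.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hamming distance with the renamed variables (equal-length strings assumed).
sub hd {
    my ($k, $l) = @_;
    my $num_mismatch = 0;
    for my $i (0 .. length($k) - 1) {
        $num_mismatch++ if substr($k, $i, 1) ne substr($l, $i, 1);
    }
    return $num_mismatch;
}

sub read_trans_array_mismatch {
    my ($transaction_map_ref, $transaction_array, $d) = @_;

    for my $entry (@$transaction_array) {
        my ($tid, $item) = split ' ', $entry;

        # Change 1: process each distinct item only once.
        # Assumes the hash was empty when the sub was called.
        next if exists $transaction_map_ref->{$item};

        for my $other (@$transaction_array) {
            my ($tid2, $item2) = split ' ', $other;

            # Change 2: count the inner-loop transaction id.
            # Change 3: no "last", so every match is counted.
            $transaction_map_ref->{$item}{$tid2}++
                if hd($item, $item2) <= $d;
        }
    }
    return;
}

my @to_proc = ('I1 TTAT', 'I1 TTTT', 'I1 TAGT', 'I2 TTAT', 'I3 TAGT');
my %transaction_map_mismatch;
read_trans_array_mismatch(\%transaction_map_mismatch, \@to_proc, 1);
print Dumper \%transaction_map_mismatch;
```

With the sample data and $d = 1, this gives the counts shown as the desired output in the question. Because every item matches itself at distance 0, its key is always created on the first pass, which is what makes the exists check safe for any $d >= 0.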

Re: Element Count from Array to HoH
by tlm (Prior) on May 26, 2005 at 06:05 UTC

    Sorry, I can't divine the specifications for your program from your description of the desired output; in particular, you have given us no reason why the values for $transaction_map_ref->{ TTAT }{ I1 } and $transaction_map_ref->{ TTAT }{ I2 } should ever differ, since the relationship between I1:TTAT and I2:TTAT, as far as your description goes, is symmetrical.

    the lowliest monk

      why the values for $transaction_map_ref->{ TTAT }{ I1 } and $transaction_map_ref->{ TTAT }{ I2 } should ever differ
      tlm,
      Thanks a lot for responding. As for your question above. Even though they have the same value, they should differ because it comes from different index (I1,I2).

      So if I'm correct in understanding your term 'symmetrical', in this sense they are not.
      I hope this clarifies your question.

      Regards,
      Edward

        As for your question above. Even though they have the same value, they should differ because it comes from different index (I1,I2).

        I realize that, but given that in your description of the task indices are not mentioned, they can't be a basis for treating the two instances of TTAT differently.

        A cardinal skill in programming (or any form of engineering for that matter) is knowing how to specify the task you are trying to solve. Many bugs are the result of incorrect or incomplete specification. Examples are useful to quickly convey the gist of your goal, but they are no substitute for a full specification.

        How do you fully specify the task you are trying to accomplish? Apparently this is something that takes some practice, but to get in the right frame of mind you can begin by imagining that you are paying a lot of money to someone else for coding this thing, and therefore you have to make sure that you explain to him/her exactly what this thing needs to do, for otherwise you will be paying a lot of money with every round of revisions to the code's objectives.

        Knowing how to spec software is a fundamental programming skill. Don't rely on the monks doing it for you, or you will never progress as a programmer.

        the lowliest monk