aquinom has asked for the wisdom of the Perl Monks concerning the following question:

Hey Monks, I'm trying to read in a file and build a hash of summed values for all the key/value pairs (as in the code below), then read a second file that uses the same hash keys and divide each sum from the first file by the corresponding sum from the second file. I'm not sure whether to use two hashes or stick to one, but let's assume my first input file looks like this:
0201 3201 1.00
0201 2608 1.00
0201 2402 0.94
0201 0302 1.00
2402 2402 0.99
0101 0201 0.99
0201 1101 1.00
0301 2601 1.00
0301 1101 0.98
2601 0301 1.00
0301 2601 1.00
0301 2601 1.00
0301 2601 1.00
and my second input file looks like this:
0201 3201 2.00
0201 2608 2.00
0201 2402 1.94
0201 0302 2.00
2402 2402 1.99
0101 0201 1.99
0201 1101 2.00
0301 2601 2.00
0301 1101 1.98
2601 0301 2.00
0301 2601 2.00
0301 2601 2.00
0301 2601 2.00
In the output, I would expect the value for the row/column corresponding to A0301/A2601 to equal 0.5 (5/10), but my code doesn't even divide by the values from the second input file (I just get 5). I'm not sure what to do; can anyone help me fix this?
#!/usr/bin/perl
use strict;
use warnings;

my $infile = $ARGV[0];
my $infile2 = $ARGV[1];

unless (open(INFILE, $infile)){
    die "Couldn't open infile: $!\n";
}

my @AtypeData = qw(A0101 A0102 A0201 A0202 A0205 A0301 A0302 A1101 A2301
                   A2402 A2403 A2601 A2608 A2902 A3001 A3002 A3004 A3101
                   A3201 A3601 A6801 A6802);
my %diplotypes;
my %diplotypes2;
initHash(\%diplotypes, \@AtypeData);
initHash(\%diplotypes2, \@AtypeData);

##read in the data
while (<INFILE>){
    chomp;
    my @line = split ('\t', $_);
    my $key1 = 'A' . $line[0] . '.' . 'A' . $line[1]; ##first key
    my $key2 = 'A' . $line[1] . '.' . 'A' . $line[0]; ##key the other way
    ##check to see if the key exists in the hash
    ##if it doesn't there is data in your infile, not in your names array
    if (exists $diplotypes{$key1} && $line[0] <= $line[1]) {
        $diplotypes{$key1} += $line[2];
    }
    elsif (exists $diplotypes{$key2} && $line[0] >= $line[1]) {
        $diplotypes{$key2} += $line[2];
    }
    else{##world is out to get you
        print STDERR "No key for $key1 or $key2\n";
        next;
    }
}
close INFILE;

unless (open(INFILE2, $infile2)){
    die "Couldn't open infile: $!\n";
}
while (<INFILE2>){
    chomp;
    my @line = split ('\t', $_);
    my $key1 = 'A' . $line[0] . '.' . 'A' . $line[1]; ##first key
    my $key2 = 'A' . $line[1] . '.' . 'A' . $line[0]; ##key the other way
    ##check to see if the key exists in the hash
    ##if it doesn't there is data in your infile, not in your names array
    if (exists $diplotypes2{$key1} && $line[0] <= $line[1]) {
        $diplotypes2{$key1} += $line[2];
    }
    elsif (exists $diplotypes2{$key2} && $line[0] >= $line[1]) {
        $diplotypes2{$key2} += $line[2];
    }
    else{##world is out to get you
        print STDERR "No key for $key1 or $key2\n";
        next;
    }
}
foreach my $key1(keys %diplotypes){
    if (exists $diplotypes2{$key1}){
        $diplotypes{$key1} /= $diplotypes2{$key1} +0.01;
    }
}
close INFILE2;

printData(\%diplotypes, \@AtypeData);

sub initHash {
    #init the all to all hash
    ##first argument is the hash of data, and the second is a reference to all the columns
    my ($refHash, $refArr) = @_;
    foreach my $ele1(@$refArr){
        foreach my $ele2(@$refArr){
            my $key = $ele1 . "." . $ele2;
            if (exists $$refHash{$key}){
                print STDERR "This key existed in your array of names, skipping\n";
                next;
            }
            else{
                $$refHash{$key} = 0;
            }
        }
    }
}

sub printData {
    my ($refHash, $refArr) = @_;
    #print header line;
    print "MATRIX\t";
    foreach my $ele(@$refArr){
        print "$ele", "\t";
    }
    print "\n";
    #print out the actual data
    foreach my $ele1(@$refArr){
        print "$ele1" , "\t";##print out the first value on the row, which is the name
        foreach my $ele2(@$refArr){
            my $key = $ele1 . "." . $ele2;
            if (exists $$refHash{$key}){
                printf "%.2f \t", $$refHash{$key};
            }
            else{
                print STDERR "Something is wrong\n";
            }
        }
        print "\n";
    }
}
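A minimal sketch of the two-hash approach the question describes, assuming whitespace-separated `code code value` lines like the samples above. The helper names `pair_key` and `read_sums` are made up for illustration, the sample data is read from in-memory filehandles instead of the real files, and pairs whose second-file sum is zero are skipped rather than padded with `+0.01` as in the posted code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: canonical key such as "A0301.A2601" with the smaller
# code first, so "0301 2601" and "2601 0301" are summed together.
sub pair_key {
    my ($x, $y) = @_;
    ($x, $y) = ($y, $x) if $x > $y;    # numeric comparison, as in the original
    return "A$x.A$y";
}

# Hypothetical helper: sum the third column per pair for one input handle.
sub read_sums {
    my ($fh) = @_;
    my %sum;
    while (my $line = <$fh>) {
        my ($x, $y, $value) = split ' ', $line;    # splits on tabs or spaces
        next unless defined $value;                # skip blank or short lines
        $sum{ pair_key($x, $y) } += $value;
    }
    return \%sum;
}

# The sample data from the question, standing in for the two real files.
my $first = <<'END';
0201 3201 1.00
0201 2608 1.00
0201 2402 0.94
0201 0302 1.00
2402 2402 0.99
0101 0201 0.99
0201 1101 1.00
0301 2601 1.00
0301 1101 0.98
2601 0301 1.00
0301 2601 1.00
0301 2601 1.00
0301 2601 1.00
END
my $second = <<'END';
0201 3201 2.00
0201 2608 2.00
0201 2402 1.94
0201 0302 2.00
2402 2402 1.99
0101 0201 1.99
0201 1101 2.00
0301 2601 2.00
0301 1101 1.98
2601 0301 2.00
0301 2601 2.00
0301 2601 2.00
0301 2601 2.00
END

open my $fh1, '<', \$first  or die "Couldn't open first input: $!\n";
open my $fh2, '<', \$second or die "Couldn't open second input: $!\n";
my $sums1 = read_sums($fh1);
my $sums2 = read_sums($fh2);
close $fh1;
close $fh2;

# One ratio per pair found in both files; zero denominators are skipped.
my %ratio;
for my $key (keys %$sums1) {
    next unless $sums2->{$key};
    $ratio{$key} = $sums1->{$key} / $sums2->{$key};
}

printf "%s\t%.2f\n", $_, $ratio{$_} for sort keys %ratio;
```

With the sample data this prints 0.50 for A0301.A2601 (5/10). In the real script the two handles would come from opening `$ARGV[0]` and `$ARGV[1]`; keeping one hash per file means each file's sums are complete before any division happens.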

Re: How to divide the value of a hash key by the value of another hash key (when the keys are equivalent)?
by zek152 (Pilgrim) on Jun 06, 2011 at 18:05 UTC

    One immediate problem is that you are only reading one file ($infile and $infile2 both come from $ARGV[0]):

    #your code
    #my $infile = $ARGV[0];
    #my $infile2 = $ARGV[0]; <--- $infile == $infile2

    #corrected code
    my $infile = $ARGV[0];
    my $infile2 = $ARGV[1];
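One way to make this kind of slip harder to miss, sketched with a hypothetical `input_files` helper (not part of the original script): take both paths in one list assignment and refuse anything other than two different names.

```perl
use strict;
use warnings;

# Hypothetical helper: return exactly two distinct input paths, so a
# copy-and-paste slip like using $ARGV[0] twice cannot go unnoticed.
sub input_files {
    my @args = @_;
    die "Usage: $0 first_file second_file\n" unless @args == 2;
    my ($infile, $infile2) = @args;
    # Plain string comparison: "a.txt" and "./a.txt" are not caught.
    die "Both arguments name the same file: $infile\n" if $infile eq $infile2;
    return ($infile, $infile2);
}

# In the real script: my ($infile, $infile2) = input_files(@ARGV);
my ($infile, $infile2) = input_files('first.txt', 'second.txt');
print "Reading $infile, then $infile2\n";
```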

    Update: I found another issue.

    In the following code block, you never initialize the value to 0 if the key does not exist.

    #your code
    while (<INFILE>){
        chomp;
        my @line = split ('\t', $_);
        my $key1 = 'A' . $line[0] . '.' . 'A' . $line[1]; ##first key
        my $key2 = 'A' . $line[1] . '.' . 'A' . $line[0]; ##key the other way
        ##check to see if the key exists in the hash
        ##if it doesn't there is data in your infile, not in your names array
        if (exists $diplotypes{$key1} && $line[0] <= $line[1]) {
            $diplotypes{$key1} += $line[2];
        }
        elsif (exists $diplotypes{$key2} && $line[0] >= $line[1]) {
            $diplotypes{$key2} += $line[2];
        }
        else{##world is out to get you
            print STDERR "No key for $key1 or $key2\n";
            next;
        }
    }

    Something along the lines of the following might fix the issue.

    while (<INFILE>){
        chomp;
        my @line = split ('\t', $_);
        my $key1 = 'A' . $line[0] . '.' . 'A' . $line[1]; ##first key
        my $key2 = 'A' . $line[1] . '.' . 'A' . $line[0]; ##key the other way
        ##check to see if the key exists in the hash
        ##if it doesn't there is data in your infile, not in your names array
        ##new logic
        if($line[0] <= $line[1]) {
            if(exists $diplotypes{$key1}) {
                $diplotypes{$key1} += $line[2];
            }
            else {
                #key doesn't exist so add it
                $diplotypes{$key1} = $line[2];
            }
        }
        else {
            if(exists $diplotypes{$key2}) {
                $diplotypes{$key2} += $line[2];
            }
            else {
                #key doesn't exist so add it
                $diplotypes{$key2} = $line[2];
            }
        }
    }

    2nd Update: I did not notice that you had an initHash function. That makes my first update unnecessary; however, what I posted is a more compact way of achieving the same result. Sorry for the confusion.

    Hope this helps.
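Taking the same idea one step further, a sketch assuming the canonical `A<small>.A<large>` key from the posted code: under `use warnings`, `+=` on a missing hash element starts from 0 and does not emit an "uninitialized" warning, so neither `exists` branch is strictly needed. The three rows are a hypothetical excerpt of the question's data, held in an array instead of a file. The trade-off is that names missing from @AtypeData are silently added instead of being reported on STDERR.

```perl
use strict;
use warnings;

# Accumulate without exists(): += on a missing element treats it as 0.
my %diplotypes;
my @rows = (
    [ '0301', '2601', 1.00 ],
    [ '2601', '0301', 1.00 ],    # reversed order, same pair
    [ '0301', '1101', 0.98 ],
);

for my $row (@rows) {
    my ($x, $y, $value) = @$row;
    # Smaller code first, matching the original $line[0] <= $line[1] test.
    my $key = $x <= $y ? "A$x.A$y" : "A$y.A$x";
    $diplotypes{$key} += $value;
}

printf "%s\t%.2f\n", $_, $diplotypes{$_} for sort keys %diplotypes;
```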

      I seem to be getting the expected output now, after fixing that silly typo in the $infile2 declaration.
      The initHash subroutine already initializes all possible keys to 0, though. I understand your change, but I don't see why it's necessary.
      sub initHash {
          #init the all to all hash
          ##first argument is the hash of data, and the second is a reference to all the columns
          my ($refHash, $refArr) = @_;
          foreach my $ele1(@$refArr){
              foreach my $ele2(@$refArr){
                  my $key = $ele1 . "." . $ele2;
                  if (exists $$refHash{$key}){
                      print STDERR "This key existed in your array of names, skipping\n";
                      next;
                  }
                  else{
                      $$refHash{$key} = 0;
                  }
              }
          }
      }
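Since initHash seeds every key with 0, the `exists` tests in the read loop mainly serve to reject names that are not in @AtypeData. For comparison, a sketch that seeds the same all-to-all keys in one hash-slice assignment; the four names are a hypothetical excerpt of @AtypeData. Unlike initHash, it overwrites values already in the hash and says nothing about duplicate names.

```perl
use strict;
use warnings;

my @AtypeData = qw(A0101 A0201 A0301 A2601);    # hypothetical excerpt
my %diplotypes;

# Every "name.name" combination, in both orders, like initHash builds.
my @keys = map { my $first = $_; map { join '.', $first, $_ } @AtypeData } @AtypeData;

# Hash slice: assign 0 to all of those keys at once.
@diplotypes{@keys} = (0) x @keys;

print scalar(keys %diplotypes), " keys initialized\n";
```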
Re: How to divide the value of a hash key by the value of another hash key (when the keys are equivalent)?
by toolic (Bishop) on Jun 06, 2011 at 18:06 UTC
    The code you posted does not compile for me. I get several of these errors:
    Global symbol "%diplotypes2" requires explicit package name
    Download your own code to make sure what you posted is what you are running.
      Hey, I think the previous poster saw what I failed to see... but I'll repost the code anyway; it should run.
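A small sketch that reproduces the error toolic reported, compiling two made-up snippets under `use strict` with a string `eval`: the first touches %diplotypes2 without declaring it, the second declares it with `my` first. Only the undeclared version should fail, with a message starting 'Global symbol "%diplotypes2" requires explicit package name'; newer Perl versions append a short hint to that message.

```perl
use strict;
use warnings;

# Two hypothetical snippets, compiled at run time by string eval.
my $undeclared = 'use strict; $diplotypes2{"A0301.A2601"} = 0; 1';
my $declared   = 'use strict; my %diplotypes2; $diplotypes2{"A0301.A2601"} = 0; 1';

my $ok_undeclared = eval $undeclared;
my $error         = $@;    # compile error text from the first snippet
my $ok_declared   = eval $declared;

print "undeclared: ", ($ok_undeclared ? "compiled\n" : "failed: $error");
print "declared: ",   ($ok_declared   ? "compiled\n" : "failed: $@");
```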