AWallBuilder has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I am getting confused traversing 'multi-level' hashes, and cannot seem to find the exact answer I am looking for on similar posts.

First question: are these two notations for assigning values to the 'deeper keys' equivalent? Is the first (commented-out) one even correct?

    #? $HoFlg1=(Query_id=>{$Query_id},Subj_id=>{$Subj_id},bit_score=>{$bit_score},Flag=>$Flag1);
    #? $HoFlg2=(Query_id=>{$Query_id},Subj_id=>{$Subj_id},bit_score=>{$bit_score},Flag=>$Flag2);
    $HoFlg1->{$Query_id}->{$Subj_id}->{$bit_score}=$Flag1;
    $HoFlg2->{$Query_id}->{$Subj_id}->{$bit_score}=$Flag2;

Second question: how do I find the maximum third-level key across all second-level keys that share the same first-level key? That is, for each $Query_id, I want to find the maximum $bit_score (over all $Subj_ids).

here are some incorrect/partial attempts

thanks in advance

    ##### For each Query_id find top bit score
    sub largest_key {
        my $hash = @_;
        my ($large_key) = each %$hash;
        foreach (my ($key) = keys %$hash) {
            if ($key > $large_key) {
                $large_key = $key;
            }
        }
        return $large_key;
    }

    ##### For each Query_id find top bit score
    my %Hotop_bit;
    foreach my $key1 (keys %{$HoFlg2}){
        foreach my $key2 (keys %{$HoFlg2->{$key1}}){
            foreach my $key3 (keys %{$HoFlg2->{$key1}->{$key2}}) {
                $top_bit=&largest_key(%{$HoFlg2->{$key1}->{$key2}})
                print "key1\t$key1\tkey2$key2\tbitscore\t$key3\n";
                $Hotop_bit{$key1}=$top_bit; #??probably wrong
            }
        }
        print "for query\t$key1 top bit score is\t$Hotop_bit{$key1}\n";
    }

    ###For each Query_id find top bit score - using => hash creation ;
    foreach my $key1 (keys %{$HoFlg2}){
        my @bits=map{$HoFlg2{$key1}->{bit_score}} keys(%HoFlg2);
        $top_bit=reverse sort(@bits)[0];
        $Hotop_bit{$key1}=$top_bit;
    }

    ###For each Query_id find top bit score - using => hash creation ;
    foreach my $key1 (keys %{$HoFlg2})
        my $highest_bit=( reverse sort { $HoFlg2{$a}->{bit_score} <=> $HoFlg2{$b}->{bit_score}} keys(%HoFlg2) )[0];

Replies are listed 'Best First'.
Re: sort hash ignoring one hash level
by NetWallah (Canon) on May 14, 2012 at 15:40 UTC
    Let me try to itemize, and answer the issues:
    • The first 2 commented lines are incorrect because each {$Query_id} attempts to create a hashref from a list that contains an odd number of elements. In addition, assigning a parenthesized list to a scalar keeps only the last element of that list.
    • The next 2 assignments are correct, and they use the same syntax.
    • Your code initializing $large_key from the hash will simply assign the first available key. Why not initialize it to the smallest possible number instead? (It needs some research to find out this value.)
    • If the keys are numeric (as indicated by your use of ">"), would an ARRAY be more appropriate?
    • You could use the max function in List::Util to replace "sub largest_key".
    • Using "sort" to find "max" is overkill. Use max from List::Util.
    This list is incomplete. I'll try to add when I have more time, and hopefully, after you respond.
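
The List::Util suggestion above can be sketched as follows. This is only a minimal illustration, assuming the structure from the original post, $HoFlg2->{$Query_id}{$Subj_id}{$bit_score} = $Flag; the sample IDs and scores are made up for the example.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical sample data shaped like the structure in the question:
# $HoFlg2->{$Query_id}{$Subj_id}{$bit_score} = $Flag
my $HoFlg2 = {
    query_1 => {
        subj_a => { 111 => 'Bact', 95 => 'Bact' },
        subj_b => { 112 => 'Phage' },
    },
    query_2 => {
        subj_c => { 1143 => 'Euk' },
    },
};

my %Hotop_bit;
for my $query_id (keys %$HoFlg2) {
    # Gather the third-level keys (bit scores) from every second-level
    # hash, so the Subj_id level is effectively ignored.
    my @bits = map { keys %$_ } values %{ $HoFlg2->{$query_id} };
    $Hotop_bit{$query_id} = max(@bits);
}

for my $query_id (sort keys %Hotop_bit) {
    print "for query\t$query_id top bit score is\t$Hotop_bit{$query_id}\n";
}
```

Because max compares numerically, no sort is needed and no starting value has to be chosen for the comparison.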

                 I hope life isn't a big joke, because I don't get it.
                       -SNL

      Thanks for the comments. I have significantly reworked the code, and most of it is working correctly. I can read in the BLAST results table, correctly assign "Flags" to each row of results, and find the maximum $bit_score for each query. In the last section, I want to tabulate the number of different "Flags" for each query, counting only those results whose $bit_score is within 90% of the top bit score. However, there is an error (see below). Any help appreciated.

      ############### read in blast table and parse #######
      my $in_blast_tab=$ARGV[0];
      open(IN,$in_blast_tab) or die "cannot open $in_blast_tab\n";
      my $HoFlg1={};
      my $HoFlg2={};
      #my $Flag1; my $Flag2;
      my %maxBits;
      my $max_bits=60;
      while (my $line=<IN>) {
          chomp $line;
          my ($Query_id,$strand,$Subj_id,$Perc_iden,$align_len,$num_mm,$gap,$q_start,$q_end,$s_start,$s_end,$e_value,$bit_score)=split("\t",$line); # extra field of strand
          next if ($bit_score <60);
          if ($bit_score > $max_bits){
              $max_bits=$bit_score;
          }
          $maxBits{$Query_id}=$max_bits;
          my ($Flag1, $Flag2 ) = &Flag( $Subj_id, \%proph_prots, \%euk_prots, \%vir_prots );
          $HoFlg1->{$Query_id}->{$Flag1}->{$bit_score}++;
          $HoFlg2->{$Query_id}->{$Flag2}->{$bit_score}++;
          print join("\t",$Query_id,$Subj_id,$bit_score,$Flag1,$Flag2)."\n";
      }

      ## Get a Flag for each query/subject
      sub Flag {
          my ( $Subj_id, $proph_prots, $euk_prots, $vir_prots ) = @_;
          return "Proph", "Phage" if exists $$proph_prots{$Subj_id};
          return "Euk", "Euk" if exists $$euk_prots{$Subj_id};
          return "Vir", "Phage" if exists $$vir_prots{$Subj_id};
          return "Bact", "Bact";
      } ## end sub Flag

      ## now for all query/flag pairs with bit scores that are within 90% of the top_bit for that query ##
      my @flag_list2=("Phage","Euk","Bact");
      my $count={};
      foreach my $q (keys %{$HoFlg2}){
          print "$q\t";
          for my $flag (@flag_list2){
              for my $b (keys %{$HoFlg2->{$flag}}){
                  if ($b > 0.9*$maxBits{$q}){
                      $count->{$flag} += $HoFlg2->{$q}->{$flag}->{$b};
                  }
              }
              print "$count\t";
          }
          print "\n";
      }

      Sample input

      158256496-stool1_revised_C972998_1_gene3   strand:-  581103.GY4MC1_2020  37.93  116  61  2  1  110  1    111   1e-
      158256496-stool1_revised_C972998_1_gene3   strand:-  539329.FTPG_01302   36.28  113  71  1  1  113  1    112   1e-
      158256496-stool1_revised_C972998_1_gene3   strand:-  634956.Geoth_2108   37.93  116  61  2  1  110  1    111   1e-
      V1.UC9-0_revised_scaffold1508_3_gene71913  strand:+  565641.EFOG_02198   57.55  212  55  3  1  212  967  1143  9e-
      V1.UC9-0_revised_scaffold1508_3_gene71913  strand:+  565640.EFMG_02145   57.55  212  55  3  1  212  967  1143  9e-
      V1.UC9-0_revised_scaffold1508_3_gene71913  strand:+  565639.EFYG_01514   57.55  212  55  3  1  212  967  1143  9e-
      V1.UC9-0_revised_scaffold1508_3_gene71913  strand:+  565638.EFCG_01989   57.55  212  55  3  1  212  967  1143  9e-

      Current output

      158256496-stool1_revised_C972998_1_gene3 HASH(0x2b3c2bd3b000) HASH(0x2b3c2bd3b000) HASH(0x2b3c2bd3b000)
      V1.UC9-0_revised_scaffold1508_3_gene71913 HASH(0x2b3c2bd3b000) HASH(0x2b3c2bd3b000) HASH(0x2b3c2bd3b000)

      desired output

      #query   num_Phage  num_Euk  num_Bact
      query_1  5          1        0
      query_2  0          6        1
        Could it be the line: print "$count\t";

        Should it be print "$count->{$flag}\t";?

        Cristoforo (++) is right.

        A few more pieces of advice:

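        • use strict; (and use warnings;). They catch undeclared variables and many typos. Note that "$count" itself is a declared hash reference, so strict would not flag that print; interpolating the reference simply prints HASH(0x...).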
        • Do not prepend & to the names of the subroutines you call. That is old-fashioned, and there are only a few esoteric reasons for using it (beyond the scope of this note).
        • Your "$max_bits" can be initialized to zero - that is probably more understandable to the lay reader than having to explain why you start at 60.
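
Putting the advice above together, here is one possible sketch of the tally step. It is not the original poster's code: the sample data is hypothetical, the top bit score is recomputed per query with List::Util's max (an assumption about the intent of %maxBits), and the counts are keyed by query as well as by flag so they do not accumulate across queries.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical data shaped like the reworked structure:
# $HoFlg2->{$Query_id}{$Flag2}{$bit_score} = number of hits
my $HoFlg2 = {
    query_1 => {
        Phage => { 100 => 2, 95 => 3, 70 => 4 },
        Euk   => { 92  => 1 },
    },
    query_2 => {
        Euk  => { 200 => 6 },
        Bact => { 185 => 1, 100 => 9 },
    },
};

my @flag_list2 = ("Phage", "Euk", "Bact");
my %count;

print join("\t", "#query", map("num_$_", @flag_list2)), "\n";
for my $q (sort keys %$HoFlg2) {
    # Top bit score for this query only, over all of its flags
    my $top = max(map { keys %$_ } values %{ $HoFlg2->{$q} });

    for my $flag (@flag_list2) {
        $count{$q}{$flag} = 0;
        my $scores = $HoFlg2->{$q}{$flag} || {};   # flag may be absent
        # $bits rather than $b, which is reserved for sort blocks
        for my $bits (keys %$scores) {
            $count{$q}{$flag} += $scores->{$bits} if $bits > 0.9 * $top;
        }
    }
    # Print the numbers themselves, not the hash reference
    print join("\t", $q, map { $count{$q}{$_} } @flag_list2), "\n";
}
```

With this made-up data the loop prints one row per query in the layout of the desired output; whether "within 90%" should use > or >= is left as in the posted code.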


Re: sort hash ignoring one hash level
by johngg (Canon) on May 14, 2012 at 15:25 UTC

    I'm finding it difficult to understand what data structures you are trying to create and subsequently access. I think it would help us to help you if you could provide a small sample of the data you are working with and a description of what you are trying to do with it. Perhaps you could use Data::Dumper to confirm that the data structures you create are what you expected.
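
As a rough illustration of the Data::Dumper suggestion, the sketch below builds one entry with the arrow notation from the original post and dumps it. The variable values are hypothetical placeholders, and the two $Data::Dumper settings are optional conveniences.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # print keys in a stable, sorted order
$Data::Dumper::Indent   = 1;   # more compact indentation

# Hypothetical values standing in for one parsed row
my ($Query_id, $Subj_id, $bit_score, $Flag2) = ('query_1', 'subj_a', 111, 'Bact');

my $HoFlg2 = {};
$HoFlg2->{$Query_id}->{$Subj_id}->{$bit_score} = $Flag2;

# Dumper returns the structure as a string of Perl code
my $dump = Dumper($HoFlg2);
print $dump;
```

Comparing the dump against the structure you expected is usually the quickest way to see at which level a key ended up.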

    Cheers,

    JohnGG