knirirr has asked for the wisdom of the Perl Monks concerning the following question:

To store various data generated as a script runs, I've ended up with a routine that looks like this (variable names shortened for the post ;)):
{ $a{$b}{$c}{$d}[$e]++; }
What this is supposed to be doing is counting the number of strings found in file $b that are of group $c and subgroup $d and length $e. Loading this structure up works nicely according to the output of Data::Dumper. The next problem is that I need to produce several summary files:
1. The number of items of subgroup $d of length N (where N runs from 1 to a maximum length determined elsewhere) found in $b.
2. The same as 1, but grouping by $c.
3. The number and length of $d summed for all the files.
4. The same as 3, but grouping by $c.
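As a rough illustration of summaries 3 and 4, here is a minimal sketch that sums the per-file counts across all files. The hash name %count, the file names, and the sample counts are all made up for the example; only the nesting (file, group, subgroup, then an array indexed by length) follows the structure described above.

```perl
use strict;
use warnings;

# Hypothetical sample data: $count{$file}{$group}{$subgroup}[$length]
my %count = (
    file1 => { 2 => { AT => [ undef, 1, 2 ], GT => [ undef, undef, 4 ] } },
    file2 => {
        2 => { AT  => [ undef, 3 ] },
        3 => { ATT => [ undef, undef, undef, 2 ] },
    },
);

my ( %by_subgroup, %by_group );
for my $file ( keys %count ) {
    for my $group ( keys %{ $count{$file} } ) {
        for my $subgroup ( keys %{ $count{$file}{$group} } ) {
            my $lengths = $count{$file}{$group}{$subgroup};
            for my $length ( 0 .. $#$lengths ) {
                my $n = $lengths->[$length] or next;    # skip empty slots
                $by_subgroup{$subgroup}[$length] += $n; # summary 3
                $by_group{$group}[$length]       += $n; # summary 4
            }
        }
    }
}
```

Both result hashes keep the length as the array index, so they can be printed with the same kind of loop as the per-file tables.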
Looking at the dumped output I see:
file_b
{
  group_c_item => {
     'group_d_item' =>  
          [
          undef,
          undef,
          N
          ]                         
                  }
};
...where N is the number of items of length 3. Can anyone suggest a simple way of printing this output so I could get something like:
length|unique_d_1|unique_d_2|...etc
1     |     1    |      0   |
2     |     2    |      4   |
3     |     0    |      2   |
etc...
...for example? I've tried a few ugly loops, but with no success.

Replies are listed 'Best First'.
Re: Extracting data from nested hashes
by kvale (Monsignor) on Nov 17, 2004 at 17:30 UTC
    To get the derived statistics, loop over all the indices and count:
    my %table;
    # The number of items of group $d of length N (where N is 1 to a
    # max. length determined elsewhere) found in $b.
    for my $file (keys %a) {
        for my $group (keys %{$a{$file}}) {
            for my $subgroup (keys %{$a{$file}{$group}}) {
                # The array is indexed by length; each element is a count.
                my $counts = $a{$file}{$group}{$subgroup};
                for my $length (0 .. $#$counts) {
                    $table{$file}{$subgroup}[$length] += $counts->[$length] || 0;
                }
            }
        }
    }
    To print out a table, use another nested loop:
    my $file = 'xxx';
    for my $length (1..$max_length) {
        for my $unique_d (sort keys %{$table{$file}}) {
            # printing stuff using $table{$file}{$unique_d}[$length]
        }
    }
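Put together, the two loops might look like the following self-contained sketch for a single file. The name %count, the sample data, and the value of $max_length are assumptions for illustration; the pipe-separated layout only approximates the table asked for in the question.

```perl
use strict;
use warnings;

# Hypothetical data: $count{$file}{$group}{$subgroup}[$length] = number found
my %count = (
    file_b => {
        1 => { A  => [ undef, 1, 2 ] },
        2 => { AT => [ undef, undef, 4, 2 ] },
    },
);
my $file       = 'file_b';
my $max_length = 3;    # assumed to be determined elsewhere

# Drop the group level: $table{$subgroup}[$length]
my %table;
for my $group ( keys %{ $count{$file} } ) {
    for my $subgroup ( keys %{ $count{$file}{$group} } ) {
        my $lengths = $count{$file}{$group}{$subgroup};
        $table{$subgroup}[$_] += $lengths->[$_] || 0 for 0 .. $#$lengths;
    }
}

# One header row, then one row per length; missing counts print as 0.
my @columns = sort keys %table;
my @rows    = ( join '|', 'length', @columns );
for my $length ( 1 .. $max_length ) {
    push @rows, join '|', $length, map { $table{$_}[$length] || 0 } @columns;
}
print "$_\n" for @rows;
```

Sorting the keys once and reusing @columns keeps the header and every data row in the same column order.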
    Update: fixed a typo in the second comment.

    -Mark

      Looks interesting, thanks. I was trying something similar, but having dreadful trouble getting it to work properly - I suspect due to incorrect use of the keys %{$a{$file}{$group}} syntax.
Re: Extracting data from nested hashes
by hmerrill (Friar) on Nov 17, 2004 at 17:16 UTC
    It's still a little unclear what you're looking for - a little more explanation would be helpful.

    Describe a particular row of output that you are hoping for, and explain what each column should contain.

      Apologies - I was trying to explain as clearly as possible, but have obviously failed! ;) Anyway, this goes back to a previous writeup for finding microsatellites in genome files. Having solved the problem of finding them I now need to count them and sort them into categories, viz.
      • Which genome file they were found in ($b).
      • How long the repeating motif is, e.g. 3 units ($c).
      • What the repeating motif is, e.g. ATT ($d).
      • How many repeating motifs there are, e.g. (ATT)6 ($e).
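For illustration, the four categories above could be loaded into the nested structure roughly like this. The @hits list, the genome names, and the hash name %count are hypothetical; the motif-finding step itself is assumed to have already produced one record per microsatellite.

```perl
use strict;
use warnings;

# Hypothetical hits: [ genome file, motif, number of repeats ]
my @hits = (
    [ 'genome1', 'ATT', 6 ],
    [ 'genome1', 'ATT', 6 ],
    [ 'genome1', 'GT',  11 ],
    [ 'genome2', 'A',   11 ],
);

my %count;
for my $hit (@hits) {
    my ( $file, $motif, $repeats ) = @$hit;
    my $unit = length $motif;    # motif length in bases, e.g. 3 for ATT

    # Intermediate hashes and the array are autovivified on first use.
    $count{$file}{$unit}{$motif}[$repeats]++;
}
```

Array slots for repeat counts that never occur stay undef, which is why the Data::Dumper output shows undef entries before the first real count.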
      There are various outputs I need, but the first would be a file for each genome showing each unique motif ($d) and how many of them were found for a particular length, e.g.
      units|A|AT|GT|ATT|...etc.
      11   |1|0 |2 |3  |...
      12   |0|1 |1 |4  |...
      
      This shows that I've found one case of an A that is 11 units long, no ATs of the same number of units, but two GTs of 11 units, etc. Does this make any sense?
        Are you all set now? Sounds like kvale's explanation of how to dereference your hashes and arrays in a loop was what you were looking for.