Extracting data from nested hashes

knirirr has asked for the wisdom of the Perl Monks concerning the following question:

In order to store various data generated as a script runs, I've ended up with a routine that looks like this (variable names changed for a shorter post) ;):

{
  $a{$b}{$c}{$d}[$e]++;
}

What this is supposed to be doing is counting the number of strings found in file $b that are of group $c and subgroup $d and length $e. Loading this structure up works nicely according to the output of Data::Dumper. The next problem is that I need to produce several summary files:
1. The number of items of subgroup $d of length N (where N is 1 to a max. length determined elsewhere) found in $b.
2. The same as 1, but grouping by $c.
3. The number and length of $d summed for all the files.
4. The same as 3, but grouping by $c.
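For illustration, summaries 3 and 4 amount to folding the file level away while adding up the per-length counts. The following is only a minimal sketch under stated assumptions: `%count` stands in for the real structure, and the file names, group keys, motifs and numbers in it are made up.

```perl
use strict;
use warnings;

# Hypothetical counts for two files, in the $a{$b}{$c}{$d}[$e] shape.
my %count = (
    file_1 => { 2 => { AT => [ undef, 1, 2 ] }, 3 => { ATT => [ undef, 3 ] } },
    file_2 => { 2 => { AT => [ undef, 4 ], GT => [ undef, undef, 5 ] } },
);

# %by_group is summary 4 (totals per group and subgroup, all files);
# %overall is summary 3 (the same with the group level dropped).
my (%by_group, %overall);
for my $file (keys %count) {
    for my $group (keys %{ $count{$file} }) {
        for my $subgroup (keys %{ $count{$file}{$group} }) {
            my $counts = $count{$file}{$group}{$subgroup};
            for my $length (0 .. $#$counts) {
                my $n = $counts->[$length] or next;    # skip undef and 0
                $by_group{$group}{$subgroup}[$length] += $n;
                $overall{$subgroup}[$length]          += $n;
            }
        }
    }
}

print "AT, length 1, all files: $overall{AT}[1]\n";
```

Walking the array by index, rather than by value, is what keeps the length attached to each count.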
Looking at the dumped output I see:

file_b
{
  group_c_item => {
    'group_d_item' => [
      undef,
      undef,
      N
    ]
  }
};

...where N is the number of items of length 3. Can anyone suggest a simple way of printing this output so I could get something like:

length|unique_d_1|unique_d_2|...etc
1     |     1    |      0   |
2     |     2    |      4   |
3     |     0    |      2   |
etc...

...for example? I've tried a few ugly loops, but with no success.


Re: Extracting data from nested hashes by kvale (Monsignor) on Nov 17, 2004 at 17:30 UTC
To get the derived statistics, loop over all the indices and add up the counts:

```perl
my %table;
# The number of items of group $d of length N (where N is 1 to a
# max. length determined elsewhere) found in $b.
for my $file (keys %a) {
    for my $group (keys %{$a{$file}}) {
        for my $subgroup (keys %{$a{$file}{$group}}) {
            my $counts = $a{$file}{$group}{$subgroup};
            # Walk the array by index (the length), not by value (the count).
            for my $length (0 .. $#$counts) {
                next unless defined $counts->[$length];
                $table{$file}{$subgroup}[$length] += $counts->[$length];
            }
        }
    }
}
```

To print out a table, use another nested loop:

```perl
my $file = 'xxx';
for my $length (1..$max_length) {
    # %table is keyed by file, then by subgroup.
    for my $unique_d (sort keys %{$table{$file}}) {
        # printing stuff using $table{$file}{$unique_d}[$length]
    }
}
```

Update: fixed a typo in the second comment.

-Mark
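The two loops above can be put together into something runnable. This is a sketch rather than a definitive version: `make_table` is a hypothetical helper, the sample data is invented to match the table in the question, and the `//` operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical sample counts, shaped like $a{$b}{$c}{$d}[$e] in the question.
my %count = (
    file_b => {
        group_c => {
            d_1 => [ undef, 1, 2 ],
            d_2 => [ undef, undef, 4, 2 ],
        },
    },
);

# Return the table rows for one file as a list of strings.
# make_table is a hypothetical helper, not something from the thread.
sub make_table {
    my ($data, $file, $max_length) = @_;

    # Collapse the group level: subgroup => [ counts indexed by length ].
    my %by_subgroup;
    for my $group (keys %{ $data->{$file} }) {
        for my $subgroup (keys %{ $data->{$file}{$group} }) {
            my $counts = $data->{$file}{$group}{$subgroup};
            for my $length (0 .. $#$counts) {
                $by_subgroup{$subgroup}[$length] += $counts->[$length] // 0;
            }
        }
    }

    my @columns = sort keys %by_subgroup;
    my @rows    = (join '|', 'length', @columns);
    for my $length (1 .. $max_length) {
        # Lengths never seen are undef in the array, so default to 0.
        push @rows, join '|', $length,
            map { $by_subgroup{$_}[$length] // 0 } @columns;
    }
    return @rows;
}

print "$_\n" for make_table(\%count, 'file_b', 3);
```

Sorting the subgroup names once and reusing that list for the header and every row keeps the columns aligned, since `keys` on its own returns them in no particular order.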
Re^2: Extracting data from nested hashes by knirirr (Scribe) on Nov 17, 2004 at 17:38 UTC
Looks interesting, thanks. I was trying something similar, but having dreadful trouble getting it to work properly - I suspect due to incorrect use of the `keys %{$a{$file}{$group}}` syntax.
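For reference, the dereferencing syntax in question works as in this small sketch. The data and the names `%count`, `file_b`, `group_c` and `d_1` are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical data in the same nested shape as the question.
my %count = ( file_b => { group_c => { d_1 => [ undef, 1, 2 ] } } );

my ($file, $group) = ('file_b', 'group_c');

# $count{$file}{$group} is a hash reference; %{ ... } turns it back
# into a hash that keys() can walk.
my @subgroups = sort keys %{ $count{$file}{$group} };

# Likewise, @{ ... } dereferences the innermost array reference.
my @counts = @{ $count{$file}{$group}{d_1} };

# $#{ ... } gives the last index, which here is the longest length seen.
my $max_seen = $#{ $count{$file}{$group}{d_1} };

print "subgroups: @subgroups; longest length: $max_seen\n";
```

The braces after the sigil wrap the whole expression that yields the reference, so each opening `%{`, `@{` or `$#{` needs exactly one matching `}` after the last subscript.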
Re: Extracting data from nested hashes by hmerrill (Friar) on Nov 17, 2004 at 17:16 UTC
It's still a little unclear what you're looking for - a little more explanation would be helpful. Describe a particular row of output that you are hoping for, and explain what each column should contain.
Re^2: Extracting data from nested hashes by knirirr (Scribe) on Nov 17, 2004 at 17:30 UTC
Apologies - I was trying to explain as clearly as possible, but have obviously failed! ;) Anyway, this goes back to a previous writeup for finding microsatellites in genome files. Having solved the problem of finding them I now need to count them and sort them into categories, viz.
1. Which genome file they were found in ($b).
2. How long the repeating motif is, e.g. 3 units ($c).
3. What the repeating motif is, e.g. ATT ($d).
4. How many repeating motifs there are, e.g. (ATT)6 ($e).
There are various outputs I need, but the first would be a file for each genome showing each unique motif ($d) and how many of them were found for a particular length, e.g.

units|A|AT|GT|ATT|...etc.
11   |1|0 |2 |3  |...
12   |0|1 |1 |4  |...

This shows that I've found one case of an A that is 11 units long, no ATs of the same number of units, but two GTs of 11 units, etc. Does this make any sense?
Re^3: Extracting data from nested hashes by hmerrill (Friar) on Nov 18, 2004 at 13:11 UTC
Are you all set now? Sounds like kvale's explanation of how to dereference your hashes and arrays in a loop was what you were looking for.
Re^4: Extracting data from nested hashes by knirirr (Scribe) on Nov 18, 2004 at 14:05 UTC