in reply to Hashes, keys and multiple histogram

Hi, now that you have fully explained what you want, how about this:
use strict;
use warnings;
use Data::Dumper;
##########################################
my (%hist1, %hist2, %hist3);
my @required_keys;
while (<DATA>) {
    chomp;
    my @element = split;
    my $col0= shift @element;
    if ($col0 == 1){
        $hist1{$_}++ for @element;
    } elsif ($col0 == 0){
        $hist2{$_}++ for @element;
    } elsif ($col0 == 5){
        $hist3{$_}++ for @element;
    } else {
        #do stuff here when all else fails, undef/NaNs
        print "WTF \n";
    }
};
print Dumper \%hist1;
# using your __DATA__ section, not repeated here for brevity

which produces this for the %hist1 hash:

$VAR1 = {
    '2c16a946' => 1,
    '2' => 2,
    '361384ce' => 1,
    '287130e0' => 1,
    '1' => 2,
    '68fd1e64' => 2,
    'e5ba7672' => 2,
    '0' => 6,
    '07c540c4' => 1,
    '1f89b562' => 2,
    '0b153874' => 1,
    'be589b51' => 1,
    '4' => 1,
    'd4bb7bd8' => 1,
    '241546e0' => 1,
    '38a947a1' => 1,
    '5' => 1,
    '38d50e09' => 1
};

I tried to keep the code above relatively close to what you had, but I would probably change the code to use only one hash of hashes, rather than three different hashes, leading to much shorter code:

use strict;
use warnings;
use Data::Dumper;
my %hist;
while (<DATA>) {
    chomp;
    my ($col0, @element) = split;
    $hist{$col0}{$_}++ for @element;
};
print Dumper \%hist;
# not repeating the __DATA__ section here

which produces the following output:

$VAR1 = {
    '1' => {
        '2c16a946' => 1,
        '2' => 2,
        '361384ce' => 1,
        '287130e0' => 1,
        '1' => 2,
        '68fd1e64' => 2,
        'e5ba7672' => 2,
        '0' => 6,
        '07c540c4' => 1,
        '1f89b562' => 2,
        '0b153874' => 1,
        'be589b51' => 1,
        '4' => 1,
        'd4bb7bd8' => 1,
        '241546e0' => 1,
        '38a947a1' => 1,
        '5' => 1,
        '38d50e09' => 1
    },
    '0' => {
        '8efede7f' => 1,
        'ad4527a2' => 1,
        '7' => 2,
        'b0660259' => 1,
        '3c9d8785' => 1,
        '2' => 7,
        '287e684f' => 1,
        '1' => 5,
        '18' => 1,
        'e5ba7672' => 5,
        '07c540c4' => 1,
        'f0cf0024' => 2,
        '0b153874' => 11,
        '776ce399' => 2,
        '80e26c9b' => 2,
        '64523cfa' => 1,
        '14' => 1,
        '7cd19acc' => 1,
        'bc6e3dc1' => 1,
        '10' => 1,
        '31' => 1,
        '37e4aa92' => 1,
        '510b40a5' => 1,
        '9b5fd12f' => 1,
        '2c16a946' => 1,
        'd833535f' => 1,
        'ae46a29d' => 1,
        '68fd1e64' => 3,
        '0a519c5c' => 1,
        '0' => 16,
        '6' => 1,
        '1e88c74f' => 2,
        '1f89b562' => 1,
        '3' => 2,
        '8cf07265' => 2,
        '3486227d' => 1,
        '5a9ed9b0' => 1,
        '05db9164' => 5,
        '15' => 1,
        '8' => 1,
        '4' => 2,
        '439a44a4' => 1,
        'd4bb7bd8' => 2,
        '5' => 1
    },
    '5' => {
        '0b153874' => 2,
        '0' => 5,
        '0468d672' => 1,
        '776ce399' => 2,
        '6c9c9cf3' => 1,
        '05db9164' => 2,
        '5' => 1
    }
};
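
With the hash of hashes, each individual histogram is still available on its own: it is simply the sub-hash stored under its group label. Here is a minimal sketch of reading the histograms back out one by one. The @rows list is made-up sample input standing in for the __DATA__ section, so the counts it prints are not those of the real data.

```perl
use strict;
use warnings;

# Made-up sample rows (an assumption, not the real data): the first
# column is the group label, the remaining columns are values to count.
my @rows = (
    '1 0 2 68fd1e64',
    '0 0 0b153874',
    '1 0 e5ba7672',
    '5 0 5',
);

my %hist;
for my $row (@rows) {
    my ($col0, @element) = split ' ', $row;
    $hist{$col0}{$_}++ for @element;
}

# One histogram is just the sub-hash stored under its group label.
for my $group (sort keys %hist) {
    my $h     = $hist{$group};                          # a hash reference
    my @pairs = map { "$_=$h->{$_}" } sort keys %$h;    # "value=count"
    print "group $group: @pairs\n";
}
```

So $hist{1} plays the role of \%hist1, $hist{0} that of \%hist2, and $hist{5} that of \%hist3.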

Re^2: Hashes, keys and multiple histogram
by f77coder (Beadle) on Aug 17, 2014 at 15:34 UTC

    Many thanks, Laurent, for the code. The reason I'd like to keep the histograms separate is that I now need to operate on the individual hashes. I need to find what is only in %hist1, only in %hist2, and only in %hist3, and then find the intersections, and the probabilities on those intersections, for each pair: %hist1/%hist2, %hist2/%hist3, and %hist1/%hist3.

    Are there bindings to do statistical operations on the hash values?

      Well, I suspect that the modules with which you are going to analyze your data probably expect hash references (instead of hashes). If such is the case, then, instead of passing \%hist1, you can just pass to your function $hist{1}, which happens to contain a reference to the relevant sub-hash. For example, $hist{5} contains a hash ref pointing to the following data structure:
      0  HASH(0x6005200f0)
         0 => 5
         '0468d672' => 1
         '05db9164' => 2
         '0b153874' => 2
         5 => 1
         '6c9c9cf3' => 1
         '776ce399' => 2
      If it turns out you need to pass an actual hash (and not a hash ref), then just dereference it by passing, for example, %{$hist{5}} to the function.
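
      As a rough sketch of what that looks like in practice, here are two small helpers that take hash references: one returns the keys found only in the first hash, the other the keys common to two hashes. The names only_in and common_keys are made up for this example (they do not come from any module), and the %hist below is a small illustrative subset, so the results do not reflect the full data.

```perl
use strict;
use warnings;

# Hypothetical helper: keys present in the first hash but absent from
# every other hash passed in. All arguments are hash references.
sub only_in {
    my ($first, @others) = @_;
    return grep {
        my $key = $_;
        !grep { exists $_->{$key} } @others;
    } sort keys %$first;
}

# Hypothetical helper: keys present in both hashes.
sub common_keys {
    my ($x, $y) = @_;
    return grep { exists $y->{$_} } sort keys %$x;
}

# Small illustrative histograms (not the full data).
my %hist = (
    1 => { '0' => 6,  '2' => 2,        '68fd1e64' => 2 },
    0 => { '0' => 16, '2' => 7,        '05db9164' => 5 },
    5 => { '0' => 5,  '05db9164' => 2, '6c9c9cf3' => 1 },
);

# The sub-hashes are passed directly as hash references.
my @only_1    = only_in($hist{1}, $hist{0}, $hist{5});
my @common_05 = common_keys($hist{0}, $hist{5});
print "only in group 1: @only_1\n";
print "in both 0 and 5: @common_05\n";

# If a plain hash is needed, dereference to get a (shallow) copy.
my %group5 = %{ $hist{5} };
print "group 5 has ", scalar(keys %group5), " distinct values\n";
```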

      Update: Maintaining 3 or 4 separate hashes that hold the same kind of data is usually a bad idea. It scales badly when you need to add another data set, and the code is much longer (compare my two sample programs) and therefore harder to maintain: any change has to be made in several places, and chances are high that you'll forget one of them.
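
      To illustrate the scaling point, here is a sketch of computing the keys shared by every pair of groups with a single nested loop. The %hist is again a small made-up subset, and the %shared hash with its "g1,g2" key format is just an assumption chosen for this example. Adding a fourth group would require no change to the loop.

```perl
use strict;
use warnings;

# Small made-up histograms keyed by group label.
my %hist = (
    1 => { '0' => 6,  '2' => 2 },
    0 => { '0' => 16, '2' => 7, '05db9164' => 5 },
    5 => { '0' => 5,  '05db9164' => 2 },
);

my @groups = sort { $a <=> $b } keys %hist;
my %shared;    # "g1,g2" => array ref of the keys found in both groups

# Visit every unordered pair of groups exactly once.
for my $i (0 .. $#groups - 1) {
    for my $j ($i + 1 .. $#groups) {
        my ($g1, $g2) = @groups[$i, $j];
        my @both = grep { exists $hist{$g2}{$_} } sort keys %{ $hist{$g1} };
        $shared{"$g1,$g2"} = \@both;
        print "groups $g1 and $g2 share: @both\n";
    }
}
```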

      I can understand that using nested data structures may be challenging for a beginner, but you'll have to learn them at some point anyway (if you continue to do even occasional programming), so why not start right away? You know by now that, if you run into difficulties, you'll easily get help from the many monks here.