Re: Hashes, keys and multiple histogram

Apologies to everyone who tried to help. I was going through many iterations of the code (beating my head against the wall) and thought I had posted the latest version.

Now I'm trying to understand Laurent's short code, which uses a hash of hashes, versus keeping individual hashes:

my %hist;
while (<DATA>) {
  chomp;
  # first column names the histogram; the remaining columns are the elements to count
  my ($col0, @element) = split;
  $hist{$col0}{$_}++ for @element;
}
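To see what that structure ends up holding, here is a minimal sketch. The rows in @rows are made-up sample data standing in for the DATA section, and the set names set1 and set2 are only placeholders; the point is that each top-level key of %hist gets its own inner hash of counts.

```perl
use strict;
use warnings;

# Hypothetical sample rows: the first column names the set,
# the remaining columns are the elements to count.
my @rows = (
    'set1 a b a c',
    'set2 b d d',
);

my %hist;
for my $row (@rows) {
    my ($col0, @element) = split ' ', $row;
    $hist{$col0}{$_}++ for @element;
}

# Each top-level key holds its own histogram (an inner hash of counts).
for my $set (sort keys %hist) {
    my $counts = $hist{$set};
    print "$set: ",
        join(', ', map { "$_=$counts->{$_}" } sort keys %$counts),
        "\n";
}
```

So $hist{set1} and $hist{set2} can be passed around as hash references wherever a separate %hist1 or %hist2 would otherwise be used.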

I'm looking to implement some simple set theory with statistics.

I want the keys that are unique to each set, i.e. each set minus its intersection with the other sets.

The node at http://www.perlmonks.org/?node=How%20can%20I%20get%20the%20unique%20keys%20from%20two%20hashes%3F gives the following code:

my %seen = ();
for my $element (keys(%hist1), keys(%hist2)) {
    $seen{$element}++;
}
my @uniq = keys %seen;

which is why I thought it would be simpler to have separate hashes. There are elements in hist1 that are not in hist2 and vice versa. Is finding unique keys this way faster than subtracting the intersection from each set, i.e. A - (A ∩ B)? At the moment I'm working with small sample data to debug, but I will eventually have 12+ GB of data to process.
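For reference, a minimal sketch of the A - (A ∩ B) idea applied to the inner hashes. The counts below are made up, and only_in is a hypothetical helper name, not something from the linked node. Note that the %seen loop quoted above collects every key from both hashes (the union), whereas this filters one hash's keys against the other with exists.

```perl
use strict;
use warnings;

# Hypothetical counts, shaped like the inner hashes of %hist.
my %hist = (
    set1 => { a => 2, b => 1, c => 1 },
    set2 => { b => 1, d => 2 },
);

# Keys of $x that are absent from $y: one exists() lookup per key of $x.
# only_in is an illustrative helper name.
sub only_in {
    my ($x, $y) = @_;
    return grep { !exists $y->{$_} } sort keys %$x;
}

my @only1 = only_in($hist{set1}, $hist{set2});   # keys only in set1
my @only2 = only_in($hist{set2}, $hist{set1});   # keys only in set2

print "only in set1: @only1\n";
print "only in set2: @only2\n";
```

This does a single pass over the keys of one hash, with no need to build the intersection first.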
