in reply to Hashes, keys and multiple histogram
Apologies to everyone who tried to help. I was going through many iterations of the code (beating my head against the wall) and thought I had posted the latest one.
Now I'm trying to understand Laurent's short code, which uses a hash of hashes rather than individual hashes:
    my %hist;
    while (<DATA>) {
        chomp;
        my ($col0, @element) = split;
        $hist{$col0}{$_}++ for @element;
    }
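As far as I can tell, that snippet builds one top-level hash keyed by the first column, where each value is itself a hash of element counts. Here is a minimal sketch of the resulting structure, assuming whitespace-separated lines; the sample rows and the names `hist1`/`hist2` in the data are made up for illustration, and the lines come from an array instead of `<DATA>` so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical sample rows: first field names the set, the rest are elements.
my @lines = (
    "hist1 a b b c",
    "hist2 b c d d d",
);

my %hist;
for my $line (@lines) {
    my ($col0, @element) = split ' ', $line;
    # Autovivification creates $hist{$col0} as a hash ref on first use.
    $hist{$col0}{$_}++ for @element;
}

# Each $hist{name} is a reference to an ordinary hash of counts,
# so it can be used much like a separate %hist1 or %hist2.
for my $name (sort keys %hist) {
    my $counts = $hist{$name};
    print "$name: ",
        join(", ", map { "$_=$counts->{$_}" } sort keys %$counts), "\n";
}
```

With these sample rows, `keys %{ $hist{hist1} }` plays the same role that `keys %hist1` would with separate hashes.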
I'm looking to implement some simple set theory with statistics.
In particular, I want the keys that are unique to each set, i.e. each set minus its intersection with the other sets.
The node at http://www.perlmonks.org/?node=How%20can%20I%20get%20the%20unique%20keys%20from%20two%20hashes%3F gives the following code:
    my %seen = ();
    for my $element (keys(%hist1), keys(%hist2)) {
        $seen{$element}++;
    }
    my @uniq = keys %seen;
which is why I thought it would be simpler to have separate hashes. There are elements in hist1 that are not in hist2 and vice versa. Is finding unique keys this way faster than subtracting the intersection from each set, i.e. A - (A ∩ B)? At the moment I'm working with small sample data to debug, but I will be dealing with 12+ GB of data to process.
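One thing worth checking: the `%seen` snippet collects every key that appears in either hash (the union), which is not the same as the keys found in only one set. Below is a rough sketch of both, assuming two plain hashes with made-up sample counts. The difference A - B is taken directly with `grep` and `exists`, so the intersection never has to be built first. No timing claims are made here; each approach is a single pass of hash lookups, and the core `Benchmark` module could be used to compare them on real data.

```perl
use strict;
use warnings;

# Hypothetical sample counts standing in for %hist1 and %hist2.
my %hist1 = (a => 1, b => 2, c => 1);
my %hist2 = (b => 1, c => 1, d => 3);

# Union: what the %seen approach computes (each key listed once).
my %seen;
$seen{$_}++ for keys(%hist1), keys(%hist2);
my @union = sort keys %seen;

# Keys seen exactly once are in one hash but not the other.
my @in_one_only = sort grep { $seen{$_} == 1 } keys %seen;

# Set difference without computing the intersection:
# A - B is every key of A that does not exist in B.
my @only_in_1 = sort grep { !exists $hist2{$_} } keys %hist1;
my @only_in_2 = sort grep { !exists $hist1{$_} } keys %hist2;

print "union:       @union\n";
print "in one only: @in_one_only\n";
print "only hist1:  @only_in_1\n";
print "only hist2:  @only_in_2\n";
```

The same `grep` works on a hash of hashes by dereferencing, e.g. `keys %{ $hist{hist1} }` and `exists $hist{hist2}{$_}`, so separate hashes are not strictly required.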