in reply to Hashes, keys and multiple histogram

f77coder: Please correct me if I'm wrong, but it seems that you have replaced the code of your original post with code derived, more or less, from a subsequent post by Laurent_R, and without citing any change to the OP. I had first composed a more snarky reply, but will confine myself to this: choroba and Laurent_R now look foolish for having posted (apparently) completely irrelevant replies to (what now appears as) your OP. If I read this thread aright, what you have done is akin to pulling the chair out from under someone as they are sitting down to dine! Please feel free to make whatever additions/updates/corrections/etc you feel are needed, but for the sake of courtesy and clarity, please leave the original material and cite your changes!


Re^2: Hashes, keys and multiple histogram
by Laurent_R (Canon) on Aug 17, 2014 at 19:01 UTC
    Yes, I confirm, the content of the OP has been significantly altered after choroba's answer and several of my answers. In particular, the three relevant (and most important) lines, which as of this posting read:
    %hist1 = map { $_ => 0 } @element;
    originally looked like this:
    $hist1{@element}++;
    The quoted output was also very different.

    That's not very fair to people who spent some of their free time trying to help you, f77coder. :-(

    Update: You are fairly new on this forum (13 write-ups), so I assume you did not realize that doing this kind of editing without stating it clearly is strongly discouraged around here. Because you are new, I'll consider these changes to your OP as just a small mistake; no big deal for me, and I'll forget it.

    And BTW, your current code:

    %hist1 = map { $_ => 0 } @element;
    may look superficially closer than the original code to what you want to obtain, but you are still not quite there. What happens with this map syntax is that the hash is rebuilt from scratch with every value set to 0, and each time the same element occurs again it simply overwrites the entry with the same key, so that, at the end, the best you get is a unique list of elements (the keys of the hash), but no information about their frequencies.

    Assuming I understood what you want, the right solution is very probably the for loop with incrementation that I offered.
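    To make the difference concrete, here is a minimal sketch (the data is made up) contrasting the two approaches:

```perl
use strict;
use warnings;

my @element = qw(a b a c b a);    # made-up sample elements

# The map version: every value is set to 0 and duplicate
# elements collapse onto the same key, so no counts survive.
my %hist_map = map { $_ => 0 } @element;
# %hist_map is (a => 0, b => 0, c => 0)

# The incrementing loop keeps the frequencies.
my %hist;
$hist{$_}++ for @element;
# %hist is (a => 3, b => 2, c => 1)
```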

      Apologies to everyone who tried to help. I was trying many iterations (beating my head against the wall) of the code and thought I put the latest up.

      Now I'm trying to understand Laurent's short code of an array of hashes versus individual hash elements.

      my %hist;
      while (<DATA>) {
          chomp;
          my ($col0, @element) = split;
          $hist{$col0}{$_}++ for @element;
      }
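      For example, feeding that loop two made-up input lines (using an in-memory list instead of the DATA filehandle, so the sketch runs standalone) builds one histogram per first-column value:

```perl
use strict;
use warnings;

# Stand-in for the DATA filehandle: two made-up input lines.
my @lines = ("hist1 a b a", "hist2 b c");

my %hist;
for (@lines) {
    my ($col0, @element) = split;
    $hist{$col0}{$_}++ for @element;
}

# %hist is now a hash of hashes, one histogram per first column:
#   hist1 => { a => 2, b => 1 }
#   hist2 => { b => 1, c => 1 }
```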

      I'm looking to implement some simple set theory with statistics.

      To get keys that are unique to each set, i.e. subtract the intersection of other sets

      From here http://www.perlmonks.org/?node=How%20can%20I%20get%20the%20unique%20keys%20from%20two%20hashes%3F, it gives the following code

      my %seen = ();
      for my $element (keys(%hist1), keys(%hist2)) {
          $seen{$element}++;
      }
      my @uniq = keys %seen;

      which is why I thought it would be simpler to have separate hashes. There are elements in hist1 that are not in hist2 and vice versa. Is finding unique keys this way faster than subtracting the intersection from each set, i.e. A - (A int B)? At the moment I'm working with small sample data to debug, but I will be dealing with 12+ GB of data to process.

        If you have a hash of hashes (and not an array of hashes) such as the one I showed in my second version of the program, you can use the code you showed (which finds the union of the two key sets, i.e. the list of keys present in either set, rather than their intersection) with the following small changes (I think it should be right, but I cannot test right now):
        my %seen = ();
        for my $element (keys %{$hist{1}}, keys %{$hist{2}}) {
            $seen{$element}++;
        }
        my @uniq = keys %seen;
        Having said that, we might have another serious problem here. 12 GB is a lot of data, and it is far from certain that such a huge volume will fit into your computer's memory. In other words, you might not be able to store all your data in a hash. I am not talking about a Perl limitation, but about a limitation of your hardware.
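        And if what you actually want is the keys unique to each set, A - (A int B), a grep over exists is probably the most direct way. An untested sketch, using two plain made-up hashes for simplicity:

```perl
use strict;
use warnings;

# Made-up histograms standing in for %hist1 and %hist2.
my %hist1 = (a => 3, b => 2, c => 1);
my %hist2 = (b => 5, d => 1);

# Keys in %hist1 but not in %hist2: A - (A int B)
my @only_in_1 = grep { !exists $hist2{$_} } keys %hist1;   # a, c

# Keys in %hist2 but not in %hist1: B - (A int B)
my @only_in_2 = grep { !exists $hist1{$_} } keys %hist2;   # d
```

        This walks each hash only once, so it should not be slower than building the union first and filtering afterwards.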
Re^2: Hashes, keys and multiple histogram
by f77coder (Beadle) on Aug 18, 2014 at 01:21 UTC

    Sorry about that.