Hashes, keys and multiple histogram

f77coder has asked for the wisdom of the Perl Monks concerning the following question:

Re: Hashes, keys and multiple histogram by choroba (Cardinal) on Aug 17, 2014 at 07:20 UTC
`$hist1{@element}++;` Here `@element` in scalar context returns the size of the array `@element`. You shift the array beforehand, so the key is the number of original elements minus 1. Are you sure that's what you want to hash by? لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
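To make the scalar-context point concrete, here is a minimal sketch. The sample row is made up for illustration and is not the original poster's data.

```perl
use strict;
use warnings;

my @element = qw(1 3 5 4 6);   # hypothetical row, already split
shift @element;                # drop the first column; 4 items remain

my %hist1;
$hist1{@element}++;            # @element is evaluated in scalar context here

my @keys = keys %hist1;
print "keys: @keys\n";         # the only key is the element count, not the values
```

The hash ends up with a single key equal to the number of remaining elements, which is why the histogram never sees the actual values.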
Re^2: Hashes, keys and multiple histogram by f77coder (Beadle) on Aug 17, 2014 at 07:29 UTC
Thanks for the reply. No, it's not. The first column in the data is a switch variable; I need to grab that value from the line and put the rest of the line into the histogram. Each array element should be a key except for the first. I want to shove the rest of the array into the keys, loop through the next lines, and keep the counts as values.
Re^3: Hashes, keys and multiple histogram by Laurent_R (Canon) on Aug 17, 2014 at 08:50 UTC
"Each array element should be a key except for the first." Maybe I misread this line when I wrote my previous answer. Possibly you really want something like this: `$hash1{$_}++ for @elements;`
Example under the debugger:
`DB<1> @elements = qw/ 1 3 5 4 6/;`
`DB<2> $hash1{$_}++ for @elements;`
`DB<3> x \%hash1`
`0 HASH(0x600509af0) 1 => 1 3 => 1 4 => 1 5 => 1 6 => 1`
Re^4: Hashes, keys and multiple histogram by f77coder (Beadle) on Aug 17, 2014 at 13:09 UTC
Re^5: Hashes, keys and multiple histogram by Laurent_R (Canon) on Aug 17, 2014 at 13:27 UTC
Re^3: Hashes, keys and multiple histogram by Laurent_R (Canon) on Aug 17, 2014 at 08:42 UTC
An array cannot be the key of a hash. Perl stringifies hash keys, so hash keys are always strings. Even if you tried something like `$hash1{\@elements}++;` or `$hash1{[@elements]}++;` it would not work, because your key would end up being a stringified array ref (and the array content would be lost).
So either you want the string that you've read to be the hash key, `$hash1{$line}++;`, which does not seem very useful in this context, or you want to store an array reference as the value of the hash, `$hash1{"some key"} = \@elements;`, but then I am not sure what you would want your key to be. I think you need to have (and give us) a clearer idea of the data structure that you want at the end of your process.
Quite possibly you really need an array of arrays rather than a hash of arrays. Quick demonstration under the Perl debugger:
`DB<1> @elements = qw/ 1 3 5 4 6/;`
`DB<2> push @array, \@elements;`
`DB<3> @elements2 = qw/ 12 13 14 15/;`
`DB<4> push @array, \@elements2;`
`DB<5> x \@array`
`0 ARRAY(0x600509af0) 0 ARRAY(0x600500b38) 0 1 1 3 2 5 3 4 4 6 1 ARRAY(0x600500928) 0 12 1 13 2 14 3 15`
Update: Perhaps I misunderstood your requirement. Please read my next answer on Aug 17, 2014 at 08:50 UTC.
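A small sketch of the stringification described above, using a made-up `@elements` array. It only illustrates that the key becomes a plain string; the exact address in the key varies from run to run.

```perl
use strict;
use warnings;

my @elements = qw(1 3 5 4 6);   # hypothetical data
my %hash1;

$hash1{\@elements}++;           # the reference is stringified to form the key

my ($key) = keys %hash1;
print "key: $key\n";            # looks like ARRAY(0x...), address varies

# The key is an ordinary string now; it can no longer be dereferenced
# to get the array content back.
my $is_ref = ref($key) ? 1 : 0;
print "still a reference? $is_ref\n";
```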
Re: Hashes, keys and multiple histogram by Laurent_R (Canon) on Aug 17, 2014 at 13:38 UTC
Hi, now that you have fully explained what you want, how about this:
`use strict; use warnings; use Data::Dumper; my (%hist1, %hist2, %hist3); my @required_keys; while (<DATA>) { chomp; my @element = split; my $col0 = shift @element; if ($col0 == 1) { $hist1{$_}++ for @element; } elsif ($col0 == 0) { $hist2{$_}++ for @element; } elsif ($col0 == 5) { $hist3{$_}++ for @element; } else { # do stuff here when all else fails, undef/NaNs
print "WTF \n"; } }; print Dumper \%hist1; # using your __DATA__ section, not repeated here for brevity`
which fills the `%hist1` hash (output not shown).
I tried to keep the code above relatively close to what you had, but I would probably change it to use only one hash of hashes, rather than three different hashes, leading to much shorter code:
`use strict; use warnings; use Data::Dumper; my %hist; while (<DATA>) { chomp; my ($col0, @element) = split; $hist{$col0}{$_}++ for @element; }; print Dumper \%hist; # not repeating the __DATA__ section here`
The output of this version is likewise not shown.
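For readers who want to try the hash-of-hashes idea without the original `__DATA__` section, here is a self-contained sketch. The rows in `@lines` are invented sample data, with the switch value in the first column as described in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample rows: first column is the switch, the rest are values.
my @lines = (
    "1 a b a",
    "0 b c",
    "5 a a d",
    "1 b",
);

my %hist;
for my $line (@lines) {
    my ($col0, @element) = split ' ', $line;
    $hist{$col0}{$_}++ for @element;   # one sub-hash per switch value
}

# Print one "switch value count" line per entry, in sorted order.
for my $switch (sort keys %hist) {
    for my $value (sort keys %{ $hist{$switch} }) {
        print "$switch $value $hist{$switch}{$value}\n";
    }
}
```

Adding a new switch value needs no code change here, which is the main practical difference from keeping `%hist1`, `%hist2` and `%hist3` separately.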
Re^2: Hashes, keys and multiple histogram by f77coder (Beadle) on Aug 17, 2014 at 15:34 UTC
Many thanks, Laurent, for the code. The reason I'd like to keep the histograms separate is that I now need to operate on the individual hashes. I need to find what is only in %hist1, only in %hist2, and only in %hist3, and then find intersections and probabilities on the intersections of %hist1/%hist2, %hist2/%hist3, and %hist1/%hist3. Are there bindings to do statistical operations on the hash values?
Re^3: Hashes, keys and multiple histogram by Laurent_R (Canon) on Aug 17, 2014 at 16:46 UTC
Well, I suspect that the modules with which you are going to analyze your data probably expect hash references (instead of hashes). If that is the case, then instead of passing `\%hist1` you can just pass `$hist{1}` to your function, since it already contains a reference to the relevant sub-hash. For example, `$hist{5}` contains a hash ref pointing to the following data structure: `0 HASH(0x6005200f0) 0 => 5 '0468d672' => 1 '05db9164' => 2 '0b153874' => 2 5 => 1 '6c9c9cf3' => 1 '776ce399' => 2`
If it turns out you need to pass an actual hash (and not a hash ref), then just dereference it by passing, for example, `%{$hist{5}}` to the function.
Update: Maintaining 3 or 4 hashes containing essentially identical sets of data is usually a bad idea. It scales very badly when you need to add another data set, and the code is much longer (see the difference between my two sample programs) and therefore harder to maintain: if you need to make a change, you have to make it in several different places, and chances are high that you'll forget one. I can understand that nested data structures may be challenging for a beginner, but you'll have to learn them at some point anyway (if you continue to do even occasional programming), so why not start right away? You know by now that, if you run into difficulties, you'll easily get help from many monks here.
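As an illustration of passing a sub-hash around, here is a short sketch. `total_count` is a hypothetical helper written for this example, not a function from any statistics module, and the `%hist` contents are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: sum of all counts in one histogram, given a hash ref.
sub total_count {
    my ($h) = @_;
    my $sum = 0;
    $sum += $_ for values %$h;
    return $sum;
}

# Invented data in the same shape as the hash of hashes above.
my %hist = (
    1 => { a => 2, b => 1 },
    5 => { a => 1, d => 3 },
);

my $total5 = total_count($hist{5});   # $hist{5} is already a hash ref
my %copy   = %{ $hist{1} };           # dereference when a real hash is needed

print "total for switch 5: $total5\n";
print "copied keys: ", join(",", sort keys %copy), "\n";
```

Note that `%{ $hist{1} }` produces a shallow copy, so changes to `%copy` do not affect `%hist`.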
Re^3: Hashes, keys and multiple histogram by AnomalousMonk (Archbishop) on Aug 18, 2014 at 14:52 UTC
For info on dealing with complex, multi-level Perl data structures, see the Perl Data Structures Cookbook (perldsc).
Re: Hashes, keys and multiple histogram by AnomalousMonk (Archbishop) on Aug 17, 2014 at 17:47 UTC
f77coder: Please correct me if I'm wrong, but it seems that you have replaced the code of your original post with code derived, more or less, from a subsequent post by Laurent_R, and without citing any change to the OP. I had first composed a more snarky reply, but will confine myself to this: choroba and Laurent_R now look foolish for having posted (apparently) completely irrelevant replies to (what now appears as) your OP. If I read this thread aright, what you have done is akin to pulling the chair out from under someone as they are sitting down to dine! Please feel free to make whatever additions/updates/corrections/etc. you feel are needed, but for the sake of courtesy and clarity, please leave the original material and cite your changes!
Re^2: Hashes, keys and multiple histogram by Laurent_R (Canon) on Aug 17, 2014 at 19:01 UTC
Yes, I confirm: the content of the OP was significantly altered after choroba's answer and several of my answers. In particular, the three relevant (and most important) lines, which as of this posting read `%hist1 = map { $_ => 0 } @element;`, originally looked like this: `$hist1{@element}++;` The quoted output was also very different. That's not very fair to people who spent some of their free time trying to help you, f77coder. `:-(`
Update: You are fairly new on this forum (13 write-ups), so I assume you did not realize that this kind of editing, without stating it clearly, is strongly discouraged around here. Because you are new, I'll consider these changes to your OP as just a small mistake, no big deal for me, I'll forget it.
And BTW, your current code, `%hist1 = map { $_ => 0 } @element;`, may look superficially closer than the original code to what you want to obtain, but you are still not quite there. With this `map` syntax, every value is set to 0 and each assignment overwrites what was previously stored in the hash, so at the end the best you get is a unique list of values (the keys of the hash), with no information about their frequency in each hash. Assuming I understood what you want, the right solution is very probably the `for` loop with incrementation that I offered.
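The difference between the two approaches can be seen side by side in this sketch. The rows are invented, and `%by_map` and `%by_loop` are names chosen only for this example.

```perl
use strict;
use warnings;

# Hypothetical rows, already stripped of their first column.
my @rows = ( [qw(a b a)], [qw(b c)] );

my (%by_map, %by_loop);
for my $row (@rows) {
    my @element = @$row;
    %by_map = map { $_ => 0 } @element;   # replaces the whole hash on every row
    $by_loop{$_}++ for @element;          # accumulates counts across rows
}

print "map:  ", join(" ", map { "$_=$by_map{$_}"  } sort keys %by_map),  "\n";
print "loop: ", join(" ", map { "$_=$by_loop{$_}" } sort keys %by_loop), "\n";
```

After the loop, `%by_map` holds only the keys from the last row, all with value 0, while `%by_loop` holds the frequency of every value seen.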
Re^3: Hashes, keys and multiple histogram by f77coder (Beadle) on Aug 18, 2014 at 01:35 UTC
Apologies to everyone who tried to help. I was trying many iterations of the code (beating my head against the wall) and thought I had put the latest one up. Now I'm trying to understand Laurent's short code using a hash of hashes versus individual hashes: `my %hist; while (<DATA>) { chomp; my ($col0, @element) = split; $hist{$col0}{$_}++ for @element; };`
I'm looking to implement some simple set theory with statistics: to get the keys that are unique to each set, i.e. subtract the intersection with the other sets. The node at http://www.perlmonks.org/?node=How%20can%20I%20get%20the%20unique%20keys%20from%20two%20hashes%3F gives the following code: `my %seen = (); for my $element (keys(%hist1), keys(%hist2)) { $seen{$element}++; } my @uniq = keys %seen;` which is why I thought it would be simpler to have separate hashes. There are elements in hist1 that are not in hist2 and vice versa. Is finding unique keys this way faster than subtracting the intersection from each set, A-(A int B)? At the moment I'm working with small sample data to debug, but I will be dealing with 12+Gb of data to process.
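For what it's worth, the set operations asked about here do not require separate hashes. The sketch below computes "only in set 1" and the intersection directly on a hash of hashes. The data is invented, and this is one possible approach rather than the answer given in the thread; it says nothing about speed on large inputs.

```perl
use strict;
use warnings;

# Invented histograms keyed by switch value, as in the hash of hashes above.
my %hist = (
    1 => { a => 2, b => 1, c => 4 },
    0 => { b => 3, d => 1 },
);

# A - (A int B): keys of set 1 that do not appear in set 0.
my @only_in_1 = sort grep { !exists $hist{0}{$_} } keys %{ $hist{1} };

# A int B: keys present in both sets.
my @common    = sort grep {  exists $hist{0}{$_} } keys %{ $hist{1} };

print "only in 1: @only_in_1\n";
print "in both:   @common\n";
```

Each `exists` test is a single hash lookup, so the difference is computed in one pass over the keys of the first set.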
Re^4: Hashes, keys and multiple histogram by Laurent_R (Canon) on Aug 18, 2014 at 07:08 UTC
Re^5: Hashes, keys and multiple histogram by f77coder (Beadle) on Aug 19, 2014 at 15:49 UTC
Re^2: Hashes, keys and multiple histogram by f77coder (Beadle) on Aug 18, 2014 at 01:21 UTC
Sorry about that.
Re: Hashes, keys and multiple histogram by f77coder (Beadle) on Aug 18, 2014 at 01:42 UTC
Apologies to everyone who tried to help. I was trying many iterations of the code (beating my head against the wall) and thought I had put the latest one up. Now I'm trying to understand Laurent's short code using a hash of hashes versus individual hashes: `my %hist; while (<DATA>) { chomp; my ($col0, @element) = split; $hist{$col0}{$_}++ for @element; };`
I'm looking to implement some simple set theory with statistics: to get the keys that are unique to each set, i.e. subtract the intersection with the other sets. The node at http://www.perlmonks.org/?node=How%20can%20I%20get%20the%20unique%20keys%20from%20two%20hashes%3F gives the following code: `my %seen = (); for my $element (keys(%hist1), keys(%hist2)) { $seen{$element}++; } my @uniq = keys %seen;` which is why I thought it would be simpler to have separate hashes. There are elements in hist1 that are not in hist2 and vice versa. Is finding unique keys this way faster than subtracting the intersection from each set, A-(A int B)? At the moment I'm working with small sample data to debug, but I will be dealing with 12+Gb of data to process.
Re: Hashes, keys and multiple histogram by Laurent_R (Canon) on Aug 18, 2014 at 14:18 UTC
Just in case you missed it, please see my answer to the same questions here: Re^4: Hashes, keys and multiple histogram