Re: Hashes, keys and multiple histogram
by choroba (Cardinal) on Aug 17, 2014 at 07:20 UTC
$hist1{@element}++;
@element in scalar context returns the size of the array @element. You shift the array before, so the key is the number of original elements minus 1. Are you sure that's what you want to hash by?
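A minimal sketch of what choroba describes, using a made-up input line. The array in the hash subscript is evaluated in scalar context, so the key becomes the element count (after the shift), not the elements themselves:

```perl
use strict;
use warnings;

my %hist1;
my @element = qw(1 a b c);   # hypothetical line: switch value followed by three fields
shift @element;              # drop the switch column; @element is now (a, b, c)
$hist1{@element}++;          # subscript is scalar context: the key is 3, not "a", "b" or "c"

print join(',', keys %hist1), "\n";   # prints 3
```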
Thanks for the reply. No, it's not. The first column in the data is a switch variable; I need to grab that value from the line and put the rest of the line into the histogram. Each array element except the first should be a key.
I want to put the rest of the array into the keys, loop through the following lines, and use the counts as values.
$hash1{$_}++ for @elements;
Example under the debugger:
DB<1> @elements = qw/ 1 3 5 4 6/;
DB<2> $hash1{$_}++ for @elements;
DB<3> x \%hash1
0  HASH(0x600509af0)
   1 => 1
   3 => 1
   4 => 1
   5 => 1
   6 => 1
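The one-liner above handles a single line. A sketch of how the counts accumulate over several lines, assuming (as described earlier in the thread) that the first column is the switch value to be skipped; the input lines here are made up:

```perl
use strict;
use warnings;

my %hash1;
# Hypothetical input lines; the real program would read them from <DATA> or a file.
my @lines = ("1 3 5 4 6", "1 3 5 7");

for my $line (@lines) {
    my @elements = split ' ', $line;
    shift @elements;                 # drop the switch column
    $hash1{$_}++ for @elements;      # each remaining field is a key; the value counts occurrences
}

for my $key (sort { $a <=> $b } keys %hash1) {
    print "$key => $hash1{$key}\n";
}
```

Keys seen on both lines (3 and 5 here) end up with a count of 2.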
An array cannot be the key of a hash. Perl stringifies hash keys, so that hash keys are always strings. Even if you tried something like this:
$hash1{\@elements}++;
or
$hash1{[@elements]}++;
it would not work, because your key would end up being a stringified array ref (and the array content would be lost).
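A quick sketch illustrating the stringification: once an array ref is used as a key, what is stored is a plain string such as "ARRAY(0x...)" (the address varies from run to run), and it can no longer be dereferenced:

```perl
use strict;
use warnings;

my %hash1;
my @elements = qw(1 3 5 4 6);
$hash1{\@elements}++;            # the reference is stringified when used as a key

my ($key) = keys %hash1;
print "$key\n";                  # something like ARRAY(0x55d0c8a1e2f8)
print ref($key) ? "still a reference\n" : "just a string\n";   # prints "just a string"
```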
So either you want to use the string that you've read to be the hash key
$hash1{$line}++;
but that does not seem to be very useful in this context, or you want to store an array reference into the value of the hash
$hash1{"some key"} = \@elements;
but then I am not sure what you would want your key to be.
I think you need to have (and provide us) a clearer idea of the data structure that you want to have at the end of your process.
Quite possibly you really need an array of arrays, rather than a hash of arrays. Quick demonstration under the Perl debugger:
DB<1> @elements = qw/ 1 3 5 4 6/;
DB<2> push @array, \@elements;
DB<3> @elements2 = qw/ 12 13 14 15/;
DB<4> push @array, \@elements2;
DB<5> x \@array
0  ARRAY(0x600509af0)
   0  ARRAY(0x600500b38)
      0  1
      1  3
      2  5
      3  4
      4  6
   1  ARRAY(0x600500928)
      0  12
      1  13
      2  14
      3  15
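The same array of arrays built in a loop, as a sketch with made-up input lines. One thing worth watching: if @elements were declared once outside the loop, every pushed reference would point to the same array, which gets overwritten on each line. Declaring it with "my" inside the loop (or pushing an anonymous copy) gives each line its own array:

```perl
use strict;
use warnings;

# Hypothetical input; the real program would read from a file or <DATA>.
my @lines = ("1 3 5 4 6", "12 13 14 15");

my @array;
for my $line (@lines) {
    my @elements = split ' ', $line;   # "my" inside the loop: a fresh array each time
    push @array, \@elements;           # so each stored reference points to a different array
}

# Equivalent, without the temporary: push @array, [ split ' ', $line ];

print scalar(@array), " rows\n";       # 2 rows
print "$array[1][2]\n";                # 14 (third field of the second line)
```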
Update: Perhaps I misunderstood your requirement. Please read my next answer on Aug 17, 2014 at 08:50 UTC (immediately below)
Re: Hashes, keys and multiple histogram
by Laurent_R (Canon) on Aug 17, 2014 at 13:38 UTC
Hi, now that you have fully explained what you want, how about this:
use strict;
use warnings;
use Data::Dumper;
##########################################
my (%hist1, %hist2, %hist3);
my @required_keys;
while (<DATA>) {
chomp;
my @element = split;
my $col0= shift @element;
if ($col0 == 1){
$hist1{$_}++ for @element;
} elsif ($col0 == 0){
$hist2{$_}++ for @element;
} elsif ($col0 == 5){
$hist3{$_}++ for @element;
} else {
#do stuff here when all else fails, undef/NaNs
print "WTF \n";
}
};
print Dumper \%hist1;
# using your __DATA__ section, not repeated here for brevity
which produces this for the %hist1 hash:
I tried to keep the code above relatively close to what you had, but I would probably change the code to use only one hash of hashes, rather than three different hashes, leading to much shorter code:
use strict;
use warnings;
use Data::Dumper;
my %hist;
while (<DATA>) {
chomp;
my ($col0, @element) = split;
$hist{$col0}{$_}++ for @element;
};
print Dumper \%hist;
# not repeating the __DATA__ section here
Which produces the following output.
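To read the results back out of the hash of hashes, a nested loop over the outer keys (the switch values) and then the inner keys (the fields) works. A sketch with hypothetical sample lines standing in for the __DATA__ section:

```perl
use strict;
use warnings;

# Hypothetical sample lines: first column is the switch value, the rest are fields.
my @lines = ("1 aa bb aa", "0 bb cc", "5 aa", "1 cc");

my %hist;
for (@lines) {
    my ($col0, @element) = split;     # split with no arguments splits $_ on whitespace
    $hist{$col0}{$_}++ for @element;
}

# Walk the hash of hashes: outer key is the switch value, inner keys are the fields.
for my $switch (sort { $a <=> $b } keys %hist) {
    for my $field (sort keys %{ $hist{$switch} }) {
        print "$switch\t$field\t$hist{$switch}{$field}\n";
    }
}
```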
Many thanks, Laurent, for the code. The reason I'd like to keep the histograms separate is that I now need to operate on the individual hashes. I need to find what is only in %hist1, only in %hist2, and only in %hist3, and then find intersections and probabilities on the intersections of %hist1/%hist2, %hist2/%hist3, and %hist1/%hist3.
Are there bindings to do statistical operations on the hash values?
Well, I suspect that the modules you are going to use to analyze your data expect hash references (instead of hashes). If that is the case, then instead of passing \%hist1, you can simply pass $hist{1} to your function, since it already contains a reference to the relevant sub-hash. For example, $hist{5} contains a hash ref pointing to the following data structure:
0  HASH(0x6005200f0)
   0 => 5
   '0468d672' => 1
   '05db9164' => 2
   '0b153874' => 2
   5 => 1
   '6c9c9cf3' => 1
   '776ce399' => 2
If it turns out you need to pass an actual hash (and not a hash ref), then just dereference it by passing, for example, %{$hist{5}} to the function.
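A sketch of both calling styles. The two subs below are hypothetical stand-ins for whatever analysis function you end up using: one takes a hash reference, the other a flattened list of key/value pairs. Both sum the counts with List::Util's sum:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper that takes a hash reference.
sub total_from_ref {
    my ($h) = @_;
    return sum(values %$h) // 0;
}

# Hypothetical helper that takes a flattened hash (key/value list).
sub total_from_list {
    my %h = @_;
    return sum(values %h) // 0;
}

my %hist = (
    1 => { aa => 2, bb => 1 },
    5 => { aa => 1, cc => 3 },
);

print total_from_ref($hist{5}), "\n";        # pass the stored hash ref directly: 4
print total_from_list(%{ $hist{5} }), "\n";  # dereference to pass the key/value pairs: 4
```

Passing the reference avoids copying the whole sub-hash onto the argument list, which may matter with large histograms.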
Update: Maintaining 3 or 4 hashes containing essentially identical sets of data is usually a bad idea. It scales very badly when you need to add another data set, and the code is much longer (compare my two sample programs) and therefore harder to maintain: if a change is needed, you have to make it in several places, and chances are high that you'll forget one of them.
I can understand that using nested data structures may be challenging for a beginner, but you'll have to learn them at some point anyway (if you continue to do even occasional programming), so why not start learning them right away? You know by now that if you run into difficulties, you'll easily get help from many monks here.
Re: Hashes, keys and multiple histogram
by AnomalousMonk (Archbishop) on Aug 17, 2014 at 17:47 UTC
f77coder: Please correct me if I'm wrong, but it seems that you have replaced the code of your original post with code derived, more or less, from a subsequent post by Laurent_R, and without citing any change to the OP. I had first composed a more snarky reply, but will confine myself to this: choroba and Laurent_R now look foolish for having posted (apparently) completely irrelevant replies to (what now appears as) your OP. If I read this thread aright, what you have done is akin to pulling the chair out from under someone as they are sitting down to dine! Please feel free to make whatever additions/updates/corrections/etc you feel are needed, but for the sake of courtesy and clarity, please leave the original material and cite your changes!
Yes, I confirm: the content of the OP was significantly altered after choroba's answer and several of my answers. In particular, the three relevant (and most important) lines, which as of this posting read like this:
%hist1 = map { $_ => 0 } @element;
originally looked like this:
$hist1{@element}++;
The quoted output was also very different.
That's not very fair to people who spent some of their free time trying to help you, f77coder. :-(
Update: You are fairly new on this forum (13 writeups), so I assume you did not realize that this kind of editing, without stating it clearly, is strongly discouraged around here. Because you are new, I'll consider these changes to your OP a small mistake; no big deal for me, I'll forget it.
And BTW, your current code:
%hist1 = map { $_ => 0 } @element;
may look superficially closer to what you want than the original code, but you are still not quite there. Each time this map assignment runs, it replaces the whole content of %hist1 with the elements of the current line, all with the value 0, and a repeated element just overwrites the same key. So at best you get a list of unique values (the keys of the hash), but no information about their frequency.
Assuming I understood what you want, the right solution is very probably the for loop with incrementation that I offered.
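A small side-by-side sketch of the two approaches, with made-up input lines. The map assignment keeps only the last line's elements, all with a value of 0, while the increment loop accumulates counts across lines:

```perl
use strict;
use warnings;

my @lines = ("1 a b a", "1 b c");   # hypothetical input lines
my (%via_map, %via_loop);

for my $line (@lines) {
    my ($col0, @element) = split ' ', $line;
    %via_map = map { $_ => 0 } @element;   # replaces the whole hash on every line
    $via_loop{$_}++ for @element;          # adds to the existing counts
}

# %via_map  is now (b => 0, c => 0): only the last line, no counts
# %via_loop is now (a => 2, b => 2, c => 1)
```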
Re: Hashes, keys and multiple histogram
by f77coder (Beadle) on Aug 18, 2014 at 01:42 UTC
Apologies to everyone who tried to help. I was trying many iterations of the code (beating my head against the wall) and thought I had put the latest version up.
Now I'm trying to understand Laurent's short code, which uses a hash of hashes instead of individual hashes:
my %hist;
while (<DATA>) {
chomp;
my ($col0, @element) = split;
$hist{$col0}{$_}++ for @element;
};
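What makes this short version work is autovivification: the first time $hist{$col0}{$_}++ runs for a given $col0, Perl creates $hist{$col0} as a reference to a new, empty hash. A sketch showing this, and showing that $hist{1} can then be used wherever a separate %hist1 would have been (the variable $hist1 below is just an illustrative alias):

```perl
use strict;
use warnings;

my %hist;
print exists $hist{1} ? "yes\n" : "no\n";     # no: %hist starts empty

$hist{1}{aa}++;                               # $hist{1} springs into existence as a hash ref

print ref $hist{1}, "\n";                     # HASH
print "$hist{1}{aa}\n";                       # 1

# $hist{1} can stand in for a separate %hist1:
my $hist1 = $hist{1};
$hist1->{bb}++;                               # same underlying hash as $hist{1}
print join(',', sort keys %{ $hist{1} }), "\n";   # aa,bb
```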
I'm looking to implement some simple set theory with statistics.
To get the keys that are unique to each set, i.e. subtract the intersection with the other sets:
The node at http://www.perlmonks.org/?node=How%20can%20I%20get%20the%20unique%20keys%20from%20two%20hashes%3F gives the following code:
my %seen = ();
for my $element (keys(%hist1), keys(%hist2)) {
$seen{$element}++;
}
my @uniq = keys %seen;
which is why I thought it would be simpler to have separate hashes. There are elements in hist1 that are not in hist2 and vice versa. Is finding unique keys this way faster than subtracting the intersection from each set, i.e. A - (A ∩ B)? At the moment I'm working with small sample data to debug, but I will be dealing with 12+ GB of data to process.
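As far as I can tell, the quoted %seen loop collects every distinct key from both hashes, i.e. the union, rather than the keys found in only one of them. A sketch of the set operations working directly on the sub-hashes of a single %hist (the sample histograms are made up). A - B is a single grep with one hash lookup per key of A, so there is no need to compute the intersection first:

```perl
use strict;
use warnings;

# Hypothetical histograms standing in for $hist{1} and $hist{0}.
my %hist = (
    1 => { aa => 2, bb => 1, cc => 4 },
    0 => { bb => 3, dd => 1 },
);
my ($h1, $h2) = @hist{1, 0};      # hash slice: two hash refs

# A - B: keys of $h1 that do not appear in $h2
my @only_in_1 = grep { !exists $h2->{$_} } keys %$h1;

# A intersect B
my @in_both = grep { exists $h2->{$_} } keys %$h1;

# Union, which is what the %seen idiom computes
my %seen;
$seen{$_}++ for keys(%$h1), keys(%$h2);
my @union = keys %seen;

print "only in 1: @{[ sort @only_in_1 ]}\n";   # aa cc
print "in both:   @{[ sort @in_both ]}\n";     # bb
print "union:     @{[ sort @union ]}\n";       # aa bb cc dd
```

The same greps work for any pair of switch values, e.g. @hist{0, 5}, so adding a data set does not require new code.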
Re: Hashes, keys and multiple histogram
by Laurent_R (Canon) on Aug 18, 2014 at 14:18 UTC