f77coder has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I need a bit more help dealing with hashes and keys; I've tried reading the FAQ.

Following on from a previous question, http://perlmonks.org/index.pl?node_id=1096042, I got some great help from Athanasius, but now I need to modify the code to handle multiple histograms.

I've tried many combinations of hashes to arrays, keys, push… I don't understand why the keys aren't being updated with the new values.

Any help greatly appreciated.

use 5.12.0;
use strict;
use warnings;
use Data::Dump;
use Data::Dumper;
##########################################
system('clear');

# 1. Configuration
my @required_keys;
my %hist1 = map { $_ => 0 } @required_keys;
my %hist2 = map { $_ => 0 } @required_keys;
my %hist3 = map { $_ => 0 } @required_keys;

while (my $line = <DATA>) {
    chomp($line);
    my @element = split (/ +/,$line);
    my $col0= shift @element; #use element[0] as switch and drop from histogram
    if( $col0==1){
        %hist1 = map { $_ => 0 } @element;
    }elsif( $col0==0){
        %hist2 = map { $_ => 0 } @element;
    }elsif( $col0==5){
        %hist3 = map { $_ => 0 } @element
    }else{
        #do stuff here when all else fails, undef/NaNs
        print "WTF \n";
    }
};

dd \%hist1; # Verify hash contents
dd \%hist2; # Verify hash contents
dd \%hist3; # Verify hash contents

__DATA__
0 1 0 2 68fd1e64 80e26c9b 1f89b562 e5ba7672
0 2 1 2 68fd1e64 f0cf0024 0b153874 07c540c4
0 2 14 2 287e684f 0a519c5c 0b153874 8efede7f
0 0 0 0 68fd1e64 2c16a946 0b153874 1e88c74f
0 3 0 0 8cf07265 ae46a29d 0b153874 1e88c74f
5 0 0 0 05db9164 6c9c9cf3 0b153874 776ce399
0 0 0 1 439a44a4 ad4527a2 0b153874 776ce399
1 1 0 0 68fd1e64 2c16a946 1f89b562 e5ba7672
0 0 8 31 05db9164 d833535f 0b153874 e5ba7672
0 0 1 2 05db9164 510b40a5 0b153874 d4bb7bd8
5 0 0 5 05db9164 0468d672 0b153874 776ce399
0 0 6 7 05db9164 9b5fd12f 0b153874 d4bb7bd8
1 0 0 0 241546e0 38a947a1 1f89b562 e5ba7672
1 0 5 4 be589b51 287130e0 361384ce 07c540c4
0 0 4 4 5a9ed9b0 80e26c9b 0b153874 3486227d
0 0 18 1 05db9164 bc6e3dc1 64523cfa 776ce399
1 1 2 2 68fd1e64 38d50e09 0b153874 d4bb7bd8
0 0 0 5 8cf07265 7cd19acc 0b153874 e5ba7672
0 0 2 10 05db9164 f0cf0024 37e4aa92 e5ba7672
0 7 3 15 3c9d8785 b0660259 0b153874 e5ba7672

the output

{
  "0b153874" => 0,
  "1" => 0,
  "2" => 0,
  "38d50e09" => 0,
  "68fd1e64" => 0,
  "d4bb7bd8" => 0,
}
{
  "0b153874" => 0,
  "15" => 0,
  "3" => 0,
  "3c9d8785" => 0,
  "7" => 0,
  "b0660259" => 0,
  "e5ba7672" => 0,
}
{
  "0" => 0,
  "0468d672" => 0,
  "05db9164" => 0,
  "0b153874" => 0,
  "5" => 0,
  "776ce399" => 0,
}

Replies are listed 'Best First'.
Re: Hashes, keys and multiple histogram
by choroba (Cardinal) on Aug 17, 2014 at 07:20 UTC
    $hist1{@element}++;

    @element in scalar context returns the size of the array @element. You shift the array before, so the key is the number of original elements minus 1. Are you sure that's what you want to hash by?
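    A minimal sketch of that behaviour, using a made-up four-element row (the values are hypothetical, not taken from the post's data): whatever the array holds, the only key ever created is its length.

```perl
use strict;
use warnings;

my %hist1;
my @element = qw(1 0 2 68fd1e64);   # hypothetical row, switch column already shifted off

# A single-element hash subscript is evaluated in scalar context,
# so @element yields its element count (4), not its contents.
$hist1{@element}++;

my @keys = keys %hist1;
print "key: $keys[0], count: $hist1{4}\n";   # key: 4, count: 1
```

    Every row with the same number of columns therefore bumps the same counter.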

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Thanks for the reply. No, it's not. The first column in the data is a switch variable: I need to grab that value from the line and put the rest of the line into the histogram. Each array element should be a key except for the first.

      I want to put the rest of the array into the hash as keys, then loop through the following lines and accumulate the counts as values.

        Each array element should be a key except for the first

        Maybe I had misread this line when I wrote my previous answer. Possibly you really want something like this:

        $hash1{$_}++ for @elements;
        Example under the debugger:
        DB<1> @elements = qw/ 1 3 5 4 6/;
        DB<2> $hash1{$_}++ for @elements;
        DB<3> x \%hash1
        0  HASH(0x600509af0)
           1 => 1
           3 => 1
           4 => 1
           5 => 1
           6 => 1

        An array cannot be the key of a hash. Perl stringifies hash keys, so that hash keys are always strings. Even if you tried something like this:
        $hash1{\@elements}++;
        or
        $hash1{[@elements]}++;
        it would not work, because your key would end up being a stringified array ref (and the array content would be lost).
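        A short sketch of what the stringification looks like in practice, assuming the same sample list as in the debugger session above. The printed address will differ from run to run.

```perl
use strict;
use warnings;

my %hash1;
my @elements = qw(1 3 5 4 6);

$hash1{\@elements}++;    # the reference is turned into a string to build the key

my ($key) = keys %hash1;
print "$key\n";          # something like ARRAY(0x600509af0); the address varies

# The key is a plain string now, not a reference,
# so the original list cannot be recovered from it.
print ref($key) ? "still a reference\n" : "plain string\n";
```

        The last line prints "plain string", which is why the array content is lost.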

        So either you want to use the string that you've read to be the hash key

        $hash1{$line}++;
        but that does not seem to be very useful in this context, or you want to store an array reference into the value of the hash
        $hash1{"some key"} = \@elements;
        but then I am not sure what you would want your key to be.

        I think you need to have (and provide us) a clearer idea of the data structure that you want to have at the end of your process.

        Quite possibly you really need an array of arrays, rather than a hash of arrays. Quick demonstration under the Perl debugger:

        DB<1> @elements = qw/ 1 3 5 4 6/;
        DB<2> push @array, \@elements;
        DB<3> @elements2 = qw/ 12 13 14 15/;
        DB<4> push @array, \@elements2;
        DB<5> x \@array
        0  ARRAY(0x600509af0)
           0  ARRAY(0x600500b38)
              0  1
              1  3
              2  5
              3  4
              4  6
           1  ARRAY(0x600500928)
              0  12
              1  13
              2  14
              3  15
        Update: Perhaps I misunderstood your requirement. Please read my next answer on Aug 17, 2014 at 08:50 UTC (immediately below)

Re: Hashes, keys and multiple histogram
by Laurent_R (Canon) on Aug 17, 2014 at 13:38 UTC
    Hi, now that you have fully explained what you want, how about this:
    use strict;
    use warnings;
    use Data::Dumper;
    ##########################################
    my (%hist1, %hist2, %hist3);
    my @required_keys;

    while (<DATA>) {
        chomp;
        my @element = split;
        my $col0= shift @element;
        if ($col0 == 1){
            $hist1{$_}++ for @element;
        } elsif ($col0 == 0){
            $hist2{$_}++ for @element;
        } elsif ($col0 == 5){
            $hist3{$_}++ for @element;
        } else {
            #do stuff here when all else fails, undef/NaNs
            print "WTF \n";
        }
    };
    print Dumper \%hist1;
    # using your __DATA__ section, not repeated here for brevity

    which produces this for the %hist1 hash:

    I tried to keep the code above relatively close to what you had, but I would probably change the code to use only one hash of hashes, rather than three different hashes, leading to much shorter code:

    use strict;
    use warnings;
    use Data::Dumper;

    my %hist;
    while (<DATA>) {
        chomp;
        my ($col0, @element) = split;
        $hist{$col0}{$_}++ for @element;
    };
    print Dumper \%hist;
    # not repeating the __DATA__ section here
    Which produces the following output.

      Many thanks, Laurent, for the code. The reason I'd like to keep the histograms separate is that I now need to operate on the individual hashes. I need to find what is only in %hist1, only in %hist2, and only in %hist3, and then find the intersections, and the probabilities on those intersections, for the pairs %hist1/%hist2, %hist2/%hist3, and %hist1/%hist3.

      Are there bindings to do statistical operations on the hash values?

        Well, I suspect that the modules with which you are going to analyze your data probably expect hash references (instead of hashes). If such is the case, then, instead of passing \%hist1, you can just pass to your function $hist{1}, which happens to contain a reference to the relevant sub-hash. For example, $hist{5} contains a hash ref pointing to the following data structure:
        0  HASH(0x6005200f0)
           0 => 5
           '0468d672' => 1
           '05db9164' => 2
           '0b153874' => 2
           5 => 1
           '6c9c9cf3' => 1
           '776ce399' => 2
        If it turns out you need to pass an actual hash (and not a hash ref), then just dereference it by passing, for example, %{$hist{5}} to the function.
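        As a rough illustration of both options, here is a sketch that passes $hist{5} to a function expecting a hash ref and also copies it into a plain hash. The helper name total_count is hypothetical; sum comes from the core List::Util module, and the counts are the ones shown above for $hist{5}.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: total number of observations in one histogram.
sub total_count {
    my ($h) = @_;                    # $h is a hash reference
    return sum(values %$h) // 0;     # sum() returns undef for an empty list
}

my %hist = (
    5 => {
        '0'        => 5, '0468d672' => 1, '05db9164' => 2, '0b153874' => 2,
        '5'        => 1, '6c9c9cf3' => 1, '776ce399' => 2,
    },
);

my $total = total_count($hist{5});   # pass the sub-hash by reference
my %copy  = %{ $hist{5} };           # or dereference it into a real hash

# Relative frequency of one key within this histogram.
printf "total=%d, P(0b153874)=%.3f\n", $total, $hist{5}{'0b153874'} / $total;
# total=14, P(0b153874)=0.143
```

        Note that %copy is a shallow copy, so changes to it do not affect $hist{5}.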

        Update: Maintaining 3 or 4 hashes containing essentially identical sets of data is usually a bad idea. It scales badly when you need to add another data set, and the code is much longer (see the difference between my two sample programs) and therefore harder to maintain: any change has to be made in several places, and chances are high that you'll forget one of them.

        I can understand that using nested data structures may be challenging for a beginner, but you'll have to learn them at some point anyway (if you continue to do even occasional programming), so why not start learning them right away? You know by now that, if you run into difficulties, you'll easily get help from many monks here.

Re: Hashes, keys and multiple histogram
by AnomalousMonk (Archbishop) on Aug 17, 2014 at 17:47 UTC

    f77coder: Please correct me if I'm wrong, but it seems that you have replaced the code of your original post with code derived, more or less, from a subsequent post by Laurent_R, and without citing any change to the OP. I had first composed a more snarky reply, but will confine myself to this: choroba and Laurent_R now look foolish for having posted (apparently) completely irrelevant replies to (what now appears as) your OP. If I read this thread aright, what you have done is akin to pulling the chair out from under someone as they are sitting down to dine! Please feel free to make whatever additions/updates/corrections/etc you feel are needed, but for the sake of courtesy and clarity, please leave the original material and cite your changes!

      Yes, I confirm, the content of the OP has been significantly altered after choroba's answer and several of my answers. Especially, the three relevant (and most important) lines which, as of this posting, have this:
      %hist1 = map { $_ => 0 } @element;
      originally looked like this:
      $hist1{@element}++;
      The quoted output was also very different.

      That's not very fair to people who spent some of their free time trying to help you, f77coder. :-(

      Update: You are fairly new on this forum (13 write-ups), so I assume you did not realize that doing this kind of editing without stating it clearly is strongly discouraged around here. Because you are new, I'll consider these changes to your OP as just a small mistake, no big deal for me, I'll forget it.

      And BTW, your current code:

      %hist1 = map { $_ => 0 } @element;
      may look superficially closer than the original code to what you want to obtain, but you are still not quite there. With this map syntax, each time a line matches, you assign a fresh list to the whole hash and so discard whatever earlier lines had put there. At the end, the best you get is the unique values of the last matching line (as the keys of the hash), all mapped to 0, with no information about their frequency.

      Assuming I understood what you want, the right solution is very probably the for loop with incrementation that I offered.
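      To make the difference concrete, here is a small sketch that runs both forms over two made-up rows (the rows and the hash names %assigned and %counted are hypothetical):

```perl
use strict;
use warnings;

my @rows = ( [qw(a b a)], [qw(b c)] );   # made-up rows for a single histogram

my (%assigned, %counted);
for my $row (@rows) {
    %assigned = map { $_ => 0 } @$row;   # replaces the whole hash on every row
    $counted{$_}++ for @$row;            # accumulates counts across rows
}

print "$_ => $assigned{$_}\n" for sort keys %assigned;   # b => 0, c => 0
print "$_ => $counted{$_}\n"  for sort keys %counted;    # a => 2, b => 2, c => 1
```

      Only the second form keeps anything from the first row.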

      Sorry about that.

Re: Hashes, keys and multiple histogram
by f77coder (Beadle) on Aug 18, 2014 at 01:42 UTC

    Apologies to everyone who tried to help. I was trying many iterations (beating my head against the wall) of the code and thought I put the latest up.

    Now I'm trying to understand Laurent's short code, which uses a hash of hashes rather than individual hashes

    my %hist;
    while (<DATA>) {
        chomp;
        my ($col0, @element) = split;
        $hist{$col0}{$_}++ for @element;
    };

    I'm looking to implement some simple set theory with statistics.

    To get keys that are unique to each set, i.e. subtract the intersection of other sets

    From here http://www.perlmonks.org/?node=How%20can%20I%20get%20the%20unique%20keys%20from%20two%20hashes%3F, it gives the following code

    my %seen = ();
    for my $element (keys(%hist1), keys(%hist2)) {
        $seen{$element}++;
    }
    my @uniq = keys %seen;

    which is why I thought it would be simpler to have separate hashes. There are elements in hist1 that are not in hist2 and vice versa. Is finding unique keys this way faster than subtracting the intersection from each set, i.e. A-(A int B)? At the moment I'm working with small sample data to debug, but I will be dealing with 12+ GB of data to process.
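    For what it's worth, the %seen snippet collects the union of the keys rather than the keys unique to each set. A minimal sketch of the set operations on the hash-of-hashes layout, assuming toy data and hypothetical variable names ($A, $B, @both, @only_a, @only_b); it has not been benchmarked on large inputs:

```perl
use strict;
use warnings;

# Toy histograms keyed by the switch value, as in the hash-of-hashes version.
my %hist = (
    1 => { a => 2, b => 1 },
    0 => { b => 3, c => 1 },
);

my ($A, $B) = @hist{1, 0};                         # hash slice: two hash refs

my @both   = grep {  exists $B->{$_} } keys %$A;   # A intersect B
my @only_a = grep { !exists $B->{$_} } keys %$A;   # A minus B
my @only_b = grep { !exists $A->{$_} } keys %$B;   # B minus A

print "both: @both\n";       # both: b
print "only A: @only_a\n";   # only A: a
print "only B: @only_b\n";   # only B: c
```

    Each exists test is a single hash lookup, so A minus B comes out of one pass over the keys of A without building the intersection first. The sub-hashes are reached through references, so separate top-level hashes are not required for this.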

Re: Hashes, keys and multiple histogram
by Laurent_R (Canon) on Aug 18, 2014 at 14:18 UTC