in reply to Hashes, keys and multiple histogram

f77coder: Please correct me if I'm wrong, but it seems that you have replaced the code of your original post with code derived, more or less, from a subsequent post by Laurent_R, and without citing any change to the OP. I had first composed a more snarky reply, but will confine myself to this: choroba and Laurent_R now look foolish for having posted (apparently) completely irrelevant replies to (what now appears as) your OP. If I read this thread aright, what you have done is akin to pulling the chair out from under someone as they are sitting down to dine! Please feel free to make whatever additions/updates/corrections/etc you feel are needed, but for the sake of courtesy and clarity, please leave the original material and cite your changes!


Re^2: Hashes, keys and multiple histogram
by Laurent_R (Canon) on Aug 17, 2014 at 19:01 UTC
    Yes, I confirm, the content of the OP has been significantly altered after choroba's answer and several of my answers. In particular, the three relevant (and most important) lines, which as of this posting read:
    %hist1 = map { $_ => 0 } @element;
    originally looked like this:
    $hist1{@element}++;
    The quoted output was also very different.

    That's not very fair to people who spent some of their free time trying to help you, f77coder. :-(

    Update: You are fairly new on this forum (13 write-ups), so I assume you did not realize that doing this kind of editing without stating it clearly is strongly discouraged around here. Because you are new, I'll consider these changes to your OP as just a small mistake; no big deal for me, and I'll forget it.

    And BTW, your current code:

    %hist1 = map { $_ => 0 } @element;
    may look superficially closer than the original code to what you want to obtain, but you are still not quite there. What happens with this map syntax is that the hash is rebuilt from scratch with every value set to 0, and each time the same element occurs again it simply overwrites the entry with the same key, so that, at the end, the best you get is a unique list of elements (the keys of the hash), but no information about their frequencies.

    Assuming I understood what you want, the right solution is very probably the for loop with incrementation that I offered.
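    To make the difference concrete, here is a minimal sketch (the data is made up) contrasting the two approaches:

```perl
use strict;
use warnings;

my @element = qw(a b a c b a);    # made-up sample elements

# The map version: every value is set to 0 and duplicate
# elements collapse onto the same key, so no counts survive.
my %hist_map = map { $_ => 0 } @element;
# %hist_map is (a => 0, b => 0, c => 0)

# The incrementing loop keeps the frequencies.
my %hist;
$hist{$_}++ for @element;
# %hist is (a => 3, b => 2, c => 1)
```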

      Apologies to everyone who tried to help. I was trying many iterations (beating my head against the wall) of the code and thought I put the latest up.

      Now I'm trying to understand Laurent's short code of an array of hashes versus individual hash elements.

      my %hist;
      while (<DATA>) {
          chomp;
          my ($col0, @element) = split;
          $hist{$col0}{$_}++ for @element;
      }
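      For example, feeding that loop two made-up input lines (using an in-memory list instead of the DATA filehandle, so the sketch runs standalone) builds one histogram per first-column value:

```perl
use strict;
use warnings;

# Stand-in for the DATA filehandle: two made-up input lines.
my @lines = ("hist1 a b a", "hist2 b c");

my %hist;
for (@lines) {
    my ($col0, @element) = split;
    $hist{$col0}{$_}++ for @element;
}

# %hist is now a hash of hashes, one histogram per first column:
#   hist1 => { a => 2, b => 1 }
#   hist2 => { b => 1, c => 1 }
```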

      I'm looking to implement some simple set theory with statistics.

      To get keys that are unique to each set, i.e. subtract the intersection of other sets

      From here http://www.perlmonks.org/?node=How%20can%20I%20get%20the%20unique%20keys%20from%20two%20hashes%3F, it gives the following code

      my %seen = ();
      for my $element (keys(%hist1), keys(%hist2)) {
          $seen{$element}++;
      }
      my @uniq = keys %seen;

      which is why I thought it would be simpler to have separate hashes. There are elements in hist1 that are not in hist2 and vice versa. Is finding unique keys this way faster than subtracting the intersection from each set, i.e. A - (A int B)? At the moment I'm working with small sample data to debug, but I will be dealing with 12+ GB of data to process.

        If you have a hash of hashes (and not an array of hashes) such as the one I showed in my second version of the program, you can use the code you showed (which finds the union of the two key sets, i.e. the list of keys present in either set, rather than their intersection) with the following small changes (I think it should be right, but I cannot test right now):
        my %seen = ();
        for my $element (keys %{$hist{1}}, keys %{$hist{2}}) {
            $seen{$element}++;
        }
        my @uniq = keys %seen;
        Having said that, we might have another serious problem here. 12 GB is a lot of data, and it is far from certain that such a huge volume will fit into your computer's memory. In other words, you might not be able to store all your data in a hash. I am not talking about a Perl limitation, but about a limitation of your hardware.
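        And if what you actually want is the keys unique to each set, A - (A int B), a grep over exists is probably the most direct way. An untested sketch, using two plain made-up hashes for simplicity:

```perl
use strict;
use warnings;

# Made-up histograms standing in for %hist1 and %hist2.
my %hist1 = (a => 3, b => 2, c => 1);
my %hist2 = (b => 5, d => 1);

# Keys in %hist1 but not in %hist2: A - (A int B)
my @only_in_1 = grep { !exists $hist2{$_} } keys %hist1;   # a, c

# Keys in %hist2 but not in %hist1: B - (A int B)
my @only_in_2 = grep { !exists $hist1{$_} } keys %hist2;   # d
```

        This walks each hash only once, so it should not be slower than building the union first and filtering afterwards.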
Re^2: Hashes, keys and multiple histogram
by f77coder (Beadle) on Aug 18, 2014 at 01:21 UTC

    Sorry about that.