jjw92 has asked for the wisdom of the Perl Monks concerning the following question:

I have code that contains an array of hashes. There are several entries for each "main" value, and each entry is an incomplete hash. Below is an example similar to my situation:

    my @data = ();
    $data[0] = ({ "name" => "joe", "age" => 21, "weight" => 150, "height" => "", "sex" => "" });
    $data[1] = ({ "name" => "joe", "age" => "", "weight" => "", "height" => "6'0", "sex" => "" });
    $data[2] = ({ "name" => "joe", "age" => "21", "weight" => "", "height" => "", "sex" => "male" });

"name" is the "main" value. My goal is to combine these into one complete hash, then move on to the next group of entries that share a name. Any input would be greatly appreciated. Thanks.

Re: Comparing/Completing Hashes
by BrowserUk (Patriarch) on Aug 14, 2010 at 00:16 UTC

    Geez! What a lot of non-answers for a simple question.

        #! perl -slw
        use strict;
        use Data::Dump qw[ pp ];

        my @data = (
            {"name"=>"joe","age"=>21,"weight"=>150,"height"=>"","sex"=>""},
            {"name"=>"joe","age"=>"","weight"=>"","height"=>"6'0","sex"=>""},
            {"name"=>"joe","age"=>"21","weight"=>"","height"=>"","sex"=>"male"},
        );

        my %full;
        for my $p ( @data ) {
            # keep the first true value seen for each key; "" never overwrites it
            $full{ $_ } ||= $p->{ $_ } for keys %$p;
        }

        pp \%full;

        __END__
        c:\test>junk17
        { age => 21, height => "6'0", name => "joe", sex => "male", weight => 150 }
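BrowserUk's loop folds every record into a single %full, which suits the one-person sample. Because the question also asks to move on to the next name, here is a minimal sketch that keeps one merged hash per name. It assumes every record has a non-empty "name" and that an empty string (or undef) means "missing"; %by_name and the second person, "ann", are made up for illustration. It needs Perl 5.10 or later for `//=`.

```perl
use strict;
use warnings;

# Hypothetical sample: two people, each split across several partial records.
my @data = (
    { name => 'joe', age => 21, weight => 150, height => '',    sex => ''       },
    { name => 'joe', age => '', weight => '',  height => "6'0", sex => ''       },
    { name => 'ann', age => 34, weight => '',  height => '',    sex => 'female' },
    { name => 'joe', age => 21, weight => '',  height => '',    sex => 'male'   },
    { name => 'ann', age => '', weight => 120, height => "5'5", sex => ''       },
);

# One merged hash per name; the first non-empty value for each key wins.
# Unlike ||=, this keeps a legitimate 0, because only undef and '' are skipped.
my %by_name;
for my $rec (@data) {
    my $merged = $by_name{ $rec->{name} } ||= {};
    for my $key (keys %$rec) {
        my $val = $rec->{$key};
        next if !defined $val || $val eq '';    # treat undef and '' as missing
        $merged->{$key} //= $val;               # only fill keys not set yet
    }
}

for my $name (sort keys %by_name) {
    my $p = $by_name{$name};
    print join(', ', map { "$_=$p->{$_}" } sort keys %$p), "\n";
}
```

With this sample it prints one line per name, for example `age=21, height=6'0, name=joe, sex=male, weight=150` for joe.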

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Comparing/Completing Hashes
by roboticus (Chancellor) on Aug 14, 2010 at 00:24 UTC

    jjw92:

    Just a minor quibble: I wouldn't use empty strings to signify missing values. I use the default undef value for that (and in a hash, I'll often not insert the key for a missing value, either). In the general case, you may find that an empty string is a legitimate value, and then you wouldn't be able to tell the difference between a missing value and an intentional empty string.

    In the case you've presented, it appears that it wouldn't matter, as an empty string wouldn't normally be a valid value for age, height, weight or sex. But if you had a "nickname" field, for instance, a person with no nickname might be represented with an empty string. So if one of your data sources doesn't provide a nickname, stuffing the nickname with empty strings would imply to your database that those people have no nickname.

    Why would it matter? Think of three questions: (1) How many people in your database have nicknames? (2) How many people don't have nicknames? (3) How many people have we asked?

    Your method would allow you to answer question 1, mostly. But you couldn't legitimately answer questions 2 or 3. Distinguishing undef from an empty string would let you answer questions 2 and 3, and would also give you a better answer for question 1 (out of X people we've asked, Y have a nickname).
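The three questions can be answered directly once "never asked" and "asked, no nickname" are stored differently. A minimal sketch, assuming a missing key or undef means we never asked and an empty string means the person has no nickname; the records and the `nickname` field are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical records: a missing or undef nickname means "we never asked";
# an empty string means "asked, and they have no nickname".
my @people = (
    { name => 'joe', nickname => 'Joey' },
    { name => 'ann', nickname => ''     },
    { name => 'bob'                     },    # key absent: never asked
    { name => 'sue', nickname => undef  },    # explicitly unknown
);

my ($asked, $with_nick, $without_nick) = (0, 0, 0);
for my $p (@people) {
    next unless defined $p->{nickname};       # skip people we never asked
    $asked++;
    if ($p->{nickname} eq '') { $without_nick++ }
    else                      { $with_nick++ }
}

print "asked: $asked, with nickname: $with_nick, without: $without_nick\n";
```

For these four records it prints `asked: 2, with nickname: 1, without: 1`; with empty strings everywhere, bob and sue would have been counted as "no nickname".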

    Another way it could matter: In the problem you're solving now, you're merging information from multiple records. If one record is missing a data item, you can overlay the missing field from the other record. But what do you do if there's a conflict between two data items? You'll need to figure out how to resolve those differences. Throw in the fact that a nickname may be explicitly blank, and your code wouldn't be able to tell that the person no longer uses a nickname. Instead, they'll be stuck with their old nickname since you wouldn't be able to overlay it with an empty string--it would be treated as a missing value and ignored...
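The overlay problem goes away if undef (or an absent key) is the only "no information" marker. Below is a minimal sketch of one conflict policy, "the last defined value wins", chosen for illustration; it is not the only reasonable rule. The `overlay` sub and the records are hypothetical.

```perl
use strict;
use warnings;

# Merge records in order: a later record overrides an earlier one for every
# key it actually defines, including an explicit empty string. Missing keys
# and undef values carry no information and leave the old value alone.
sub overlay {
    my %merged;
    for my $rec (@_) {
        for my $key (keys %$rec) {
            $merged{$key} = $rec->{$key} if defined $rec->{$key};
        }
    }
    return \%merged;
}

# Hypothetical history for one person.
my $merged = overlay(
    { name => 'joe', nickname => 'Joey', age => 21 },
    { name => 'joe', age => undef },          # age unknown here: keep 21
    { name => 'joe', nickname => '' },        # explicitly dropped the nickname
);

printf "age=%s nickname='%s'\n", $merged->{age}, $merged->{nickname};
```

Here age stays 21 because the second record's undef says nothing, while the nickname ends up as an explicit empty string instead of being stuck at 'Joey'.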

    Sorry for the long, rambling node. I haven't the time to make it short and concise.

    ...roboticus

      I see the merit in this, but it doesn't really apply in this case. I just made up these hashes, keys and values on the spot. What I'm actually dealing with is an array of hashes returned by slurping a .csv file, so every field with no data will be an empty string, "", and every value will correspond to the correct header. I like the thought, though, and it would certainly apply in many other cases. Thanks for the response.
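If you still want the undef convention with CSV input, one option is to turn empty strings into undef right after reading each row. A minimal sketch, assuming each row is already a hash reference like the ones in the question (the rows below are made up). If you read the file with Text::CSV or Text::CSV_XS, their `blank_is_undef` / `empty_is_undef` attributes may do this for you; check your module's documentation.

```perl
use strict;
use warnings;

# Hypothetical rows, shaped like what a CSV reader returns as hashes:
# every header is present, and an empty cell comes back as ''.
my @rows = (
    { name => 'joe', age => 21, height => ''    },
    { name => 'joe', age => '', height => "6'0" },
);

# Convert empty strings to undef in place. values() returns aliases to the
# hash's values, so assigning to $_ changes the hash itself.
for my $row (@rows) {
    for (values %$row) {
        $_ = undef if defined $_ && $_ eq '';
    }
}

for my $row (@rows) {
    print join(', ', map { "$_=" . ($row->{$_} // 'undef') } sort keys %$row), "\n";
}
```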

Re: Comparing/Completing Hashes
by choroba (Cardinal) on Aug 13, 2010 at 23:26 UTC
    Almost a one-liner:
    my %h; @h{keys %$_} = values %$_ for @data;
    I'd rather not use it, though, because it does not check for a missing name, several different values for a value, etc. I will keep the more sophisticated version secret so you can devise it yourself.
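Since the checked version is left as an exercise, the following is only one possible take on it, not choroba's: it refuses records without a name and warns when two non-empty values for the same key disagree, keeping the first. `merge_checked` is a hypothetical name. Values are compared as strings with `ne`, so 21 and "21" count as the same, while 150 and "150.0" would be reported as a conflict. Records for two different people would show up as a conflict on "name".

```perl
use strict;
use warnings;

# One possible checked merge (hypothetical, not choroba's version):
# dies on a record without a name, warns on conflicting non-empty values.
sub merge_checked {
    my %merged;
    for my $rec (@_) {
        die "record without a name\n"
            unless defined $rec->{name} && $rec->{name} ne '';
        for my $key (keys %$rec) {
            my $val = $rec->{$key};
            next unless defined $val && $val ne '';    # skip missing values
            if (defined $merged{$key} && $merged{$key} ne $val) {
                warn "conflict for '$key': '$merged{$key}' vs '$val'\n";
                next;                                  # keep the first value
            }
            $merged{$key} = $val;
        }
    }
    return \%merged;
}

my @data = (
    { name => 'joe', age => 21,   weight => 150, height => '',    sex => ''     },
    { name => 'joe', age => '',   weight => '',  height => "6'0", sex => ''     },
    { name => 'joe', age => '21', weight => '',  height => '',    sex => 'male' },
);

my $joe = merge_checked(@data);    # no warnings: 21 and '21' are equal as strings
```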
      [I]t does not check for ... several different values for a [key] ...

      ... or for identical values for the same key in different anonymous hashes when those values are the empty string, as most values are in the example data.

        Covered by "etc" ;)

      When I try that, it only ends up storing the last hash. Maybe it's just a formatting issue? I don't know enough about what it's doing to tell.

        As AnomalousMonk noted, you would have to avoid setting the empty values. But as I noted, you'd be better off not using this approach.
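Because every record in the sample has every key, the slice assignment copies all five keys each time, so the empty strings in later records overwrite the real values and the result is simply the last record. A minimal sketch of the "avoid setting the empty values" fix, copying only the non-empty keys. It still lets a later non-empty value silently replace an earlier one and performs none of the other checks mentioned above.

```perl
use strict;
use warnings;

my @data = (
    { name => 'joe', age => 21,   weight => 150, height => '',    sex => ''     },
    { name => 'joe', age => '',   weight => '',  height => "6'0", sex => ''     },
    { name => 'joe', age => '21', weight => '',  height => '',    sex => 'male' },
);

my %h;
for my $rec (@data) {
    # Keep only the keys whose values are non-empty in this record...
    my @filled = grep { $rec->{$_} ne '' } keys %$rec;
    # ...and copy just those with a hash slice.
    @h{@filled} = @{$rec}{@filled};
}
```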
Re: Comparing/Completing Hashes
by Anonymous Monk on Aug 13, 2010 at 23:10 UTC
    Any input would be greatly appreciated. Thanks

    Great, you have a goal, now simply lay out the steps to accomplish that goal, translate that to code, and you're finished :) How (Not) To Ask A Question.

Re: Comparing/Completing Hashes
by AnomalousMonk (Archbishop) on Aug 13, 2010 at 23:21 UTC

    Sorry for all the backslashed double-quotes:

        >perl -wMstrict -le "my @data = ( { \"name\" => \"joe\", \"age\" => 21, \"weight\" => 150, \"height\" => \"\", \"sex\" => \"\", }, { \"name\" => \"joe\", \"age\" => \"\", \"weight\" => \"\", \"height\" => \"6'0\", \"sex\" => \"\", }, { \"name\" => \"joe\", \"age\" => \"21\", \"weight\" => \"\", \"height\" => \"\", \"sex\" => \"male\", }, ); my %hash = map { my $hr = $_; map { $_, $hr->{$_} } grep $hr->{$_} ne '', keys %$hr } @data ; use Data::Dumper; print Dumper \%hash; "
        $VAR1 = {
                  'name' => 'joe',
                  'weight' => 150,
                  'sex' => 'male',
                  'height' => '6\'0',
                  'age' => '21'
                };

      I am still in the process of learning Perl. Could you please explain to me what is actually happening with the map{} process?
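Reading the expression from the inside out: for one record $hr, `grep` keeps the keys whose values are not empty, the inner `map` turns each kept key into a (key, value) pair, and the outer `map` concatenates those pairs for all records into one long list. Assigning that list to %hash pairs the elements up again, and when a key appears more than once the later pair wins. The explicit loops below are a sketch of the same logic written out for readability, not code from the original post.

```perl
use strict;
use warnings;

my @data = (
    { name => 'joe', age => 21,   weight => 150, height => '',    sex => ''     },
    { name => 'joe', age => '',   weight => '',  height => "6'0", sex => ''     },
    { name => 'joe', age => '21', weight => '',  height => '',    sex => 'male' },
);

my %hash;
for my $hr (@data) {                   # outer map: once per record
    for my $key (keys %$hr) {          # inner map walks this record's keys...
        next if $hr->{$key} eq '';     # ...after grep drops the empty values
        $hash{$key} = $hr->{$key};     # each (key, value) pair lands in %hash;
    }                                  # a later pair for the same key wins
}
```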