Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I've searched the internet on how to handle duplicate keys in a hash, but am still confused. For instance, take a text file with numbers on the left, names on the right, with numbers as keys, names values.

1 AAA

2 BBB

3 CCC

4 DDD

2 EEE

From what I've read, you're supposed to use hash references into arrays? But I don't see how (in example) typing in the "2" would pull up both BBB AND EEE when the info is embedded within a reference within a hash, if this is the correct way to go about it. What is the simplest method to approach the duplicate key problem? Any clarification you might offer will be greatly appreciated.

Replies are listed 'Best First'.
Re: duplicate keys on hash
by Anonymous Monk on Feb 11, 2015 at 12:44 UTC
    you're supposed to use hash references into arrays?

    The data structure you're probably looking for is a hash where each value is a reference to an individual array: a "hash of arrays". The links that Happy-the-monk provided in this thread should explain.

    But I don't see how(in example) typing in the "2" would pull up both BBB AND EEE

    "Access and Printing of a HASH OF ARRAYS" in perldsc shows you some examples of how to access the data structure. It'd probably also be helpful to you if you learn about how to handle references in general, e.g. perlreftut.

    To get you started, here's one way to build such a structure from the input you showed. One important concept here is autovivification, which is what causes you to be able to write @{$data{$k}} and have a reference to a new array spring into existence ("vivify") even if $data{$k} was previously undefined.

    use warnings;
    use strict;

    my %data;
    while (<DATA>) {
        chomp;
        next unless /\S/;
        my ($k, $v) = /^\s*(\S+)\s+(.+?)\s*$/;
        push @{ $data{$k} }, $v;
    }

    use Data::Dumper;
    print Dumper(\%data);

    __DATA__
    1 AAA
    2 BBB
    3 CCC
    4 DDD
    2 EEE

    Output:

    $VAR1 = {
              '3' => [
                       'CCC'
                     ],
              '1' => [
                       'AAA'
                     ],
              '4' => [
                       'DDD'
                     ],
              '2' => [
                       'BBB',
                       'EEE'
                     ]
            };

      While replying to above Anonymonk I am actually clarifying to OP

      But I don't see how(in example) typing in the "2" would pull up both BBB AND EEE

      The information you may be missing is the syntax used above: to treat the hash value, spelled out $hash{$key}, as if it were a plain array, you put it inside this thingy @{ ... } in place of the dots.

      You have been shown how values get into the array behind the hash value with a simple push.

      Getting stuff out works the same way:

      foreach my $key ( keys %hash ) {
          foreach my $value ( @{ $hash{$key} } ) {
              print $key, ": ", $value, "\n";   # or whatever you want to do.
          }
      }
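For the "typing in the 2" part of the question, a single key can be looked up the same way. This is only a minimal sketch: the hash is hard-coded here for illustration rather than read from the file, and the names %hash, $key and @names are assumptions, not something from the original post.

```perl
use strict;
use warnings;

# Illustrative data, hard-coded instead of being read from a file.
my %hash = (
    1 => ['AAA'],
    2 => ['BBB', 'EEE'],
    3 => ['CCC'],
    4 => ['DDD'],
);

my $key = 2;

# Check with exists first, so a missing key yields an empty list
# instead of an attempt to dereference an undefined value.
my @names = exists $hash{$key} ? @{ $hash{$key} } : ();

print "$key: @names\n";            # prints "2: BBB EEE"
print "first: $hash{$key}[0]\n";   # prints "first: BBB"
```

The whole list comes from @{ $hash{$key} }, while $hash{$key}[0] picks out one element of it.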

      Cheers, Sören

      Créateur des bugs mobiles - let loose once, run everywhere.
      (hooked on the Perl Programming language)

Re: duplicate keys on hash
by Happy-the-monk (Canon) on Feb 11, 2015 at 11:45 UTC

    Have a look at the Tutorials in the main perldoc page.

    The first three deal with data structures and references - that's what you were looking for:

    • perlreftut Perl references short introduction
    • perldsc Perl data structures intro
    • perllol Perl data structures: arrays of arrays

    Cheers, Sören

    Créateur des bugs mobiles - let loose once, run everywhere.
    (hooked on the Perl Programming language)

Re: duplicate keys on hash
by tune (Curate) on Feb 11, 2015 at 11:42 UTC
    I would assign an arrayref to the hash key. Eg.
    use Data::Dumper;

    my %hash = (
        2 => ['BBB', 'EEE'],
    );

    print Dumper \%hash;
    Is that what you are looking for?

    --
    tune

Re: duplicate keys on hash
by i5513 (Pilgrim) on Feb 11, 2015 at 15:08 UTC

    As an alternative, if your input gets more complex, I would use a hash of hashes, with a "fixed field name" before every field. In the following example I use an extra final key, so every value can keep some attributes of its own.

    Of course, if you need to write a real "program" and not a cheap script, then you should probably use object orientation (perlobj).

    The idea would be to have:

    #!/usr/bin/perl -w
    use strict;

    my %data;
    while (<DATA>) {
        chomp;
        my ($number, $value) = split;
        $data{number}{$number}{value}{$value}{metric1} ||= 0;
        $data{number}{$number}{value}{$value}{metric1}++;
    }

    foreach my $n (keys %{ $data{number} }) {
        foreach my $v (keys %{ $data{number}{$n}{value} }) {
            print $v . " (" . $data{number}{$n}{value}{$v}{metric1} . " times)\n";
        }
    }

    __DATA__
    1 AAA
    2 BBB
    3 CCC
    4 DDD
    2 EEE
    2 EEE
Re: duplicate keys on hash
by ggadd (Acolyte) on Feb 11, 2015 at 22:32 UTC

    Thank you all for the extensive info as well as the web links. I've had a serious mental block with this stuff. When I first started studying programming last year, Perl became almost an addiction. I felt in tune with the logical flow of the language. Not so anymore. The deeper I go, the more it seems like an extensive patchwork of tricks to do jobs which Larry didn't really intend for it to do. Perhaps that's the reason for Perl 6: to get rid of the tricks and start over with a language that logically flows where you want it to go?

Re: duplicate keys on hash
by locked_user sundialsvc4 (Abbot) on Feb 11, 2015 at 13:03 UTC

    Furthermore, Perl has a neat feature called, of all things, “autovivification” which makes this sort of thing very easy to express. For example:

    my $hash = {};
    push @{$hash->{'foo'}}, 'bar';

    ... will in one step create a hash key for 'foo' if it does not yet exist, cause that bucket to contain an arrayref, and push 'bar' onto that array. So, in just one step, “the right thing” happens.

    Each element of the resulting hash will therefore contain an arrayref, even if that arrayref contains only one entry. In this way, an arbitrary number of values may be stored. The same autovivification trick can be used to construct other structures, as well.
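The effect described above can be checked with a few lines. This is a rough sketch only; the key 'foo', the values, and the helper variables $before and $after are illustrative assumptions.

```perl
use strict;
use warnings;

my $hash = {};

# Before the push, the key does not exist at all.
my $before = exists $hash->{foo} ? 'yes' : 'no';

# Dereferencing the missing slot as an array for push creates the arrayref.
push @{ $hash->{foo} }, 'bar';

my $after = exists $hash->{foo} ? 'yes' : 'no';

print "exists before push: $before\n";                    # prints "exists before push: no"
print "exists after push: $after\n";                      # prints "exists after push: yes"
print "type: ", ref($hash->{foo}), "\n";                  # prints "type: ARRAY"
print "entries: ", scalar(@{ $hash->{foo} }), "\n";       # prints "entries: 1"
```

Even with a single value stored, the slot holds an array reference, so later pushes onto the same key need no special casing.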


    Edited to fix the braces.

      Please, test your code before posting.
      Scalar found where operator expected at - line 2, near "@($hash"
              (Missing operator before $hash?)
      Useless use of push with no values at - line 2.
      syntax error at - line 2, near "@($hash"
      Execution of - aborted due to compilation errors.

      Dereference uses curly brackets, not the round ones.

      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        Oops.