Boldra has asked for the wisdom of the Perl Monks concerning the following question:

This may quickly become a golf question: a colleague wrote a subroutine of over 50 lines to solve this one.

I have an array which looks a lot like a hash. It has key/value pairs, and has a guaranteed even length. The only thing stopping it from being a hash is that some keys are repeated. I want to turn it into a hash by making each value into an arrayref containing all the values with that key, e.g.
$in = [ one => 1, two => 2, two => '2.003' ];
# must become
$out = { one => [1], two => [ 2, '2.003' ] };
The order of the values in the original array is not important (as if it were a hash), and the order of the values in the new array refs is also unimportant.


Update
I should have mentioned, I'm stuck with perl5.6 for this task. I would, however, be interested in how it might look in perl6!


- Boldra

Re: my array is almost a hash, but keys are not unique.
by ELISHEVA (Prior) on Apr 03, 2009 at 08:31 UTC

    50 lines? What language? A slightly more efficient option:

    my $in = [ one => 1, two => 2, two => '2.003' ];
    my $out = {};
    for (my $i=0; $i<$#$in; $i+=2) {
        push @{$out->{$in->[$i]}}, $in->[$i+1];
    }

    or for a game of golf

    my $i=0; push(@{$out->{$in->[$i++]}},$in->[$i++]) while $i<$#$in;

    Best, beth

      Well, I don't want to embarrass him too much, but his approach was to step through the array, look at each element and guess whether it was meant to be a key or a value, then use a '$last_key' variable to put the value into a new hash. There's also quite a bit of warning code in his routine, leftover from debugging.


      - Boldra

      The golfed version may not 'Do The Right Thing'™ due to the uncertainty of the order of evaluation of the two increments, and even if it works today, it may not work tomorrow with a different version of Perl.


      True laziness is hard work

        Which is exactly why I offered it as a golfed version only - but I probably should have mentioned that lest someone use the golfed version in production code. Hopefully you've saved someone a world of trouble.

        Being an old C programmer, I was taught never to trust the order of evaluation of parameters (right to left, left to right or something in between). But it does seem to work consistently on my version of Perl (5.8.8), which raises the question: does Perl have a defined order of evaluation for parameters, or does it, like "other languages", leave the order to the optimizer and compiler implementation? I looked for a citation either way this AM and had no luck.

        Best, beth
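One way to sidestep the order-of-evaluation question entirely, offered only as a sketch: read the key and the value in separate statements, so each increment has finished before the next one starts. The variable names follow the golfed version above ($key and $value are added for illustration), and nothing here should need more than Perl 5.6.

```perl
use strict;
use warnings;

my $in  = [ one => 1, two => 2, two => '2.003' ];
my $out = {};

my $i = 0;
while ( $i < $#$in ) {
    # Two separate statements: the key is always read before the value,
    # so the result does not depend on how one expression is evaluated.
    my $key   = $in->[ $i++ ];
    my $value = $in->[ $i++ ];
    push @{ $out->{$key} }, $value;
}

print "$_ => @{ $out->{$_} }\n" for sort keys %$out;
```

It is two lines longer than the golfed one-liner, but it makes no assumption about evaluation order within a single expression.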

Re: my array is almost a hash, but keys are not unique. (Perl 6)
by moritz (Cardinal) on Apr 03, 2009 at 08:53 UTC
    Note that in Perl 6 => constructs a Pair, it's not the same as a comma.

    Assuming that you have a flat list, and not a list of pairs, it'll look like this in diplomatic idiomatic Perl 6:

    my $in = [ one => 1, two => 2, two => '2.003' ];
    my %h;
    for @$in -> $k, $v {
        %h{$k}.push: $v;
    }

    (not golfed).

    Rakudo doesn't support this kind of autovivification yet, so the .push on the empty hash bucket fails. However this works in Rakudo:

    use v6;
    my $in = [ 'one', 1, 'two', 2, 'two', '2.003' ];
    my %h;
    for @($in) -> $k, $v {
        if %h{$k} {
            %h{$k}.push: $v;
        } else {
            %h{$k} = [ $v ];
        }
    }
    say %h.perl;
    # output:
    # {"one" => [1], "two" => [2, "2.003"]}

    If $in is an Array of pairs instead, you can loop over them and access .key and .value (a method call with a leading dot but no invocant defaults to $_):

    use v6;
    my $in = [ one => 1, two => 2, two => '2.003' ];
    my %h;
    for @($in) {
        if %h{.key} {
            %h{.key}.push: .value;
        } else {
            %h{.key} = [ .value ];
        }
    }
    say %h.perl;
    # vim: ft=perl6 sw=4 ts=4 expandtab
    # same output

    Again, a complete Perl 6 implementation would allow a shorter version like this:

    %h{.key}.push: .value for @$in;

    Even nicer, but also not yet implemented, is

    my %h = $in.classify: { $_ };

    See the documentation of classify for more information.

    Update: A nicer way to write the first loop is

    for @($in) -> $k, $v {
        %h{$k} //= [];
        %h{$k}.push: $v;
    }
      That's excellent, thanks Moritz!

      I'd seen something recently about how for can work on pairs, but seeing it used to solve a problem I'm working on makes it much clearer.

      The final classify example has me really puzzled. I can't see the array creation, nor see what the test does.

      In my current program, the $in arrayref is defined by doing a file, so I suppose I would have the option in perl 6 of doing that file as perl6, and getting pairs.

      BTW, what do you mean by "diplomatic perl 6" ? Is that something like "perl 6, no matter who you ask", because different people have different ideas what's going to be in it?


      - Boldra
        The final classify example has me really puzzled. I can't see the array creation, nor see what the test does.

        I omitted the array creation, because it's the same as above (array of pairs)

        The test does basically nothing, it just returns the Pair unchanged. Since it is supposed to return a Pair anyway, it nicely fits into what .classify attempts.

        BTW, what do you mean by "diplomatic perl 6"

        I meant to write 'idiomatic', made a typo, and picked the wrong suggestion from my spell checker :-)

Re: my array is almost a hash, but keys are not unique.
by linuxer (Curate) on Apr 03, 2009 at 08:17 UTC

    Let's have a look at your data:

    #! /usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;
    # this remains a plain array!
    # '=>' is a 'fat comma', so we have a simple list
    # with '=>' you don't need to quote the string to its left
    my $in = [ one => 1, two => 2, two => '2.003' ];
    print Dumper( $in );
    __END__
    $VAR1 = [ 'one', 1, 'two', 2, 'two', '2.003' ];

    To transform the array into a hash (of arrays), the following should do the job:

    my %hash;
    while ( my ( $key, $val ) = splice( @$in, 0, 2 ) ) {
        push @{ $hash{$key} }, $val;
    }

    Update:

    • Inserted short explanation about "fat comma" into code
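To illustrate the fat-comma point, here is a small sketch (the array names @fat and @plain are made up for this example) showing that both spellings build the same flat list:

```perl
use strict;
use warnings;

# '=>' quotes the bareword on its left and otherwise acts as a comma.
my @fat   = ( one => 1, two => 2 );
my @plain = ( 'one', 1, 'two', 2 );

# Compare the two lists by joining their elements with spaces.
my $same = "@fat" eq "@plain" ? 'same' : 'different';
print "$same\n";
```

Both arrays hold four elements, which is why the repeated keys survive in the array but would collide if the list were assigned straight to a hash.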
Re: my array is almost a hash, but keys are not unique.
by Utilitarian (Vicar) on Apr 03, 2009 at 08:43 UTC
    Hi Boldra, the following can obviously be golfed but left this way for clarity.
    use strict;
    my %hash;
    my @array=qw(one 1 two 2 two 20);
    for (my $i=0;$i<@array;$i+=2){
        push @{$hash{$array[$i]}}, $array[$i+1];
    }
    for my $key( keys %hash){
        print "$key => @{$hash{$key}}\n";
    }
    UPDATE: Apologies to ELISHEVA, whose answer above is almost identical, with the added benefit of golfing too. I took far too long to reply and so hadn't seen that response.
Re: my array is almost a hash, but keys are not unique.
by bart (Canon) on Apr 03, 2009 at 10:24 UTC
    The only thing stopping it from being a hash is that some keys are repeated.
    Stupid trick with a CPAN module: Tie::DxHash. Granted, that does more than you asked for, as it preserves insertion order. But, it does allow duplicate keys.
Re: my array is almost a hash, but keys are not unique.
by GrandFather (Saint) on Apr 06, 2009 at 02:41 UTC

    If you eschew the C for loop where possible, and don't mind destroying the source array, then you can:

    use strict;
    use warnings;
    use Data::Dump::Streamer;
    my @in = (one => 1, two => 2, two => '2.003');
    my %hash;
    while (@in > 1) {
        my ($key, $value) = splice @in, 0, 2;
        push @{$hash{$key}}, $value;
    }
    Dump \%hash;

    Prints:

    $HASH1 = { one => [ 1 ], two => [ 2, 2.003 ] };

    True laziness is hard work
      It's true; I do eschew. Don't you?


      - Boldra

        Indeed I do. The Perl for/foreach loop is generally much clearer than the C for loop, and is almost never bug bait (the C for loop is prone to off by 1 errors).

        It's the 'destroying the source array' part that may be more of an issue if the array is large enough that you might think twice about copying it and are into premature optimization.


        True laziness is hard work
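If neither destroying the source array nor copying it appeals, a Perl-style foreach over pair numbers is one possible middle ground. This is only a sketch: $pairs and $n are names introduced here, and it assumes the flat key/value layout from the original question.

```perl
use strict;
use warnings;

my @in = ( one => 1, two => 2, two => '2.003' );
my %hash;

# Count complete pairs; a dangling last element (if any) is ignored.
my $pairs = int( @in / 2 );

# foreach over pair numbers: @in is only read, never spliced or copied.
for my $n ( 0 .. $pairs - 1 ) {
    my ( $key, $value ) = @in[ 2 * $n, 2 * $n + 1 ];
    push @{ $hash{$key} }, $value;
}

print scalar(@in), " elements still in \@in\n";
```

The index arithmetic is still there, but the loop bounds come from a range rather than a hand-written termination test, which removes the usual spot for an off-by-one mistake.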
Re: my array is almost a hash, but keys are not unique.
by Anonymous Monk on Apr 03, 2009 at 08:36 UTC
    #!/usr/bin/perl --
    use strict;
    use warnings;
    my $in = [ one => 1, two => 2, two => '2.003' ];
    my $ou = {};
    for( my $ix = 0; $ix < $#$in; $ix += 2){
        push @{ $ou->{ $in->[$ix] } }, $in->[$ix+1];
    }
    use Data::Dumper;
    print Data::Dumper->new([ $in, $ou ] )->Indent(1)->Dump;
    __END__
    $VAR1 = [ 'one', 1, 'two', 2, 'two', '2.003' ];
    $VAR2 = { 'one' => [ 1 ], 'two' => [ 2, '2.003' ] };