Processing arrays 2 elements at a time (TIMTOWTDI)

I had a reference to an array the other day, that looked something like this:

my $r = [ 1, 2, 3, 4, 5, 6 ];

the problem being that I wanted to restructure the values, by taking them by pairs, hence:

my $r = [ [1, 2], [3, 4], [5, 6] ];

Looks easy enough, and certainly in Perl 6 it will be a snap, since you can iterate over lists taking n at a time (notwithstanding any glaring syntax errors due to my ignorance):

  for @$r -> $x, $y {
    push @$pairs, [$x, $y];
  }
[download]

But in Perl 5 things are not quite so simple. I'm pretty sure this topic has been covered before, but after searching for a while on obvious things (qw/array element processing time/), all I found was Processing Multiple Array Elements and Breaking output lines into N-element chunks but I'm sure there's another one out there somewhere. So I turned to the chatterbox, and with a little help from my friends™ I managed to get it figured out, and so I thought I'd preserve the solutions here for posterity.

My first idea (no giggling please) was to either push a new anonymous array onto the end of the array or push an element to the last array of the array, which gives:

my $r2;
my $flipper = 0;
for( @$r ) {
  if( $flipper++ % 2 ) {
    push @{$r2->[$#{$r2}]}, $_;
  }
  else {
    push @$r2, [ $_ ];
  }
}
[download]

It looks pretty silly, but it doesn't destroy the contents of $r, and it can scale beyond more than 2 elements at a time (by saying $flipper++ % 2 >. At this point other people chimed in with different suggestions.

domm suggested treating the array as a hash and iterating over keys and values:

my %h=(@$r);
for (keys %h) {
  push @$r2, [$_, $h2{$_} ]
}
# or possibly
# push @$r2, [$_, $h2{$_} ] for keys %h
[download]

But that has a number of problems

it doesn't preserve the order of the array;
all array elements that become keys must be unique;
the keys are stringified;
it doesn't scale beyond 2 elements at a time.

BrowserUk suggested the following map:

my $i = 0;
my $r2 = map {[ $r->[$i*2], $r->[$i++*2+1] ]}
         (0 .. $#{$r}/2 - 1);
    # or (0 .. scalar(@$r)/2 - 1)
[download]

It does the job, but there's too much happening for it to be clear what is going on, specifically the limit-1 and postincrement. Plus it's really hard to format the whitespace. This was later amended to:

$r2 = [ map{ [ $r->[$_*2], $r->[$_*2+1] ] } (0 .. $#{$r}/2) ]

which gets rid of the icky index variable, and then for an encore rearranged the code to use array slices instead: $r2->[$_] = [ @$r[$_*2, $_*2+1] ] for (0..$#{$r} / 2);

sauoq suggested a good olde C-like approach:

for( my $i=0; $i < scalar @$r; $i += 2 ) {
  $r2->[$i/2] = [ @$r[$i, $i+1] ]
}
[download]

To a Perl beginner, it will be pretty obvious what's going on.

Someone else suggested (and I forget who (and that's maybe just as well)) doing the following:

my @copy = @$r;
for( @copy ) {
    push @$r2, [ $_, shift @copy ];
}
[download]

But using shift in a foreach is evil. Sometimes you can get away with it, but in this case the result is wrong. It's off-by-one in a bizarre way. And of course it destroys the source array.

But I think this prodded someone else into getting this approach done the right way, for they (again I forget who) suggested:

my @copy = @$r;
while( @copy ) {
    push @$r2, [ shift @c, shift @c ];
}
[download]

and in actual fact that is the code that I adopted (despite the fact that it is a destructive read), the main criterion being how easy is it for a non-Perl expert to understand. The code already slings references to anonymous structures all about, so this loop has a certain appealing clarity.

jmcnamara chimed in late with:

  $r2 = [ map { [$r->[2*$_], $r[2*$_+1]] } 0 .. scalar @$r / 2 - 1 ]
[download]

which could, in light of the other solutions, be amended to

  $r2 = [ map { [@$r[2*$_, 2*$_+1]] } 0 .. scalar @$r / 2 - 1 ]
[download]

In one of the threads I found, merlyn suggests using splice, which would give something like

@copy = @$r;
while (my @pair = splice @copy, 0, 2) {
  push @$r2, [@pair];
}
[download]

This particular variant has the property of having the simplest and clearest method of extending to an arbitrary number of tuples. Just increase the value of the splice's third parameter.

Well all that's well and good, but what would a TIMTOWTDI meditation be without a Benchmark

Array with 8 elements

Benchmark: timing 50000 iterations of aslice, c_loop, flipper,
 map_i, map_jm, map_jm2, map_pure, shifter, splicer...
    aslice: 11 wallclock secs ( 7.29 usr +  0.00 sys =  7.29 CPU) @ 6858.71/s (n=50000)
    c_loop: 10 wallclock secs ( 5.91 usr +  0.00 sys =  5.91 CPU) @ 8460.24/s (n=50000)
   flipper: 18 wallclock secs ( 9.79 usr +  0.00 sys =  9.79 CPU) @ 5107.25/s (n=50000)
     map_i:  9 wallclock secs ( 5.37 usr +  0.00 sys =  5.37 CPU) @ 9310.99/s (n=50000)
    map_jm:  9 wallclock secs ( 5.06 usr +  0.00 sys =  5.06 CPU) @ 9881.42/s (n=50000)
   map_jm2:  8 wallclock secs ( 5.00 usr +  0.00 sys =  5.00 CPU) @ 10000.00/s (n=50000)
  map_pure:  7 wallclock secs ( 5.30 usr +  0.00 sys =  5.30 CPU) @ 9433.96/s (n=50000)
   shifter:  6 wallclock secs ( 3.99 usr +  0.00 sys =  3.99 CPU) @ 12531.33/s (n=50000)
   splicer:  8 wallclock secs ( 5.91 usr +  0.00 sys =  5.91 CPU) @ 8460.24/s (n=50000)
            Rate flipper aslice c_loop splicer map_i map_pure map_jm map_jm2 shifter
flipper   5107/s      --   -26%   -40%    -40%  -45%     -46%   -48%    -49%    -59%
aslice    6859/s     34%     --   -19%    -19%  -26%     -27%   -31%    -31%    -45%
c_loop    8460/s     66%    23%     --     -0%   -9%     -10%   -14%    -15%    -32%
splicer   8460/s     66%    23%     0%      --   -9%     -10%   -14%    -15%    -32%
map_i     9311/s     82%    36%    10%     10%    --      -1%    -6%     -7%    -26%
map_pure  9434/s     85%    38%    12%     12%    1%       --    -5%     -6%    -25%
map_jm    9881/s     93%    44%    17%     17%    6%       5%     --     -1%    -21%
map_jm2  10000/s     96%    46%    18%     18%    7%       6%     1%      --    -20%
shifter  12531/s    145%    83%    48%     48%   35%      33%    27%     25%      --

Array with 1000 elements

Benchmark: timing 500 iterations of aslice, c_loop, flipper,
 map_i, map_jm, map_jm2, map_pure, shifter, splicer...
    aslice:  8 wallclock secs ( 6.26 usr +  0.00 sys =  6.26 CPU) @ 79.87/s (n=500)
    c_loop:  6 wallclock secs ( 4.89 usr +  0.00 sys =  4.89 CPU) @ 102.25/s (n=500)
   flipper: 14 wallclock secs ( 9.55 usr +  0.00 sys =  9.55 CPU) @ 52.36/s (n=500)
     map_i:  9 wallclock secs ( 4.83 usr +  0.00 sys =  4.83 CPU) @ 103.52/s (n=500)
    map_jm:  9 wallclock secs ( 4.57 usr +  0.00 sys =  4.57 CPU) @ 109.41/s (n=500)
   map_jm2:  8 wallclock secs ( 4.62 usr +  0.00 sys =  4.62 CPU) @ 108.23/s (n=500)
  map_pure:  6 wallclock secs ( 4.81 usr +  0.00 sys =  4.81 CPU) @ 103.95/s (n=500)
   shifter:  4 wallclock secs ( 3.53 usr +  0.00 sys =  3.53 CPU) @ 141.64/s (n=500)
   splicer:  8 wallclock secs ( 5.50 usr +  0.00 sys =  5.50 CPU) @ 90.91/s (n=500)
           Rate flipper aslice splicer c_loop map_i map_pure map_jm2 map_jm shifter
flipper  52.4/s      --   -34%    -42%   -49%  -49%     -50%    -52%   -52%    -63%
aslice   79.9/s     53%     --    -12%   -22%  -23%     -23%    -26%   -27%    -44%
splicer  90.9/s     74%    14%      --   -11%  -12%     -13%    -16%   -17%    -36%
c_loop    102/s     95%    28%     12%     --   -1%      -2%     -6%    -7%    -28%
map_i     104/s     98%    30%     14%     1%    --      -0%     -4%    -5%    -27%
map_pure  104/s     99%    30%     14%     2%    0%       --     -4%    -5%    -27%
map_jm2   108/s    107%    35%     19%     6%    5%       4%      --    -1%    -24%
map_jm    109/s    109%    37%     20%     7%    6%       5%      1%     --    -23%
shifter   142/s    171%    77%     56%    39%   37%      36%     31%    29%      --

The code

#! /usr/bin/perl -w

use strict;
use Benchmark qw/cmpthese/;
use Data::Dumper;

$Data::Dumper::Indent = 0;

my $iters = shift || 1000;
my $len   = shift || 8;

sub flipper {
    my @r2;
  my $flipper = 0;
  for( @_ ) {
    if( $flipper++ % 2 ) {
      push @{$r2[$#r2]}, $_;
    }
    else {
       push @r2, [ $_ ];
    }
  }
  \@r2;
}

sub map_i {
  my $i = 0;
  [ map { [$_[$i*2], $_[$i++*2+1]] } (0 .. scalar(@_)/2 - 1) ];
}

sub shifter {
  my @r2;
  while( @_ ) {
    push @r2, [ shift @_, shift @_ ];
  }
  \@r2;
}

sub c_loop {
  my @r2;
  for( my $i=0; $i < scalar @_; $i += 2 ) {
    push @r2, [ @_[$i, $i+1] ]
  }
  \@r2;
}

sub map_pure {
  [ map{ [$_[$_*2], $_[$_*2+1]] } (0 .. scalar @_ / 2 - 1) ];
}

sub aslice {
  my @r2;
  $r2[$_] = [ @_[$_*2, $_*2+1] ] for (0 .. scalar @_ / 2 - 1);
  \@r2;
}

sub map_jm {
  [ map { [$_[2*$_], $_[2*$_+1]] } 0 .. scalar @_ / 2 - 1 ];
}

sub map_jm2 {
  [ map { [@_[2*$_, 2*$_+1]] } 0 .. scalar @_ / 2 - 1 ];
}

sub splicer {
  my @r2;
  while (my @pair = splice @_, 0, 2) {
    push @r2, [@pair];
  }
  \@r2;
}

######################################################

my $r = [ 1 .. $len ]; 
for my $sub( qw/ flipper shifter splicer c_loop aslice
    map_i map_pure map_jm map_jm2 /) {
  no strict 'refs';
  # test
  # print "$sub ", Dumper( &$sub( @$r )), "\n";
}

cmpthese(
  $iters,
  {
    flipper  => sub { flipper( @$r )},
    shifter  => sub { shifter( @$r )},
    splicer  => sub { splicer( @$r )},
    c_loop   => sub { c_loop( @$r )},
    aslice   => sub { aslice( @$r )},
    map_i    => sub { map_i( @$r )},

    map_pure => sub { map_pure( @$r )},
    map_jm   => sub { map_jm( @$r )},
    map_jm2  => sub { map_jm2( @$r )},
  },
);
__END__
[download]

I find it interesting the code that I consider as being the easiest for a non-expert to read is also the fastest... And when you see how bad my initial idea was, you know understand why I went asking in the CB (I blame lack of coffee/lack of sleep/early morning).

print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u'

Comment on Processing arrays 2 elements at a time (TIMTOWTDI) Select or Download Code

Replies are listed 'Best First'.
Re: Processing arrays 2 elements at a time (TIMTOWTDI) by dimmesdale (Friar) on Aug 02, 2002 at 15:44 UTC
Well, having required something like this recently, I feel obliged to comment. I particurly liked merlyn's solution: `@copy = @$r; while (my @pair = splice @copy, 0, 2) { push @$r2, [@pair]; }` [download] However, I'll modify it slightly to show everyone what I came up with. (Maybe it's just me, but bit-wise operators always seem to pop-up in my code ... probably where they shouldn't :) @copy = @$r; # I guess you know that your array will work for this, but # it may be desirable to check (especially for n-'tuples') # that the array is divisible by 2 (or n). I guess that # depending on what you're expecting this can be done on a # all or nothing basis, or inside the loop while (0..$#copy) { next if $_&1; # this skips all odds (you used a mod in # your original I think). To expand to more # 'tuples', you'd have to change this to # next if $_%n push @$r2, [$copy[$_],$copy[$_+1]]; } [download]	[reply] [d/l] [select]
Re: Processing arrays 2 elements at a time (TIMTOWTDI) by petral (Curate) on Aug 02, 2002 at 15:47 UTC
the code that I consider as being the easiest for a non-expert to read is also the fastest Although, note that it probably would not be the fastest inline: `my @copy = @$r; while( @copy ) { push @$r2, [ shift @copy, shift @copy ]; }` [download] It is only the artifact of using @_ in the sub call which avoids the copy. p	[reply] [d/l]
Re: Re: Processing arrays 2 elements at a time (TIMTOWTDI) by ichimunki (Priest) on Aug 02, 2002 at 19:38 UTC
So don't make your copy first. It's a reference after all, which in this case makes life easy. `my @copy; while (@$r) { push @copy, [ shift @$r, shift @$r ]; } $r = \@copy;` [download] This way we build our new array (while the one gets bigger, the other gets smaller), then we simply change what the reference points to.	[reply] [d/l]
Re: Re: Re: Processing arrays 2 elements at a time (TIMTOWTDI) by petral (Curate) on Aug 02, 2002 at 21:50 UTC
That's very neat. My only point was that the way grinder was presenting it (non-destructive of the original array), it would probably not be the fastest if the array copy was not hidden in the overhead for the subroutine call. p	[reply]
Re: Re: Re: Re: Processing arrays 2 elements at a time (TIMTOWTDI) by ichimunki (Priest) on Aug 02, 2002 at 22:38 UTC
Re: Processing arrays 2 elements at a time (TIMTOWTDI) by Juerd (Abbot) on Aug 03, 2002 at 10:52 UTC
Here's a somewhat more efficient splicer that is almost as fast as the shifter function: `sub s2 { my @r2; while (@_) { push @r2, [ splice @_, 0, 2 ]; } return \@r2; }` [download] This is faster than splicer() because it doesn't have the overhead of assigning the chunks to a temporary array. By removing the overhead of entering a block on each iteration, it'll be even faster: `sub s2 { my @r2; push @r2, [ splice @_, 0, 2 ] while @_; return \@r2; }` [download] As fast as the unaltered shifter() to be precise (but of course shifter() would still win if it were altered in the same way). - Yes, I reinvent wheels. - Spam: Visit eurotraQ.	[reply] [d/l] [select]
Re: Processing arrays 2 elements at a time (TIMTOWTDI) by shotgunefx (Parson) on Aug 03, 2002 at 01:40 UTC
Processing Multiple Array Elements is much slower than any of these but to be fair I wrote it a long time ago. :) I would most likely use a closure now or a closure hiding behind a regular function. It also works a little different. You can nest them. and continue were you left off. -Lee "To be civilized is to deny one's nature."	[reply]
Re: Processing arrays 2 elements at a time (TIMTOWTDI) by FoxtrotUniform (Prior) on Aug 02, 2002 at 15:59 UTC
Update: D'oh! Silly coder, read the whole article before posting. Ignore me, I'm being redundant. I'm surprised that nobody mentioned `splice`, as merlyn suggested here: `while(my @pair = splice @$r, 0, 2) { push @$r2, \@pair; }` [download] `-- The hell with paco, vote for Erudil! :wq`	[reply] [d/l] [select]
Re: Processing arrays 2 elements at a time (TIMTOWTDI) by domm (Chaplain) on Aug 03, 2002 at 18:40 UTC
Just for reference, I was the one who suggested the "shifter". And, to be honest, I suggested the evil foreach-shift-thingy, too. `-- #!/usr/bin/perl for(ref bless{},just'another'perl'hacker){s-:+-$"-g&&print$_.$/}` [download]	[reply] [d/l]