Re^3: RFC: List::Extract

I have put together a benchmark testing lodin's code and the double-grep plus merlyn's and my code against different sized arrays and sifting out differing amounts of data. I couldn't get to grips with the subroutine prototype so I took that out of merlyn's routine. Here is the code.

use strict;
use warnings;

use Benchmark q{cmpthese};

my @tiny   = do
   {
       map { int rand 100 } 1 .. 1000
   };
my @small  = do
   {
       map { int rand 100 } 1 .. 5000
   };
my @medium = do
   {
       map { int rand 100 } 1 .. 10000
   };
my @large  = do
   {
       map { int rand 100 } 1 .. 50000
   };
my @cutoffs = map { $_ * 10 } 1, 3, 5, 7, 9;

my ( @array, @extracted );
my $rsCutoff = \ do { my $anon };

my $rcSift = sub
   {
       my $toTest = shift;
       return $toTest >= ${ $rsCutoff } ? 0 : 1;
   };

foreach my $cutoff ( @cutoffs )
{
    print qq{\n@{ [ q{=} x 60 ] }\n} unless $cutoff == 10;
    ${ $rsCutoff } = $cutoff;
    foreach my $raOrig ( \ @tiny, \ @small, \ @medium, \ @large )
    {
        print
           qq{\n},
           qq{Sifting approx. $cutoff% from },
           qq{array of @{ [ scalar @$raOrig ] } },
           qq{integers from 0 to 99\n},
           ;
        cmpthese (
           -3,
           {
               johngg   => sub
                  {
                      my @array = @$raOrig;
                      my @extracted = johngg( $rcSift, \ @array );
                  },
               lodin    => sub
                  {
                      my @array = @$raOrig;
                      my @extracted = lodin( $rcSift, \ @array );
                  },
               merlyn   => sub
                  {
                      my @array = @$raOrig;
                      my @extracted = merlyn( $rcSift, \ @array );
                  },
               twoGreps => sub
                  {
                      my @array = @$raOrig;
                      my @extracted = twoGreps( $rcSift, \ @array );
                  },
           }
           );
    }
}

sub johngg
{
    my ( $rcSift, $raOrig ) = @_;
    return
       reverse
       map  { splice @$raOrig, $_, 1 }
       grep { $rcSift->( $raOrig->[ $_ ] ) }
       reverse 0 .. $#$raOrig;
}

sub lodin
{
    my ($code, $list) = @_;
    my @extracted;
    my $c = 0;
    while ($c < @$list) {
        local *_ = \$list->[$c];
        if ($code->($_)) {
            push @extracted, splice @$list, $c, 1;
        }
        else {
            $c++;
        }
    }
    return @extracted;
}

sub merlyn {
  my ($code, $aref) = @_;
  my @return;
  my @replace;
  for (@$aref) {
    if ($code->($_)) {
      push @return, $_;
    } else {
      push @replace, $_;
    }
  }
  @$aref = @replace;
  return @return;
}

sub twoGreps
{
    my ( $rcSift, $raOrig ) = @_;
    my @extracted = grep { $rcSift->( $_ ) } @$raOrig;
    @$raOrig = grep { ! $rcSift->( $_ ) } @$raOrig;
    return @extracted;
}
[download]

Here's the results.

Sifting approx. 10% from array of 1000 integers from 0 to 99
          Rate twoGreps    lodin   merlyn   johngg
twoGreps 326/s       --     -36%     -50%     -62%
lodin    510/s      56%       --     -22%     -40%
merlyn   654/s     100%      28%       --     -23%
johngg   855/s     162%      68%      31%       --

Sifting approx. 10% from array of 5000 integers from 0 to 99
           Rate twoGreps    lodin   merlyn   johngg
twoGreps 62.4/s       --     -34%     -51%     -58%
lodin    94.7/s      52%       --     -25%     -37%
merlyn    127/s     103%      34%       --     -16%
johngg    150/s     141%      59%      18%       --

Sifting approx. 10% from array of 10000 integers from 0 to 99
           Rate twoGreps    lodin   merlyn   johngg
twoGreps 30.0/s       --     -31%     -51%     -55%
lodin    43.3/s      45%       --     -30%     -34%
merlyn   61.5/s     105%      42%       --      -7%
johngg   66.1/s     121%      53%       7%       --

Sifting approx. 10% from array of 50000 integers from 0 to 99
           Rate twoGreps    lodin   johngg   merlyn
twoGreps 4.31/s       --     -16%     -38%     -43%
lodin    5.15/s      19%       --     -25%     -31%
johngg   6.89/s      60%      34%       --      -8%
merlyn   7.51/s      74%      46%       9%       --

============================================================

Sifting approx. 30% from array of 1000 integers from 0 to 99
          Rate twoGreps    lodin   merlyn   johngg
twoGreps 280/s       --     -37%     -52%     -57%
lodin    446/s      60%       --     -24%     -31%
merlyn   588/s     110%      32%       --      -9%
johngg   650/s     132%      46%      10%       --

Sifting approx. 30% from array of 5000 integers from 0 to 99
           Rate twoGreps   johngg    lodin   merlyn
twoGreps 41.8/s       --     -38%     -40%     -48%
johngg   67.6/s      62%       --      -3%     -16%
lodin    70.0/s      67%       4%       --     -13%
merlyn   80.1/s      92%      18%      14%       --

Sifting approx. 30% from array of 10000 integers from 0 to 99
           Rate twoGreps    lodin   johngg   merlyn
twoGreps 20.2/s       --     -31%     -37%     -47%
lodin    29.3/s      45%       --      -9%     -23%
johngg   32.1/s      59%      10%       --     -15%
merlyn   37.9/s      88%      29%      18%       --

Sifting approx. 30% from array of 50000 integers from 0 to 99
           Rate    lodin   johngg twoGreps   merlyn
lodin    2.89/s       --     -14%     -26%     -59%
johngg   3.35/s      16%       --     -14%     -52%
twoGreps 3.92/s      35%      17%       --     -44%
merlyn   7.04/s     143%     110%      80%       --

============================================================

Sifting approx. 50% from array of 1000 integers from 0 to 99
          Rate twoGreps    lodin   johngg   merlyn
twoGreps 267/s       --     -35%     -53%     -54%
lodin    412/s      54%       --     -28%     -30%
johngg   571/s     114%      39%       --      -3%
merlyn   587/s     119%      42%       3%       --

Sifting approx. 50% from array of 5000 integers from 0 to 99
           Rate twoGreps    lodin   johngg   merlyn
twoGreps 39.7/s       --     -35%     -40%     -49%
lodin    60.8/s      53%       --      -9%     -23%
johngg   66.5/s      68%       9%       --     -15%
merlyn   78.6/s      98%      29%      18%       --

Sifting approx. 50% from array of 10000 integers from 0 to 99
           Rate twoGreps    lodin   johngg   merlyn
twoGreps 19.2/s       --     -21%     -31%     -48%
lodin    24.4/s      27%       --     -12%     -34%
johngg   27.8/s      45%      14%       --     -25%
merlyn   36.9/s      92%      51%      33%       --

Sifting approx. 50% from array of 50000 integers from 0 to 99
           Rate    lodin   johngg twoGreps   merlyn
lodin    2.23/s       --     -18%     -40%     -68%
johngg   2.70/s      21%       --     -27%     -61%
twoGreps 3.71/s      66%      37%       --     -47%
merlyn   7.00/s     214%     159%      89%       --

============================================================

Sifting approx. 70% from array of 1000 integers from 0 to 99
          Rate twoGreps    lodin   johngg   merlyn
twoGreps 264/s       --     -33%     -51%     -56%
lodin    392/s      49%       --     -27%     -34%
johngg   538/s     104%      37%       --     -10%
merlyn   595/s     125%      52%      11%       --

Sifting approx. 70% from array of 5000 integers from 0 to 99
           Rate twoGreps    lodin   johngg   merlyn
twoGreps 38.4/s       --     -31%     -38%     -51%
lodin    55.4/s      44%       --     -11%     -29%
johngg   62.1/s      62%      12%       --     -20%
merlyn   77.8/s     102%      40%      25%       --

Sifting approx. 70% from array of 10000 integers from 0 to 99
           Rate twoGreps    lodin   johngg   merlyn
twoGreps 19.0/s       --     -17%     -29%     -47%
lodin    22.9/s      21%       --     -14%     -37%
johngg   26.7/s      41%      17%       --     -26%
merlyn   36.2/s      90%      58%      35%       --

Sifting approx. 70% from array of 50000 integers from 0 to 99
           Rate    lodin   johngg twoGreps   merlyn
lodin    2.14/s       --     -22%     -40%     -69%
johngg   2.74/s      28%       --     -24%     -61%
twoGreps 3.59/s      68%      31%       --     -49%
merlyn   7.01/s     227%     155%      95%       --

============================================================

Sifting approx. 90% from array of 1000 integers from 0 to 99
          Rate twoGreps    lodin   johngg   merlyn
twoGreps 263/s       --     -31%     -49%     -57%
lodin    378/s      44%       --     -27%     -38%
johngg   520/s      98%      38%       --     -15%
merlyn   612/s     133%      62%      18%       --

Sifting approx. 90% from array of 5000 integers from 0 to 99
           Rate twoGreps    lodin   johngg   merlyn
twoGreps 37.6/s       --     -33%     -40%     -52%
lodin    56.0/s      49%       --     -10%     -28%
johngg   62.5/s      66%      12%       --     -20%
merlyn   78.2/s     108%      40%      25%       --

Sifting approx. 90% from array of 10000 integers from 0 to 99
           Rate twoGreps    lodin   johngg   merlyn
twoGreps 18.2/s       --     -26%     -36%     -51%
lodin    24.7/s      36%       --     -13%     -33%
johngg   28.4/s      56%      15%       --     -23%
merlyn   37.0/s     103%      50%      30%       --

Sifting approx. 90% from array of 50000 integers from 0 to 99
           Rate    lodin twoGreps   johngg   merlyn
lodin    2.92/s       --     -18%     -25%     -58%
twoGreps 3.56/s      22%       --      -9%     -49%
johngg   3.92/s      34%      10%       --     -44%
merlyn   7.04/s     141%      98%      80%       --
[download]

It seems to show that my code has a bit of an advantage when the arrays are not too large and when less data is being sifted out. It is noticeable that merlyn's code isn't impacted the more data that is sifted whereas mine is. I guess that is because he is always building two arrays whereas my map / splice has to do more work the more data is sifted. Also note the sudden elevation of twoGreps with large arrays for the 30%, 50% and 70% sifts but not for 10% or 90%.

I've cocked up benchmarks before, however, so take all this with a pinch of salt. I won't be surprised if somebody tells me I've done the same this time.

I hope this is of interest.

Cheers,

JohnGG

Comment on Re^3: RFC: List::Extract Select or Download Code