in reply to Re^2: RFC: List::Extract
in thread RFC: List::Extract

I have put together a benchmark testing lodin's code and the double-grep plus merlyn's and my code against different sized arrays and sifting out differing amounts of data. I couldn't get to grips with the subroutine prototype so I took that out of merlyn's routine. Here is the code.

use strict; use warnings; use Benchmark q{cmpthese}; my @tiny = do { map { int rand 100 } 1 .. 1000 }; my @small = do { map { int rand 100 } 1 .. 5000 }; my @medium = do { map { int rand 100 } 1 .. 10000 }; my @large = do { map { int rand 100 } 1 .. 50000 }; my @cutoffs = map { $_ * 10 } 1, 3, 5, 7, 9; my ( @array, @extracted ); my $rsCutoff = \ do { my $anon }; my $rcSift = sub { my $toTest = shift; return $toTest >= ${ $rsCutoff } ? 0 : 1; }; foreach my $cutoff ( @cutoffs ) { print qq{\n@{ [ q{=} x 60 ] }\n} unless $cutoff == 10; ${ $rsCutoff } = $cutoff; foreach my $raOrig ( \ @tiny, \ @small, \ @medium, \ @large ) { print qq{\n}, qq{Sifting approx. $cutoff% from }, qq{array of @{ [ scalar @$raOrig ] } }, qq{integers from 0 to 99\n}, ; cmpthese ( -3, { johngg => sub { my @array = @$raOrig; my @extracted = johngg( $rcSift, \ @array ); }, lodin => sub { my @array = @$raOrig; my @extracted = lodin( $rcSift, \ @array ); }, merlyn => sub { my @array = @$raOrig; my @extracted = merlyn( $rcSift, \ @array ); }, twoGreps => sub { my @array = @$raOrig; my @extracted = twoGreps( $rcSift, \ @array ); }, } ); } } sub johngg { my ( $rcSift, $raOrig ) = @_; return reverse map { splice @$raOrig, $_, 1 } grep { $rcSift->( $raOrig->[ $_ ] ) } reverse 0 .. $#$raOrig; } sub lodin { my ($code, $list) = @_; my @extracted; my $c = 0; while ($c < @$list) { local *_ = \$list->[$c]; if ($code->($_)) { push @extracted, splice @$list, $c, 1; } else { $c++; } } return @extracted; } sub merlyn { my ($code, $aref) = @_; my @return; my @replace; for (@$aref) { if ($code->($_)) { push @return, $_; } else { push @replace, $_; } } @$aref = @replace; return @return; } sub twoGreps { my ( $rcSift, $raOrig ) = @_; my @extracted = grep { $rcSift->( $_ ) } @$raOrig; @$raOrig = grep { ! $rcSift->( $_ ) } @$raOrig; return @extracted; }

Here's the results.

Sifting approx. 10% from array of 1000 integers from 0 to 99 Rate twoGreps lodin merlyn johngg twoGreps 326/s -- -36% -50% -62% lodin 510/s 56% -- -22% -40% merlyn 654/s 100% 28% -- -23% johngg 855/s 162% 68% 31% -- Sifting approx. 10% from array of 5000 integers from 0 to 99 Rate twoGreps lodin merlyn johngg twoGreps 62.4/s -- -34% -51% -58% lodin 94.7/s 52% -- -25% -37% merlyn 127/s 103% 34% -- -16% johngg 150/s 141% 59% 18% -- Sifting approx. 10% from array of 10000 integers from 0 to 99 Rate twoGreps lodin merlyn johngg twoGreps 30.0/s -- -31% -51% -55% lodin 43.3/s 45% -- -30% -34% merlyn 61.5/s 105% 42% -- -7% johngg 66.1/s 121% 53% 7% -- Sifting approx. 10% from array of 50000 integers from 0 to 99 Rate twoGreps lodin johngg merlyn twoGreps 4.31/s -- -16% -38% -43% lodin 5.15/s 19% -- -25% -31% johngg 6.89/s 60% 34% -- -8% merlyn 7.51/s 74% 46% 9% -- ============================================================ Sifting approx. 30% from array of 1000 integers from 0 to 99 Rate twoGreps lodin merlyn johngg twoGreps 280/s -- -37% -52% -57% lodin 446/s 60% -- -24% -31% merlyn 588/s 110% 32% -- -9% johngg 650/s 132% 46% 10% -- Sifting approx. 30% from array of 5000 integers from 0 to 99 Rate twoGreps johngg lodin merlyn twoGreps 41.8/s -- -38% -40% -48% johngg 67.6/s 62% -- -3% -16% lodin 70.0/s 67% 4% -- -13% merlyn 80.1/s 92% 18% 14% -- Sifting approx. 30% from array of 10000 integers from 0 to 99 Rate twoGreps lodin johngg merlyn twoGreps 20.2/s -- -31% -37% -47% lodin 29.3/s 45% -- -9% -23% johngg 32.1/s 59% 10% -- -15% merlyn 37.9/s 88% 29% 18% -- Sifting approx. 30% from array of 50000 integers from 0 to 99 Rate lodin johngg twoGreps merlyn lodin 2.89/s -- -14% -26% -59% johngg 3.35/s 16% -- -14% -52% twoGreps 3.92/s 35% 17% -- -44% merlyn 7.04/s 143% 110% 80% -- ============================================================ Sifting approx. 50% from array of 1000 integers from 0 to 99 Rate twoGreps lodin johngg merlyn twoGreps 267/s -- -35% -53% -54% lodin 412/s 54% -- -28% -30% johngg 571/s 114% 39% -- -3% merlyn 587/s 119% 42% 3% -- Sifting approx. 50% from array of 5000 integers from 0 to 99 Rate twoGreps lodin johngg merlyn twoGreps 39.7/s -- -35% -40% -49% lodin 60.8/s 53% -- -9% -23% johngg 66.5/s 68% 9% -- -15% merlyn 78.6/s 98% 29% 18% -- Sifting approx. 50% from array of 10000 integers from 0 to 99 Rate twoGreps lodin johngg merlyn twoGreps 19.2/s -- -21% -31% -48% lodin 24.4/s 27% -- -12% -34% johngg 27.8/s 45% 14% -- -25% merlyn 36.9/s 92% 51% 33% -- Sifting approx. 50% from array of 50000 integers from 0 to 99 Rate lodin johngg twoGreps merlyn lodin 2.23/s -- -18% -40% -68% johngg 2.70/s 21% -- -27% -61% twoGreps 3.71/s 66% 37% -- -47% merlyn 7.00/s 214% 159% 89% -- ============================================================ Sifting approx. 70% from array of 1000 integers from 0 to 99 Rate twoGreps lodin johngg merlyn twoGreps 264/s -- -33% -51% -56% lodin 392/s 49% -- -27% -34% johngg 538/s 104% 37% -- -10% merlyn 595/s 125% 52% 11% -- Sifting approx. 70% from array of 5000 integers from 0 to 99 Rate twoGreps lodin johngg merlyn twoGreps 38.4/s -- -31% -38% -51% lodin 55.4/s 44% -- -11% -29% johngg 62.1/s 62% 12% -- -20% merlyn 77.8/s 102% 40% 25% -- Sifting approx. 70% from array of 10000 integers from 0 to 99 Rate twoGreps lodin johngg merlyn twoGreps 19.0/s -- -17% -29% -47% lodin 22.9/s 21% -- -14% -37% johngg 26.7/s 41% 17% -- -26% merlyn 36.2/s 90% 58% 35% -- Sifting approx. 70% from array of 50000 integers from 0 to 99 Rate lodin johngg twoGreps merlyn lodin 2.14/s -- -22% -40% -69% johngg 2.74/s 28% -- -24% -61% twoGreps 3.59/s 68% 31% -- -49% merlyn 7.01/s 227% 155% 95% -- ============================================================ Sifting approx. 90% from array of 1000 integers from 0 to 99 Rate twoGreps lodin johngg merlyn twoGreps 263/s -- -31% -49% -57% lodin 378/s 44% -- -27% -38% johngg 520/s 98% 38% -- -15% merlyn 612/s 133% 62% 18% -- Sifting approx. 90% from array of 5000 integers from 0 to 99 Rate twoGreps lodin johngg merlyn twoGreps 37.6/s -- -33% -40% -52% lodin 56.0/s 49% -- -10% -28% johngg 62.5/s 66% 12% -- -20% merlyn 78.2/s 108% 40% 25% -- Sifting approx. 90% from array of 10000 integers from 0 to 99 Rate twoGreps lodin johngg merlyn twoGreps 18.2/s -- -26% -36% -51% lodin 24.7/s 36% -- -13% -33% johngg 28.4/s 56% 15% -- -23% merlyn 37.0/s 103% 50% 30% -- Sifting approx. 90% from array of 50000 integers from 0 to 99 Rate lodin twoGreps johngg merlyn lodin 2.92/s -- -18% -25% -58% twoGreps 3.56/s 22% -- -9% -49% johngg 3.92/s 34% 10% -- -44% merlyn 7.04/s 141% 98% 80% --

It seems to show that my code has a bit of an advantage when the arrays are not too large and when less data is being sifted out. It is noticeable that merlyn's code isn't impacted the more data that is sifted whereas mine is. I guess that is because he is always building two arrays whereas my map / splice has to do more work the more data is sifted. Also note the sudden elevation of twoGreps with large arrays for the 30%, 50% and 70% sifts but not for 10% or 90%.

I've cocked up benchmarks before, however, so take all this with a pinch of salt. I won't be surprised if somebody tells me I've done the same this time.

I hope this is of interest.

Cheers,

JohnGG