Re^3: foreach array - delete current row ? (flaws)

There is a problem with your two routines that biases the benchmark in their favour.

My routines remove any value containing a 9; yours only keep those containing a 9.
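To make the difference concrete, here is a minimal sketch. The sub names are made up for illustration; only the /9/ test and the if/unless swap come from this thread.

```perl
use strict;
use warnings;

# Hypothetical names, for illustration only.

# Inverted test: keeps ONLY the values that contain a 9.
sub keep_nines {
    my @out;
    for ( @_ ) { push @out, $_ if /9/ }
    return @out;
}

# Intended test: removes every value that contains a 9.
sub remove_nines {
    my @out;
    for ( @_ ) { push @out, $_ unless /9/ }
    return @out;
}

my @kept    = keep_nines( 1 .. 20 );
my @removed = remove_nines( 1 .. 20 );

print "if:     @kept\n";
print "unless: @removed\n";
```

Assuming the test data is 1 .. 1000, only 271 of the values contain a 9, so the inverted version has far fewer elements to write than the 729 the correct version must keep.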

Once you correct that (unless instead of if), you get a different set of numbers:

C:\test>1036622 -N=1e3
                  Rate grep build_new_array offset_copy for_splice edit_in_place
grep             678/s   --            -22%        -27%       -34%          -41%
build_new_array  870/s  28%              --         -6%       -15%          -24%
offset_copy      929/s  37%              7%          --        -9%          -19%
for_splice      1024/s  51%             18%         10%         --          -11%
edit_in_place   1150/s  70%             32%         24%        12%            --

C:\test>1036622 -N=1e4
                  Rate for_splice grep offset_copy build_new_array edit_in_place
for_splice      71.5/s         --  -9%        -11%            -22%          -39%
grep            78.7/s        10%   --         -2%            -14%          -33%
offset_copy     80.6/s        13%   2%          --            -12%          -31%
build_new_array 91.3/s        28%  16%         13%              --          -22%
edit_in_place    117/s        64%  49%         46%             29%            --

C:\test>1036622 -N=1e5
                  Rate for_splice grep build_new_array offset_copy edit_in_place
for_splice      1.29/s         -- -75%            -83%        -84%          -85%
grep            5.12/s       296%   --            -33%        -35%          -42%
build_new_array 7.68/s       494%  50%              --         -3%          -14%
offset_copy     7.88/s       509%  54%              3%          --          -11%
edit_in_place   8.89/s       587%  74%             16%         13%            --

C:\test>1036622 -N=1e6
                s/iter for_splice grep build_new_array offset_copy edit_in_place
for_splice        77.9         -- -98%            -98%        -99%          -99%
grep              1.39      5503%   --            -14%        -18%          -38%
build_new_array   1.20      6379%  16%              --         -5%          -28%
offset_copy       1.14      6737%  22%              6%          --          -24%
edit_in_place    0.868      8884%  60%             39%         31%            --

And actually, my original benchmark was also flawed -- or at least lazy -- inasmuch as it folds the time taken to build the original array into the overall timings, which is unrealistic.
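A minimal sketch of the difference, using Time::HiRes rather than the original benchmark script (which is not reproduced here). The array size and variable names are assumptions for illustration.

```perl
use strict;
use warnings;
use Time::HiRes qw[ time ];

my $N = 1e4;    # assumed size, for illustration

# Conflated: building the array happens inside the timed region.
my $start = time;
my @a = 1 .. $N;
@a = grep !/9/, @a;
my $with_setup = time() - $start;

# Separated: build first, then time only the filtering.
my @b = 1 .. $N;
$start = time;
@b = grep !/9/, @b;
my $filter_only = time() - $start;

printf "with setup: %.6fs   filter only: %.6fs\n", $with_setup, $filter_only;
```

Both halves produce the same filtered array; only the second timing measures the filtering alone.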

Correcting for that, I get yet another set of numbers.

Which just goes to prove: a) be careful what you benchmark; b) O(n²) in C can often be considerably faster than O(n) in Perl, if the former avoids the multiple opcodes of the latter.

Where the offset-copy (or your in-place) approach really comes into its own is when the array being filtered is already close to the limits of your memory.
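For reference, a minimal sketch of the in-place idea, modelled on the editInplace variant that appears later in this thread. The sub name and its array-ref/regex interface are hypothetical, added for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: removes, in place, every element of @$aref that
# matches $re. Survivors are shuffled down over the removed slots, so no
# second array is allocated.
sub filter_in_place {
    my( $aref, $re ) = @_;
    my $o = 0;                              # next slot to write to
    for ( @$aref ) {
        $aref->[ $o++ ] = $_ unless /$re/;  # $o never overtakes the read position
    }
    $#$aref = $o - 1;                       # truncate the leftover tail
    return;
}

my @a = 1 .. 30;
filter_in_place( \@a, qr/9/ );
print "@a\n";
```

The array never grows and nothing is copied out, which is the property that matters when the data barely fits in memory.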

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.


Re^4: foreach array - delete current row ? (flaws) by roboticus (Chancellor) on Jun 03, 2013 at 18:05 UTC
BrowserUk:

Yarch! I hate it when I mess up a benchmark. Thanks for the catch!

...roboticus

When your only tool is a hammer, all problems look like your thumb.
Re^4: foreach array - delete current row ? (flaws) by Anonymous Monk on Apr 27, 2015 at 01:55 UTC
A bit late to the game, but...

I think your benchmark is still flawed because you're applying a destructive operation multiple times without re-initializing the test data in between. On the first iteration of each test case, the corresponding array contains multiple matches. After the first iteration, the matches are removed, so you're simply testing which method can remove zero matching elements the fastest.

It makes sense that the splice approach will do this quickly, since splice is never actually called; while the grep approach, which needs to copy every element of the array, is not so fast.
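The effect is easy to see in a small sketch (the array size and variable names are assumptions for illustration): run the same destructive filter three times without rebuilding the array and count what each pass removes.

```perl
use strict;
use warnings;

my @a = 1 .. 100;
my @removed;    # number of elements each pass actually removed

for my $pass ( 1 .. 3 ) {
    my $before = @a;
    # Same destructive filter every pass; @a is NOT re-initialised.
    $a[$_] =~ /9/ and splice @a, $_, 1 for reverse 0 .. $#a;
    push @removed, $before - @a;
}

print "removed per pass: @removed\n";
```

Only the first pass does any real work; every later pass times a scan over data that no longer contains a match.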
Re^5: foreach array - delete current row ? (flaws) by BrowserUk (Patriarch) on Apr 27, 2015 at 08:24 UTC
"I think your benchmark is still flawed because you're applying a destructive operation multiple times without re-initializing the test data in between."

You're correct! (Strange how we all missed that.)

I've revisited the benchmark and corrected that deficiency. (And hopefully not missed or introduced any other errors!)

I've tried to make this produce output as close to Benchmark's as I can without getting anal about it:

#! perl -slw
use strict;
use Time::HiRes qw[ time ];
use Data::Dump qw[ pp ];
$Data::Dump::WIDTH = 1000;

our $N //= 1e3;     # array size       (-N=... on the command line)
our $I //= 10;      # iterations       (-I=... on the command line)

my @tests = qw[ forSplice grep offsetCopy buildNew editInplace ];
my %times = map{ $_ => 0 } @tests;

my( $start, $end );
my @a;
for( 1 .. $I ) {
    # Each test gets a freshly built array; only the filtering is timed.
    @a = 1 .. $N;
    $start = time;
    {
        $a[$_] =~ /9/ and splice @a, $_, 1 for reverse 0 .. $#a;
#        pp \@a;
    }
    $times{ forSplice } += time() - $start;

    @a = 1 .. $N;
    $start = time;
    {
        @a = grep !/9/, @a;
#        pp \@a;
    }
    $times{ grep } += time() - $start;

    @a = 1 .. $N;
    $start = time;
    {
        my $o = 0;
        for( 0 .. $#a ) {
            $a[ $_ - $o ] = $a[ $_ ];
            $a[ $_ ] =~ /9/ and ++$o;
        }
        $#a = $#a - $o;
#        pp \@a;
    }
    $times{ offsetCopy } += time() - $start;

    @a = 1 .. $N;
    $start = time;
    {
        my @b;
        for( @a ) {
            push @b, $_ unless /9/;
        }
#        pp \@b;
    }
    $times{ buildNew } += time() - $start;

    @a = 1 .. $N;
    $start = time;
    {
        my $o = 0;
        for( @a ) {
            $a[ $o++ ] = $_ unless /9/;
        }
        $#a = $o - 1;
#        pp \@a;
    }
    $times{ editInplace } += time() - $start;
};

$times{ $_ } /= $I for @tests;
#pp \%times;

# Slowest first. sort needs a three-way comparator (<=>); a boolean '<'
# gives it inconsistent answers, which is why the rows of the -N=1e2 and
# -N=1e3 runs below are not strictly in rate order.
@tests = sort{ $times{ $b } <=> $times{ $a } } @tests;

print join '', map sprintf( " %12s", $_ ), '', 'rate', @tests;
for my $row ( @tests ) {
    printf "%12s %10g/s", $row, 1/$times{ $row };
    for my $col ( @tests ) {
        printf " %11.f%%", $times{ $col } / $times{ $row } * 100;
    }
    print '';
}
__END__
C:\test>1036622 -N=1e2
                      rate         grep    forSplice     buildNew   offsetCopy  editInplace
        grep    7687.51/s         100%          61%          76%          75%          58%
   forSplice      12539/s         163%         100%         124%         123%          95%
    buildNew    10082.5/s         131%          80%         100%          99%          76%
  offsetCopy    10182.8/s         132%          81%         101%         100%          77%
 editInplace    13206.2/s         172%         105%         131%         130%         100%

C:\test>1036622 -N=1e3
                      rate         grep    forSplice     buildNew   offsetCopy  editInplace
        grep    760.637/s         100%          63%          78%          75%          59%
   forSplice    1201.22/s         158%         100%         123%         119%          93%
    buildNew    973.269/s         128%          81%         100%          97%          75%
  offsetCopy    1007.76/s         132%          84%         104%         100%          78%
 editInplace    1293.34/s         170%         108%         133%         128%         100%

C:\test>1036622 -N=1e4
                      rate         grep    forSplice   offsetCopy     buildNew  editInplace
        grep    72.5919/s         100%          97%          79%          73%          54%
   forSplice    74.7857/s         103%         100%          81%          75%          56%
  offsetCopy    92.2046/s         127%         123%         100%          92%          69%
    buildNew    99.9543/s         138%         134%         108%         100%          75%
 editInplace    133.395/s         184%         178%         145%         133%         100%

C:\test>1036622 -N=1e5
                      rate    forSplice         grep   offsetCopy     buildNew  editInplace
   forSplice    1.13766/s         100%          17%          13%          13%          10%
        grep    6.60517/s         581%         100%          77%          73%          59%
  offsetCopy    8.63102/s         759%         131%         100%          95%          78%
    buildNew    9.05601/s         796%         137%         105%         100%          82%
 editInplace    11.1044/s         976%         168%         129%         123%         100%

C:\test>1036622 -N=1e6 -I=1
                      rate    forSplice         grep   offsetCopy     buildNew  editInplace
   forSplice  0.0106761/s         100%           1%           1%           1%           1%
        grep   0.798803/s        7482%         100%          86%          73%          63%
  offsetCopy   0.929497/s        8706%         116%         100%          85%          73%
    buildNew    1.09032/s       10213%         136%         117%         100%          86%
 editInplace    1.26927/s       11889%         159%         137%         116%         100%

The upshot is that forSplice is a little faster than grep for small arrays; but editInplace is the hands-down winner for arrays of any size, and one or two orders of magnitude faster than forSplice for large arrays.
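Given how many times this thread has tripped over variants that were not doing the same job, a quick cross-check seems worthwhile. The following is a minimal sketch, not part of the benchmark itself: the five filter bodies are taken from the script in this post, while the %out hash, the array size and the comparison at the end are assumptions added for illustration.

```perl
use strict;
use warnings;

my $N = 1e3;    # assumed size, for illustration
my %out;        # variant name => filtered result, joined into a string

{   # forSplice
    my @a = 1 .. $N;
    $a[$_] =~ /9/ and splice @a, $_, 1 for reverse 0 .. $#a;
    $out{ forSplice } = "@a";
}
{   # grep
    my @a = 1 .. $N;
    @a = grep !/9/, @a;
    $out{ grep } = "@a";
}
{   # offsetCopy
    my @a = 1 .. $N;
    my $o = 0;
    for ( 0 .. $#a ) {
        $a[ $_ - $o ] = $a[ $_ ];
        $a[ $_ ] =~ /9/ and ++$o;
    }
    $#a = $#a - $o;
    $out{ offsetCopy } = "@a";
}
{   # buildNew
    my @a = 1 .. $N;
    my @b;
    for ( @a ) { push @b, $_ unless /9/ }
    $out{ buildNew } = "@b";
}
{   # editInplace
    my @a = 1 .. $N;
    my $o = 0;
    for ( @a ) { $a[ $o++ ] = $_ unless /9/ }
    $#a = $o - 1;
    $out{ editInplace } = "@a";
}

# Any variant whose result differs from plain grep is suspect.
my @differ = grep { $out{ $_ } ne $out{ grep } } sort keys %out;
print @differ ? "mismatch: @differ\n" : "all five variants agree\n";
```

Running a check like this before timing anything would have caught the if/unless inversion at the top of the thread.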