foreach array - delete current row ?

Replies are listed 'Best First'.
Re: foreach array - delete current row ? by dave_the_m (Monsignor) on Jun 02, 2013 at 21:37 UTC
Deleting elements from an array while iterating over it is explicitly documented as 'So don't do that' in perlfunc. Maybe grep is suitable for your use case: `@array = grep { some-condition } @array;` [download] Dave.	[reply] [d/l]
Re: foreach array - delete current row ? by davido (Cardinal) on Jun 02, 2013 at 22:20 UTC
Consider three options: grep: Iterates through the haystack list and returns a new needles list comprised of elements from the haystack that meet the criteria you set forth. At the outset, this may seem expensive; it requires more memory than in-place editing, for example. But computationally it's a linear approach. `foreach` and push onto a new array: This is a very similar approach to grep, but might feel more comfortable in cases where the criteria are not trivial. Again, you're walking through the haystack and creating a new list of found needles. `foreach` and splice: Probably most memory efficient, but think of what's going on from a computational standpoint: Locate a delete line, delete it, then shift all remaining elements down one. Locate another delete line, delete it, then shift all remaining elements down one again, and so on. Worst case is deleting every line, starting with the first. In such a case, the computational complexity is a little better than quadratic. Each time you find a needle, you have to move every remaining piece of hay over one position. If you have the ability to throw memory at the problem, I would tend to favor the linear solution provided by grep or by `foreach` combined with push. Dave	[reply] [d/l] [select]
Re: foreach array - delete current row ? by BrowserUk (Patriarch) on Jun 02, 2013 at 23:41 UTC
Three methods; which is best depends upon the size of the array, but grep is never a bad choice unless you're tight on memory: #! perl -slw use strict; use Benchmark qw[ cmpthese ]; our $N //= 1e3; cmpthese -1, { for_splice => q[ my @a = 1 .. $N; $a[$_] =~ /9/ and splice @a, $_, 1 for reverse 0 .. $#a; ], grep => q[ my @a = 1 .. $N; @a = grep !/9/, @a; ], offset_copy => q[ my @a = 1 .. $N; my $o = 0; for( 0 .. $#a ) { $a[$_-$o] = $a[$_]; $a[ $_ ] =~ /9/ and ++$o; } $#a = $#a - $o; ], }; __END__ C:\test>1036622 -N=1e2 Rate grep offset_copy for_splice grep 8144/s -- -14% -19% offset_copy 9489/s 17% -- -5% for_splice 10035/s 23% 6% -- C:\test>1036622 -N=1e3 Rate grep offset_copy for_splice grep 732/s -- -19% -44% offset_copy 898/s 23% -- -31% for_splice 1304/s 78% 45% -- C:\test>1036622 -N=1e4 Rate for_splice grep offset_copy for_splice 64.9/s -- -9% -33% grep 71.6/s 10% -- -27% offset_copy 97.5/s 50% 36% -- C:\test>1036622 -N=1e5 (warning: too few iterations for a reliable count) Rate for_splice grep offset_copy for_splice 1.56/s -- -78% -80% grep 7.00/s 348% -- -13% offset_copy 8.00/s 412% 14% -- C:\test>1036622 -N=1e6 (warning: too few iterations for a reliable count) (warning: too few iterations for a reliable count) (warning: too few iterations for a reliable count) s/iter for_splice grep offset_copy for_splice 71.8 -- -98% -98% grep 1.36 5183% -- -20% offset_copy 1.09 6467% 24% -- [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l]
Re^2: foreach array - delete current row ? by roboticus (Chancellor) on Jun 03, 2013 at 10:50 UTC
Update 3: As BrowserUk notes below, my code is flawed. Since he already corrected it, I'll leave my incorrect code here. For future benchmarking, I think I'll have to try and remember to put in a test case to compare the results of all the routines to ensure that they match. I've done that sometimes in the past when I wasn't sure my code was correct. However it seems that I should always do it, since I was sure that this code was correct. <sigh> BrowserUk: Generally, I just build a new array because it's easiest. I didn't realize that it's reasonably fast, too. `build_new_array=>q[ my @a = 1 .. $N; my @b; for (@a) { push @b, $_ if /9/; } ],` [download] I'm surprised at how much the for_splice version moves around in the rankings. While waiting for it to finish, I realized that your offset copy method would be faster if you saved the bookkeeping for the end, so I tweaked it a little: `edit_in_place=>q[ my @a = 1 .. $N; my $o = 0; for (@a) { $a[$o++]=$_ if /9/; } $#a=$o-1; ],` [download] $ for J in 2 3 4 5 6; do perl 1036631.pl -N=1e$J; done Rate offset_copy grep for_splice build_new edi +t_in_place offset_copy 3288/s -- -10% -27% -37% + -38% grep 3657/s 11% -- -18% -30% + -31% for_splice 4483/s 36% 23% -- -14% + -16% build_new 5191/s 58% 42% 16% -- + -3% edit_in_place 5338/s 62% 46% 19% 3% + -- Rate offset_copy grep for_splice build_new edi +t_in_place offset_copy 319/s -- -14% -22% -37% + -40% grep 369/s 16% -- -10% -27% + -30% for_splice 411/s 29% 11% -- -19% + -22% build_new 505/s 58% 37% 23% -- + -5% edit_in_place 528/s 66% 43% 29% 5% + -- Rate for_splice offset_copy grep build_new edi +t_in_place for_splice 28.6/s -- -3% -8% -35% + -42% offset_copy 29.4/s 3% -- -5% -33% + -40% grep 31.1/s 9% 6% -- -29% + -37% build_new 43.8/s 53% 49% 41% -- + -11% edit_in_place 49.1/s 72% 67% 58% 12% + -- (warning: too few iterations for a reliable count) (warning: too few iterations for a reliable count) Rate for_splice offset_copy grep build_new edi +t_in_place for_splice 0.433/s -- -86% -87% -90% + -90% offset_copy 3.00/s 593% -- -11% -28% + -34% grep 3.36/s 676% 12% -- -19% + -26% build_new 4.17/s 862% 39% 24% -- + -8% edit_in_place 4.55/s 950% 52% 35% 9% + -- (warning: too few iterations for a reliable count) (warning: too few iterations for a reliable count) (warning: too few iterations for a reliable count) (warning: too few iterations for a reliable count) (warning: too few iterations for a reliable count) s/iter for_splice offset_copy grep build_new edi +t_in_place for_splice 632 -- -99% -100% -100% + -100% offset_copy 3.38 18587% -- -18% -28% + -36% grep 2.76 22785% 22% -- -12% + -22% build_new 2.43 25893% 39% 14% -- + -11% edit_in_place 2.16 29142% 56% 28% 13% + -- [download] ... will this ever finish? If it does before I go to work, I'll update the node with the result. Update: Added the edit in place chunk, and replaced the output to include those results as well. Update 2: The 1e6 case finally finished, so I added them. If anyone cares, this is on an Intel Atom 330 1.6GHz machine with 2GB RAM. Additionally, I slightly formatted the output (removing a few blank lines, and inserting some in other places) for readability. ...roboticus When your only tool is a hammer, all problems look like your thumb.	[reply] [d/l] [select]
Re^3: foreach array - delete current row ? (flaws) by BrowserUk (Patriarch) on Jun 03, 2013 at 13:44 UTC
There is a problem with your two routines that biases the benchmark in their favour. My routines remove any value containing a 9; yours only keep those containing a 9. Once you correct that (unless instead of if), you get a different set of numbers: C:\test>1036622 -N=1e3 Rate grep build_new_array offset_copy for_splice edi +t_in_place grep 678/s -- -22% -27% -34% + -41% build_new_array 870/s 28% -- -6% -15% + -24% offset_copy 929/s 37% 7% -- -9% + -19% for_splice 1024/s 51% 18% 10% -- + -11% edit_in_place 1150/s 70% 32% 24% 12% + -- C:\test>1036622 -N=1e4 Rate for_splice grep offset_copy build_new_array edi +t_in_place for_splice 71.5/s -- -9% -11% -22% + -39% grep 78.7/s 10% -- -2% -14% + -33% offset_copy 80.6/s 13% 2% -- -12% + -31% build_new_array 91.3/s 28% 16% 13% -- + -22% edit_in_place 117/s 64% 49% 46% 29% + -- C:\test>1036622 -N=1e5 Rate for_splice grep build_new_array offset_copy edi +t_in_place for_splice 1.29/s -- -75% -83% -84% + -85% grep 5.12/s 296% -- -33% -35% + -42% build_new_array 7.68/s 494% 50% -- -3% + -14% offset_copy 7.88/s 509% 54% 3% -- + -11% edit_in_place 8.89/s 587% 74% 16% 13% + -- C:\test>1036622 -N=1e6 s/iter for_splice grep build_new_array offset_copy edi +t_in_place for_splice 77.9 -- -98% -98% -99% + -99% grep 1.39 5503% -- -14% -18% + -38% build_new_array 1.20 6379% 16% -- -5% + -28% offset_copy 1.14 6737% 22% 6% -- + -24% edit_in_place 0.868 8884% 60% 39% 31% + -- [download] And actually, my original benchmark was also flawed -- or at least lazy -- in as much as it conflates the time taken to build the original array into the overall timings; which is unrealistic. Correcting for that : #! perl -slw use strict; use Benchmark qw[ cmpthese ]; our $N //= 1e3; our $I //= -1; our @as; @{ $as[ $_ ] } = 1 .. $N for 0 .. 4; cmpthese $I, { for_splice => q[ my $ar = $as[0]; $ar->[$_] =~ /9/ and splice @$ar, $_, 1 for reverse 0 .. $#$ar +; $I == 1 and print "0: ", "@$ar"; ], grep => q[ my $ar = $as[1]; @$ar = grep !/9/, @$ar; $I == 1 and print "1: ", "@$ar"; ], offset_copy => q[ my $ar = $as[2]; my $o = 0; for( 0 .. $#$ar ) { $ar->[$_-$o] = $ar->[$_]; $ar->[ $_ ] =~ /9/ and ++$o; } $#$ar = $#$ar - $o; $I == 1 and print "2: ", "@$ar"; ], build_new_array=>q[ my $ar = $as[3]; my @b; for( @$ar ) { push @b, $_ unless /9/; } $I == 1 and print "3: ", "@b"; ], edit_in_place=>q[ my $ar = $as[4]; my $o = 0; for( @$ar ) { $ar->[$o++] = $_ unless /9/; } $#$ar = $o - 1; $I == 1 and print "4: ", "@$ar"; ], }; [download] I get yet another set of numbers: C:\test>1036622 -N=1e3 Rate grep build_new_array offset_copy edit_in_place +for_splice grep 1464/s -- -20% -43% -61% + -65% build_new_array 1820/s 24% -- -29% -52% + -56% offset_copy 2578/s 76% 42% -- -32% + -38% edit_in_place 3803/s 160% 109% 48% -- + -8% for_splice 4144/s 183% 128% 61% 9% + -- C:\test>1036622 -N=1e4 (warning: too few iterations for a reliable count) Rate grep build_new_array offset_copy edit_in_place +for_splice grep 159/s -- -25% -44% -62% + -66% build_new_array 212/s 33% -- -26% -49% + -54% offset_copy 287/s 80% 36% -- -31% + -38% edit_in_place 419/s 163% 98% 46% -- + -10% for_splice 464/s 191% 119% 62% 11% + -- C:\test>1036622 -N=1e5 Rate grep build_new_array offset_copy for_splice edi +t_in_place grep 15.4/s -- -22% -50% -69% + -70% build_new_array 19.8/s 28% -- -36% -60% + -62% offset_copy 31.0/s 102% 57% -- -37% + -40% for_splice 49.6/s 222% 151% 60% -- + -4% edit_in_place 51.6/s 235% 161% 66% 4% + -- C:\test>1036622 -N=1e6 Rate grep build_new_array offset_copy edit_in_place +for_splice grep 1.60/s -- -23% -51% -70% + -71% build_new_array 2.06/s 29% -- -37% -61% + -62% offset_copy 3.28/s 105% 59% -- -38% + -40% edit_in_place 5.26/s 229% 155% 60% -- + -4% for_splice 5.49/s 243% 166% 67% 4% + -- [download] Which just goes to prove a) be careful what you benchmark; b) O(n²) in C can often be considerably faster than O(n) in Perl; if the former avoids the multiple opcodes of the latter. WHere the copy-offset (or your in-place) really come into their own is when the array being filtered is close to the limits of your memory to start with. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re^4: foreach array - delete current row ? (flaws) by roboticus (Chancellor) on Jun 03, 2013 at 18:05 UTC
Re^4: foreach array - delete current row ? (flaws) by Anonymous Monk on Apr 27, 2015 at 01:55 UTC
Re^5: foreach array - delete current row ? (flaws) by BrowserUk (Patriarch) on Apr 27, 2015 at 08:24 UTC
Re: foreach array - delete current row ? by Laurent_R (Canon) on Jun 02, 2013 at 22:36 UTC
Deleting elements from an array while iterating over it is explicitly documented as 'So don't do that' in perlfunc. Yes, this is really getting bad and messy. But for some reason, it seems that it sort of works fine with a hash. I had a program where I was progressively "improving" my hash by deleting useless elements, and everything went fine. Then I thought that it might be more efficient to use an array instead, and I figured out that it did not work at all. The array got completely messed up. In brief, it seems that you can delete records from a hash while processing it, but can"t do it in an array. OK, maybe I got it wrong and did not understand fully what was going on, but I got the feeling that you can modify a hash while processing it with the keys function, but not an array fith the foreach function. Does any monk here have an explanation for this apparently different behavor?	[reply]
Re^2: foreach array - delete current row ? by davido (Cardinal) on Jun 02, 2013 at 22:43 UTC
As documented in each, it is safe to use the each iterator to iterate over a hash and delete the element most recently returned. It is also safe to generate a list of keys, and then delete hash elements by iterating over those keys -- you're not deleting elements from the list returned by keys, you're deleting elements from the hash that were listed by keys; a subtle but important difference. With an array, deleting elements requires rearranging the array, which (as the C++ folks like to say) invalidates the iterator, or as you mentioned, the array gets "completely messed up." Dave	[reply]
Re: foreach array - delete current row ? by vsespb (Chaplain) on Jun 02, 2013 at 21:46 UTC
delete the current row from an array when looping through it without setting up a counter and splice operation IHMO counter+splice is most feasible way. I use it when copying arrays with grep is too slow.	[reply]
Re: foreach array - delete current row ? by locked_user sundialsvc4 (Abbot) on Jun 03, 2013 at 02:11 UTC
Instead of deleting the items that you want to discard from an array that you are iterating over ... which is a very dicey proposition in any programming language ... just create a new list and `push` the entries that you want to keep onto it. (The `grep` function might do this in one fell swoop.) Then, replace the original list with the new one. `@list = grep( &keep_me, @list);` In almost every case, the two lists consist of references to some bit of information ... and a reference, no matter what monstrosity it may refer to, is small and cheap. You’re actually not moving the contents (big) around, just references-to (small) them. As you build the new list, the reference-counts of all its contents briefly rises to “2,” then, when the new list replaces the old one (and the old one vanishes into the gloom of the garbage-collector ...) they all go down again.
Re^2: foreach array - delete current row ? by AnomalousMonk (Archbishop) on Jun 03, 2013 at 07:01 UTC
`@list = grep( &keep_me, @list);` [download] I would add that the `keep_me()` function of the example must operate on the `$_` default scalar, which is topicalized by grep. `>perl -wMstrict -le "sub keep_me { return m{ [[:upper:]] }xms; } ;; my @list = qw(a b K d E f g h E P m e); @list = grep &keep_me, @list; print qq{@list}; " K E E P` [download]	[reply] [d/l] [select]

@array = grep { some-condition } @array;
[download]

Dave.

[reply]
[d/l]

Consider three options:

grep: Iterates through the haystack list and returns a new needles list comprised of elements from the haystack that meet the criteria you set forth. At the outset, this may seem expensive; it requires more memory than in-place editing, for example. But computationally it's a linear approach.
foreach and push onto a new array: This is a very similar approach to grep, but might feel more comfortable in cases where the criteria are not trivial. Again, you're walking through the haystack and creating a new list of found needles.
foreach and splice: Probably most memory efficient, but think of what's going on from a computational standpoint: Locate a delete line, delete it, then shift all remaining elements down one. Locate another delete line, delete it, then shift all remaining elements down one again, and so on. Worst case is deleting every line, starting with the first. In such a case, the computational complexity is a little better than quadratic. Each time you find a needle, you have to move every remaining piece of hay over one position.

If you have the ability to throw memory at the problem, I would tend to favor the linear solution provided by grep or by foreach combined with push.

Dave

[reply]
[d/l]
[select]

Three methods; which is best depends upon the size of the array, but grep is never a bad choice unless you're tight on memory:

#! perl -slw
use strict;
use Benchmark qw[ cmpthese ];

our $N //= 1e3;

cmpthese -1, {
    for_splice => q[
        my @a = 1 .. $N;
        $a[$_] =~ /9/ and splice @a, $_, 1 for reverse 0 .. $#a;
    ],
    grep => q[
        my @a = 1 .. $N;
        @a = grep !/9/, @a;
    ],
    offset_copy => q[
        my @a = 1 .. $N;
        my $o = 0;
        for( 0 .. $#a ) {
            $a[$_-$o] = $a[$_];
            $a[ $_ ] =~ /9/ and ++$o;
        }
        $#a = $#a - $o;
    ],
};
__END__
C:\test>1036622 -N=1e2
               Rate       grep offset_copy for_splice
grep         8144/s         --        -14%       -19%
offset_copy  9489/s        17%          --        -5%
for_splice  10035/s        23%          6%         --

C:\test>1036622 -N=1e3
              Rate       grep offset_copy for_splice
grep         732/s         --        -19%       -44%
offset_copy  898/s        23%          --       -31%
for_splice  1304/s        78%         45%         --

C:\test>1036622 -N=1e4
              Rate for_splice       grep offset_copy
for_splice  64.9/s         --        -9%        -33%
grep        71.6/s        10%         --        -27%
offset_copy 97.5/s        50%        36%          --

C:\test>1036622 -N=1e5
            (warning: too few iterations for a reliable count)
              Rate for_splice       grep offset_copy
for_splice  1.56/s         --       -78%        -80%
grep        7.00/s       348%         --        -13%
offset_copy 8.00/s       412%        14%          --

C:\test>1036622 -N=1e6
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
           s/iter for_splice       grep offset_copy
for_splice   71.8         --       -98%        -98%
grep         1.36      5183%         --        -20%
offset_copy  1.09      6467%        24%          --
[download]

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

[reply]
[d/l]

Update 3: As BrowserUk notes below, my code is flawed. Since he already corrected it, I'll leave my incorrect code here. For future benchmarking, I think I'll have to try and remember to put in a test case to compare the results of all the routines to ensure that they match. I've done that sometimes in the past when I wasn't sure my code was correct. However it seems that I should *always* do it, since I was sure that *this* code was correct. <sigh>

BrowserUk:

Generally, I just build a new array because it's easiest. I didn't realize that it's reasonably fast, too.

    build_new_array=>q[
        my @a = 1 .. $N;
        my @b;
        for (@a) {
            push @b, $_ if /9/;
        }
    ],
[download]

I'm surprised at how much the for_splice version moves around in the rankings.

While waiting for it to finish, I realized that your offset copy method would be faster if you saved the bookkeeping for the end, so I tweaked it a little:

    edit_in_place=>q[
        my @a = 1 .. $N;
        my $o = 0;
        for (@a) {
            $a[$o++]=$_ if /9/;
        }
        $#a=$o-1;
    ],
[download]

$ for J in 2 3 4 5 6; do perl 1036631.pl -N=1e$J; done

                Rate offset_copy        grep for_splice  build_new edi
+t_in_place
offset_copy   3288/s          --        -10%       -27%       -37%    
+      -38%
grep          3657/s         11%          --       -18%       -30%    
+      -31%
for_splice    4483/s         36%         23%         --       -14%    
+      -16%
build_new     5191/s         58%         42%        16%         --    
+       -3%
edit_in_place 5338/s         62%         46%        19%         3%    
+        --

               Rate offset_copy        grep for_splice   build_new edi
+t_in_place
offset_copy   319/s          --        -14%       -22%        -37%    
+      -40%
grep          369/s         16%          --       -10%        -27%    
+      -30%
for_splice    411/s         29%         11%         --        -19%    
+      -22%
build_new     505/s         58%         37%        23%          --    
+       -5%
edit_in_place 528/s         66%         43%        29%          5%    
+        --

                Rate for_splice offset_copy        grep  build_new edi
+t_in_place
for_splice    28.6/s         --         -3%         -8%       -35%    
+      -42%
offset_copy   29.4/s         3%          --         -5%       -33%    
+      -40%
grep          31.1/s         9%          6%          --       -29%    
+      -37%
build_new     43.8/s        53%         49%         41%         --    
+      -11%
edit_in_place 49.1/s        72%         67%         58%        12%    
+        --

            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
                 Rate for_splice offset_copy       grep  build_new edi
+t_in_place
for_splice    0.433/s         --        -86%       -87%       -90%    
+      -90%
offset_copy    3.00/s       593%          --       -11%       -28%    
+      -34%
grep           3.36/s       676%         12%         --       -19%    
+      -26%
build_new      4.17/s       862%         39%        24%         --    
+       -8%
edit_in_place  4.55/s       950%         52%        35%         9%    
+        --

            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
              s/iter for_splice offset_copy        grep  build_new edi
+t_in_place
for_splice       632         --        -99%       -100%      -100%    
+     -100%
offset_copy     3.38     18587%          --        -18%       -28%    
+      -36%
grep            2.76     22785%         22%          --       -12%    
+      -22%
build_new       2.43     25893%         39%         14%         --    
+      -11%
edit_in_place   2.16     29142%         56%         28%        13%    
+        --
[download]

... will this *ever* finish? If it does before I go to work, I'll update the node with the result.

Update: Added the edit in place chunk, and replaced the output to include those results as well.

Update 2: The 1e6 case finally finished, so I added them. If anyone cares, this is on an Intel Atom 330 1.6GHz machine with 2GB RAM. Additionally, I slightly formatted the output (removing a few blank lines, and inserting some in other places) for readability.

...roboticus

When your only tool is a hammer, all problems look like your thumb.

[reply]
[d/l]
[select]

There is a problem with your two routines that biases the benchmark in their favour.

My routines remove any value containing a 9; yours only keep those containing a 9.

Once you correct that (unless instead of if), you get a different set of numbers:

C:\test>1036622 -N=1e3
                  Rate grep build_new_array offset_copy for_splice edi
+t_in_place
grep             678/s   --            -22%        -27%       -34%    
+      -41%
build_new_array  870/s  28%              --         -6%       -15%    
+      -24%
offset_copy      929/s  37%              7%          --        -9%    
+      -19%
for_splice      1024/s  51%             18%         10%         --    
+      -11%
edit_in_place   1150/s  70%             32%         24%        12%    
+        --

C:\test>1036622 -N=1e4
                  Rate for_splice grep offset_copy build_new_array edi
+t_in_place
for_splice      71.5/s         --  -9%        -11%            -22%    
+      -39%
grep            78.7/s        10%   --         -2%            -14%    
+      -33%
offset_copy     80.6/s        13%   2%          --            -12%    
+      -31%
build_new_array 91.3/s        28%  16%         13%              --    
+      -22%
edit_in_place    117/s        64%  49%         46%             29%    
+        --

C:\test>1036622 -N=1e5
                  Rate for_splice grep build_new_array offset_copy edi
+t_in_place
for_splice      1.29/s         -- -75%            -83%        -84%    
+      -85%
grep            5.12/s       296%   --            -33%        -35%    
+      -42%
build_new_array 7.68/s       494%  50%              --         -3%    
+      -14%
offset_copy     7.88/s       509%  54%              3%          --    
+      -11%
edit_in_place   8.89/s       587%  74%             16%         13%    
+        --

C:\test>1036622 -N=1e6
                s/iter for_splice grep build_new_array offset_copy edi
+t_in_place
for_splice        77.9         -- -98%            -98%        -99%    
+      -99%
grep              1.39      5503%   --            -14%        -18%    
+      -38%
build_new_array   1.20      6379%  16%              --         -5%    
+      -28%
offset_copy       1.14      6737%  22%              6%          --    
+      -24%
edit_in_place    0.868      8884%  60%             39%         31%    
+        --
[download]

And actually, my original benchmark was also flawed -- or at least lazy -- in as much as it conflates the time taken to build the original array into the overall timings; which is unrealistic.

Correcting for that

:
#! perl -slw
use strict;
use Benchmark qw[ cmpthese ];

our $N //= 1e3;
our $I //= -1;

our @as;
@{ $as[ $_ ] } = 1 .. $N for 0 .. 4;

cmpthese $I, {
    for_splice => q[
        my $ar = $as[0];
        $ar->[$_] =~ /9/ and splice @$ar, $_, 1 for reverse 0 .. $#$ar
+;
        $I == 1 and print "0: ", "@$ar";
    ],
    grep => q[
        my $ar = $as[1];
        @$ar = grep !/9/, @$ar;
        $I == 1 and print "1: ", "@$ar";
    ],
    offset_copy => q[
        my $ar = $as[2];
        my $o = 0;
        for( 0 .. $#$ar ) {
            $ar->[$_-$o] = $ar->[$_];
            $ar->[ $_ ] =~ /9/ and ++$o;
        }
        $#$ar = $#$ar - $o;
        $I == 1 and print "2: ", "@$ar";
    ],
    build_new_array=>q[
        my $ar = $as[3];
        my @b;
        for( @$ar ) {
            push @b, $_ unless /9/;
        }
        $I == 1 and print "3: ", "@b";
    ],
    edit_in_place=>q[
        my $ar = $as[4];
        my $o = 0;
        for( @$ar ) {
            $ar->[$o++] = $_ unless /9/;
        }
        $#$ar = $o - 1;
        $I == 1 and print "4: ", "@$ar";
    ],
};
[download]

I get yet another set of numbers:

C:\test>1036622 -N=1e3
                  Rate grep build_new_array offset_copy edit_in_place 
+for_splice
grep            1464/s   --            -20%        -43%          -61% 
+      -65%
build_new_array 1820/s  24%              --        -29%          -52% 
+      -56%
offset_copy     2578/s  76%             42%          --          -32% 
+      -38%
edit_in_place   3803/s 160%            109%         48%            -- 
+       -8%
for_splice      4144/s 183%            128%         61%            9% 
+        --

C:\test>1036622 -N=1e4
            (warning: too few iterations for a reliable count)

                 Rate  grep build_new_array offset_copy edit_in_place 
+for_splice
grep            159/s    --            -25%        -44%          -62% 
+      -66%
build_new_array 212/s   33%              --        -26%          -49% 
+      -54%
offset_copy     287/s   80%             36%          --          -31% 
+      -38%
edit_in_place   419/s  163%             98%         46%            -- 
+      -10%
for_splice      464/s  191%            119%         62%           11% 
+        --

C:\test>1036622 -N=1e5
                  Rate grep build_new_array offset_copy for_splice edi
+t_in_place
grep            15.4/s   --            -22%        -50%       -69%    
+      -70%
build_new_array 19.8/s  28%              --        -36%       -60%    
+      -62%
offset_copy     31.0/s 102%             57%          --       -37%    
+      -40%
for_splice      49.6/s 222%            151%         60%         --    
+       -4%
edit_in_place   51.6/s 235%            161%         66%         4%    
+        --

C:\test>1036622 -N=1e6
                  Rate grep build_new_array offset_copy edit_in_place 
+for_splice
grep            1.60/s   --            -23%        -51%          -70% 
+      -71%
build_new_array 2.06/s  29%              --        -37%          -61% 
+      -62%
offset_copy     3.28/s 105%             59%          --          -38% 
+      -40%
edit_in_place   5.26/s 229%            155%         60%            -- 
+       -4%
for_splice      5.49/s 243%            166%         67%            4% 
+        --
[download]

Which just goes to prove a) be careful what you benchmark; b) O(n²) in C can often be considerably faster than O(n) in Perl; if the former avoids the multiple opcodes of the latter.

WHere the copy-offset (or your in-place) really come into their own is when the array being filtered is close to the limits of your memory to start with.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

[reply]
[d/l]
[select]

Deleting elements from an array while iterating over it is explicitly documented as 'So don't do that' in perlfunc.

Yes, this is really getting bad and messy. But for some reason, it seems that it sort of works fine with a hash. I had a program where I was progressively "improving" my hash by deleting useless elements, and everything went fine. Then I thought that it might be more efficient to use an array instead, and I figured out that it did not work at all. The array got completely messed up. In brief, it seems that you can delete records from a hash while processing it, but can"t do it in an array. OK, maybe I got it wrong and did not understand fully what was going on, but I got the feeling that you can modify a hash while processing it with the keys function, but not an array fith the foreach function.

Does any monk here have an explanation for this apparently different behavor?

[reply]

As documented in each, it is safe to use the each iterator to iterate over a hash and delete the element most recently returned. It is also safe to generate a list of keys, and then delete hash elements by iterating over those keys -- you're not deleting elements from the list returned by keys, you're deleting elements from the hash that were listed by keys; a subtle but important difference.

With an array, deleting elements requires rearranging the array, which (as the C++ folks like to say) invalidates the iterator, or as you mentioned, the array gets "completely messed up."

Dave

[reply]

delete the current row from an array when looping through it without setting up a counter and splice operation

[reply]

Instead of deleting the items that you want to discard from an array that you are iterating over ... which is a very dicey proposition in any programming language ... just create a new list and push the entries that you want to keep onto it. (The grep function might do this in one fell swoop.) Then, replace the original list with the new one.

@list = grep( &keep_me, @list);

In almost every case, the two lists consist of references to some bit of information ... and a reference, no matter what monstrosity it may refer to, is small and cheap. You’re actually not moving the contents (big) around, just references-to (small) them. As you build the new list, the reference-counts of all its contents briefly rises to “2,” then, when the new list replaces the old one (and the old one vanishes into the gloom of the garbage-collector ...) they all go down again.

@list = grep( &keep_me, @list);
[download]

I would add that the keep_me() function of the example must operate on the $_ default scalar, which is topicalized by grep.

>perl -wMstrict -le
"sub keep_me { return m{ [[:upper:]] }xms; }
 ;;
 my @list = qw(a b K d E f g h E P m e);
 @list = grep &keep_me, @list;
 print qq{@list};
"
K E E P
[download]

[reply]
[d/l]
[select]