in reply to Re: Removing certain elements from an array
in thread Removing certain elements from an array

There is a memory tickling the back of my brain that disagrees with you. As I understand/remember it, perl implements arrays as linked lists. splice'ing nodes should not "shift the whole end of the array". This vague memory says perl's lists were explicitly designed to make this kind of stuff work well.

My opinion of Benchmark is very low - the results I generated seems to support this vauge memory of mine.

#!/os/dist/bin/perl -w use Benchmark; my @array = 0..10000; my @idxsToRemove = (); push @idxsToRemove, int( rand( 10000 ) ) for ( 0 .. 100 ); my %toRemoveIdx; @toRemoveIdx{@idxsToRemove} = (1)x@idxsToRemove; sub Tye { my @subarray = @array; @subarray = map { $toRemoveIdx{$_} ? () : $subarray[$_] } 0..$#sub +array; } sub btrott { my @subarray = @array; splice @subarray, $_, 1 for sort { $b <=> $a } @idxsToRemove; } imethese( shift @ARGV || -5, { 'Tye' => \&Tye, 'btrott' => \&btrott, } );

Generated these results:

Benchmark: running Tye, btrott, mikfire, each for at least 5 CPU seco +nds... Tye: 7 wallclock secs ( 5.98 usr + 0.00 sys = 5.98 CPU) @ 7 +.19/s (n=43) btrott: 6 wallclock secs ( 5.38 usr + 0.00 sys = 5.38 CPU) @ 63 +.01/s (n=339)

mikfire

Replies are listed 'Best First'.
RE: RE: Re: Removing certain elements from an array
by tye (Sage) on Aug 24, 2000 at 22:16 UTC

    Note that Perl arrays are most definitely not linked lists. They are neat, however.

    What you may be remembering is that pulling things off of the front of an array is nearly as efficient as popping them off of the back of an array. This is because Perl doesn't bother to shift all of the elements down until the gap in front gets big enough. But removing items from the middle of an array requires that one end (or the other?) be shifted.

    The drastic difference in benchmark times confuses me. Update: No, "sub btrott" isn't broken (my debugging was). Hmmm....

    Update 2: Never underestimate the slowness of Perl's opcode machinery. The "map" version loses because it executes 10_000 statements while the "splice" version only executes 100. While each "splice" statement has to copy a big chunk of the 10_000-element array, Perl's overhead swamps that. With the number of elements to be removed raised to 5_000, the "map" version is faster.

            - tye (but my friends call me "Tye")
RE: RE: Re: Removing certain elements from an array
by lhoward (Vicar) on Aug 24, 2000 at 22:18 UTC
RE: RE: Re: Removing certain elements from an array
by merlyn (Sage) on Aug 24, 2000 at 22:15 UTC
    Nope. Arrays are not "linked lists". They do not "splice" well, but they do "shift" and "pop" quickly. And they can "unshift" and "push" quickly if the spare (or leftover) memory happens to be in the right place, otherwise, they must be copied.

    -- Randal L. Schwartz, Perl hacker

      That will teach me to trust my memory :)

      mikfire

RE (tilly) 3: Removing certain elements from an array
by tilly (Archbishop) on Aug 25, 2000 at 00:17 UTC
    Your post illustrates the need to benchmark code carefully! The map solutions are better algorithmically, meaning that as you increase the size of the array and the number of elements removed, btrott's solution will slow down much more than they do. But in the real world the worse algorithm may still work out to be better.

    In this example, clearly that is the case. If you want to play around with it, here is some sample code to mess around with:

    use Benchmark; my $n = 10000; my $m = 100; my @array = 0..$n; my @idxsToRemove = map {int( rand( 10000 ) ) } 1..$m; my %tests = ( grepping => sub { my @subarray = @array; my %rem; @toRemoveIdx{@idxsToRemove} = (1)x@idxsToRemove; @subarray = @subarray[ grep !$toRemoveIdx{$_}, 0..$#subarray ]; }, mapping => sub { my @subarray = @array; my %rem; @toRemoveIdx{@idxsToRemove} = (1)x@idxsToRemove; @subarray = map { $toRemoveIdx{$_} ? () : $subarray[$_] } 0..$#s +ubarray; }, sorting => sub { my @subarray = @array; splice @subarray, $_, 1 for sort { $b <=> $a } @idxsToRemove; @subarray; }, ); timethese( shift @ARGV || -5, \%tests );
    Play with it. As you move $m and $n up you will find the first two solutions scaling better. But unless you have some truly impressive numbers, the sort solution by btrott will be faster.

    (Yes, I know the test could be improved a lot...)

    NOTE: The sorting solution is both faster and incorrect. If your list of elements to remove has duplicates, you incorrectly remove the same element multiple times! This illustrates my real reason for liking map, I find it easier for me to figure out and validate all possible gotchas using it. Even if it is sometimes a lot slower. :-)

    EDIT
    A typo change, noticed that I changed code without touching the description of what the code did. Oops.