Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Removing certain elements from an array

by Anonymous Monk
on Aug 24, 2000 at 20:45 UTC ( [id://29451]=perlquestion: print w/replies, xml ) Need Help??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Say I have an array @array and a list of indexes to remove from the list @remindexes. What is the best way to make the elements at these locations disappear and have the rest of the array elements "scrunch" together?
  • Comment on Removing certain elements from an array

Replies are listed 'Best First'.
Re: Removing certain elements from an array
by btrott (Parson) on Aug 24, 2000 at 20:53 UTC
    I think this'll do it for you:
    splice @array, $_, 1 for sort { $b <=> $a } @indices;
    The reason for the sort is that it starts dropping elements from the end of the array. If you start from the beginning, you won't drop the elements that you want to drop. Say you want to drop the elements at indices 0 and 2; you drop the element at index 0, but then the other elements squish towards the front of the array. So element 2 is now element 1. So you'll drop the wrong elements. That's why you should start from the end.

    Update: tilly pointed out a bug where duplicate indices could lead to dropping incorrect elements in the array. Here's a fix:

    my %u; splice @array, $_, 1 for grep !$u{$_}++, sort { $b <=> $a } @indices;
    This is just getting silly in terms of the number of times we're looping over the data; but it does still seem to be quite fast. Faster than I would have expected, in fact.
      Cool, that was the problem I was having. I knew about the wonders of splice but just didn't think of using sort to prevent the indexes from becoming invalidated after doing the first splice.

      Thanks a lot!

RE (tilly) 1: Removing certain elements from an array
by tilly (Archbishop) on Aug 24, 2000 at 21:30 UTC
    TMTOWTDI, here is a map solution:
    my %remove; ++$remove{$_} foreach @remindexes; @array = map ((exists $remove{$_} ? () : $array[$_]), 0..$#array);
    Basically build up a hash lookup for what you don't want, and map the indexes of the array into nothing or the array element as desired.

    BTW recent conversations with tye have reminded me that entering and leaving blocks has a significant overhead. Therefore I have not written this map with a block.

RE: Removing certain elements from an array
by Boogman (Scribe) on Aug 24, 2000 at 20:50 UTC
    Use the splice function. From perlfunc:
    splice ARRAY,OFFSET,LENGTH,LIST Removes the elements designated by OFFSET and LENGTH from an array, and replaces them with the elements of LIST, if any.
    If you don't specify LIST, it just removes the specified elements from the array and "scrunches" the rest of the array together.

    Update: Oh and if you want have a list of them you wanted to remove, you would probably want to do something like splice( @array, $_, 1 ) foreach ( @elementsToDelete ); And @elementsToDelete would have to be sorted in descending order as btrott mentions.
Re: Removing certain elements from an array
by tye (Sage) on Aug 24, 2000 at 21:31 UTC
    my @array; getreallyhugelist( \@array ); my @idxsToRemove= getnotashugelist(); my %toRemoveIdx; @toRemoveIdx{@idxsToRemove}= (1)x@idxsToRemove; @array= map { $toRemoveIdx{$_} ? () : $array[$_] } 0..$#array;

    That last line copies the whole array once (puts it on the stack) and then only has to shift each non-removed element once. So it will probably be faster than repeated calls to splice() (each one having to shift the whole end of the array). I think this speed-up will more than compensate for the cost of building the hash (on the preceding line) if you are removing more than a few elements.

            - tye (but my friends call me "Tye")
      There is a memory tickling the back of my brain that disagrees with you. As I understand/remember it, perl implements arrays as linked lists. splice'ing nodes should not "shift the whole end of the array". This vague memory says perl's lists were explicitly designed to make this kind of stuff work well.

      My opinion of Benchmark is very low - the results I generated seems to support this vauge memory of mine.

      #!/os/dist/bin/perl -w use Benchmark; my @array = 0..10000; my @idxsToRemove = (); push @idxsToRemove, int( rand( 10000 ) ) for ( 0 .. 100 ); my %toRemoveIdx; @toRemoveIdx{@idxsToRemove} = (1)x@idxsToRemove; sub Tye { my @subarray = @array; @subarray = map { $toRemoveIdx{$_} ? () : $subarray[$_] } 0..$#sub +array; } sub btrott { my @subarray = @array; splice @subarray, $_, 1 for sort { $b <=> $a } @idxsToRemove; } imethese( shift @ARGV || -5, { 'Tye' => \&Tye, 'btrott' => \&btrott, } );

      Generated these results:

      Benchmark: running Tye, btrott, mikfire, each for at least 5 CPU seco +nds... Tye: 7 wallclock secs ( 5.98 usr + 0.00 sys = 5.98 CPU) @ 7 +.19/s (n=43) btrott: 6 wallclock secs ( 5.38 usr + 0.00 sys = 5.38 CPU) @ 63 +.01/s (n=339)

      mikfire

        Note that Perl arrays are most definitely not linked lists. They are neat, however.

        What you may be remembering is that pulling things off of the front of an array is nearly as efficient as popping them off of the back of an array. This is because Perl doesn't bother to shift all of the elements down until the gap in front gets big enough. But removing items from the middle of an array requires that one end (or the other?) be shifted.

        The drastic difference in benchmark times confuses me. Update: No, "sub btrott" isn't broken (my debugging was). Hmmm....

        Update 2: Never underestimate the slowness of Perl's opcode machinery. The "map" version loses because it executes 10_000 statements while the "splice" version only executes 100. While each "splice" statement has to copy a big chunk of the 10_000-element array, Perl's overhead swamps that. With the number of elements to be removed raised to 5_000, the "map" version is faster.

                - tye (but my friends call me "Tye")
        Nope. Arrays are not "linked lists". They do not "splice" well, but they do "shift" and "pop" quickly. And they can "unshift" and "push" quickly if the spare (or leftover) memory happens to be in the right place, otherwise, they must be copied.

        -- Randal L. Schwartz, Perl hacker

        Your post illustrates the need to benchmark code carefully! The map solutions are better algorithmically, meaning that as you increase the size of the array and the number of elements removed, btrott's solution will slow down much more than they do. But in the real world the worse algorithm may still work out to be better.

        In this example, clearly that is the case. If you want to play around with it, here is some sample code to mess around with:

        use Benchmark; my $n = 10000; my $m = 100; my @array = 0..$n; my @idxsToRemove = map {int( rand( 10000 ) ) } 1..$m; my %tests = ( grepping => sub { my @subarray = @array; my %rem; @toRemoveIdx{@idxsToRemove} = (1)x@idxsToRemove; @subarray = @subarray[ grep !$toRemoveIdx{$_}, 0..$#subarray ]; }, mapping => sub { my @subarray = @array; my %rem; @toRemoveIdx{@idxsToRemove} = (1)x@idxsToRemove; @subarray = map { $toRemoveIdx{$_} ? () : $subarray[$_] } 0..$#s +ubarray; }, sorting => sub { my @subarray = @array; splice @subarray, $_, 1 for sort { $b <=> $a } @idxsToRemove; @subarray; }, ); timethese( shift @ARGV || -5, \%tests );
        Play with it. As you move $m and $n up you will find the first two solutions scaling better. But unless you have some truly impressive numbers, the sort solution by btrott will be faster.

        (Yes, I know the test could be improved a lot...)

        NOTE: The sorting solution is both faster and incorrect. If your list of elements to remove has duplicates, you incorrectly remove the same element multiple times! This illustrates my real reason for liking map, I find it easier for me to figure out and validate all possible gotchas using it. Even if it is sometimes a lot slower. :-)

        EDIT
        A typo change, noticed that I changed code without touching the description of what the code did. Oops.

Re: Removing certain elements from an array
by chromatic (Archbishop) on Aug 24, 2000 at 21:53 UTC
    I like the hash and array slices, but I'm not thrilled with the map. Here's a version that needs a sort, which may not gain you anything, but at least it's a different approach.
    my %h; @h{ 0 .. $#array }++; delete @h{@remindexes}; @array = @array[ sort { $a <=> $b } keys %h ];
Re: Removing certain elements from an array
by xdb19 (Sexton) on Aug 24, 2000 at 23:39 UTC
    This is a one liner that works well for me.
    splice( @array, $_ - $ind++, 1 ) foreach ( sort @remindexes );

    Have Fun - XDB19
      Attack from the rear:

      splice( @array, $_, 1) for (reverse sort @remindexes);

      This way, you start with the highest array index and move down, so you don't have to keep count of how many elements you have removed.

Re: Removing certain elements from an array
by isotope (Deacon) on Aug 24, 2000 at 20:52 UTC
    Try something like this (untested):

    foreach(@remindexes) { splice(@array, $_, 1); }


    Update: This is going to squeeze the array down as it goes, so the indices in @remindexes will be inaccurate. See btrott's post below for the right way to do it.

    HTH,

    --isotope
Re: Removing certain elements from an array
by le (Friar) on Aug 24, 2000 at 20:58 UTC
    My solution (clumsy as usual):
    my @array = qw( ...); my @remindexes = qw( ... ); my %remindexes; my @newarray; for (@remindexes) { $remindexes{$_}++; } for (my $i = 0; $i <= $#array; $i++) { push @newarray, $array[$i] if !$remindexes{$i}; }
    Of course, if you would setup the indexes to remove into a hash from the beginning, you could omit the first step.
RE: Removing certain elements from an array
by BlaisePascal (Monk) on Aug 24, 2000 at 23:00 UTC
    This may work as well...
    @hugearray = ( ... ); @toremove = ( ... ); # remove from @hugearray elements indexed in @toremove @hugearray[@toremove] = ("RemoveMe") x @toremove; push @hugearray, "SentinelFlag"; for (my $_;;) { $_ = shift @hugearray; last if $_ eq "SentinelFlag"; next if $_ eq "RemoveMe"; push @hugearray, $temp; }
    This works by flagging the elements you want to delete, then using @hugearray as a circular shift register, shift-pushing each element in the array in turn, until it gets back to the beginning. As it goes, it doesn't push back in the elements to be deleted.

    How does it fare efficiency-wise? This routine touches each data element once, and likely does at most a single implicit array-copy. I think it is probably an O(n+m) operation (n=@hugearray, m=@toremove). The splice solution is O(m log m + m*O(splice(n)). Since perl arrays are not linked lists, I think splice is an O(n) operation, meaning that the splice solutions are O(m log m + m*n). Which is faster? My guess would be the shift-register, but I don't know.

Re: Removing certain elements from an array
by tye (Sage) on Aug 24, 2000 at 22:50 UTC

    FYI, Benchmark.pm has proved every guess of mine wrong in this thread:

    • Using blocks in map didn't make anything slower (I didn't investigate why).
    • The "map" solution is much slower than the "splice" solution unless nearly half of the elements are removed (due to overhead in executing so many Perl statements).
    • chromatic's solution is slower still (which I thought would be close in speed to the "splice" solution, if not faster). I think this is because the "sort" has to be done on so many more elements.
    But then that is why Benchmark was written.

            - tye (but my friends call me "Tye")
Re: Removing elements from array (use grep!)
by Russ (Deacon) on Aug 24, 2000 at 22:31 UTC
    Hmmm... Haven't seen a solution using the tailor-made grep.
    my @array = (0,1,2,3,4,5,6,7,8,9); # The source array my %DropList = map {$_ => 1} (2,4,6,8); # A hash of the indices to rem +ove my @newarray = @array[grep {not exists $DropList{$_}} (0..$#array)]; print "@newarray\n";
    Notes:
    • I am still using 5.005_57 to match a production server, so if you prefer to use exists on arrays in 5.6...
    Enjoy.

    Russ
    Brainbench 'Most Valuable Professional' for Perl

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://29451]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (4)
As of 2024-04-18 22:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found