Removing certain elements from an array

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Removing certain elements from an array by btrott (Parson) on Aug 24, 2000 at 20:53 UTC
I think this'll do it for you: `splice @array, $_, 1 for sort { $b <=> $a } @indices;` [download] The reason for the sort is that it starts dropping elements from the end of the array. If you start from the beginning, you won't drop the elements that you want to drop. Say you want to drop the elements at indices 0 and 2; you drop the element at index 0, but then the other elements squish towards the front of the array. So element 2 is now element 1. So you'll drop the wrong elements. That's why you should start from the end. Update: tilly pointed out a bug where duplicate indices could lead to dropping incorrect elements in the array. Here's a fix: `my %u; splice @array, $_, 1 for grep !$u{$_}++, sort { $b <=> $a } @indices;` [download] This is just getting silly in terms of the number of times we're looping over the data; but it does still seem to be quite fast. Faster than I would have expected, in fact.	[reply] [d/l] [select]
RE (tilly) 2: Removing certain elements from an array by tilly (Archbishop) on Aug 25, 2000 at 00:20 UTC
There is a small bug in this, drove me crazy while I was trying to double-check benchmarks. See RE (tilly) 3: Removing certain elements from an array for details. Otherwise very nice and fast! :-)	[reply]
RE: Re: Removing certain elements from an array by Anonymous Monk on Aug 24, 2000 at 20:58 UTC
Cool, that was the problem I was having. I knew about the wonders of splice but just didn't think of using sort to prevent the indexes from becoming invalidated after doing the first splice. Thanks a lot!	[reply]
RE (tilly) 1: Removing certain elements from an array by tilly (Archbishop) on Aug 24, 2000 at 21:30 UTC
TMTOWTDI, here is a map solution: `my %remove; ++$remove{$_} foreach @remindexes; @array = map ((exists $remove{$_} ? () : $array[$_]), 0..$#array);` [download] Basically build up a hash lookup for what you don't want, and map the indexes of the array into nothing or the array element as desired. BTW recent conversations with tye have reminded me that entering and leaving blocks has a significant overhead. Therefore I have not written this map with a block.	[reply] [d/l]
RE: Removing certain elements from an array by Boogman (Scribe) on Aug 24, 2000 at 20:50 UTC
Use the splice function. From perlfunc: `splice ARRAY,OFFSET,LENGTH,LIST Removes the elements designated by OFFSET and LENGTH from an array, and replaces them with the elements of LIST, if any.` [download] If you don't specify LIST, it just removes the specified elements from the array and "scrunches" the rest of the array together. Update: Oh and if you want have a list of them you wanted to remove, you would probably want to do something like `splice( @array, $_, 1 ) foreach ( @elementsToDelete );` And @elementsToDelete would have to be sorted in descending order as btrott mentions.	[reply] [d/l] [select]
Re: Removing certain elements from an array by tye (Sage) on Aug 24, 2000 at 21:31 UTC
`my @array; getreallyhugelist( \@array ); my @idxsToRemove= getnotashugelist(); my %toRemoveIdx; @toRemoveIdx{@idxsToRemove}= (1)x@idxsToRemove; @array= map { $toRemoveIdx{$_} ? () : $array[$_] } 0..$#array;` [download] That last line copies the whole array once (puts it on the stack) and then only has to shift each non-removed element once. So it will probably be faster than repeated calls to splice() (each one having to shift the whole end of the array). I think this speed-up will more than compensate for the cost of building the hash (on the preceding line) if you are removing more than a few elements. - tye (but my friends call me "Tye")	[reply] [d/l]
RE: Re: Removing certain elements from an array by mikfire (Deacon) on Aug 24, 2000 at 22:03 UTC
There is a memory tickling the back of my brain that disagrees with you. As I understand/remember it, perl implements arrays as linked lists. splice'ing nodes should not "shift the whole end of the array". This vague memory says perl's lists were explicitly designed to make this kind of stuff work well. My opinion of Benchmark is very low - the results I generated seems to support this vauge memory of mine. `#!/os/dist/bin/perl -w use Benchmark; my @array = 0..10000; my @idxsToRemove = (); push @idxsToRemove, int( rand( 10000 ) ) for ( 0 .. 100 ); my %toRemoveIdx; @toRemoveIdx{@idxsToRemove} = (1)x@idxsToRemove; sub Tye { my @subarray = @array; @subarray = map { $toRemoveIdx{$_} ? () : $subarray[$_] } 0..$#sub +array; } sub btrott { my @subarray = @array; splice @subarray, $_, 1 for sort { $b <=> $a } @idxsToRemove; } imethese( shift @ARGV \|\| -5, { 'Tye' => \&Tye, 'btrott' => \&btrott, } );` [download] Generated these results: `Benchmark: running Tye, btrott, mikfire, each for at least 5 CPU seco +nds... Tye: 7 wallclock secs ( 5.98 usr + 0.00 sys = 5.98 CPU) @ 7 +.19/s (n=43) btrott: 6 wallclock secs ( 5.38 usr + 0.00 sys = 5.38 CPU) @ 63 +.01/s (n=339)` [download] mikfire	[reply] [d/l] [select]
RE: RE: Re: Removing certain elements from an array by tye (Sage) on Aug 24, 2000 at 22:16 UTC
Note that Perl arrays are most definitely not linked lists. They are neat, however. What you may be remembering is that pulling things off of the front of an array is nearly as efficient as popping them off of the back of an array. This is because Perl doesn't bother to shift all of the elements down until the gap in front gets big enough. But removing items from the middle of an array requires that one end (or the other?) be shifted. The drastic difference in benchmark times confuses me. Update: No, "sub btrott" isn't broken (my debugging was). Hmmm.... Update 2: Never underestimate the slowness of Perl's opcode machinery. The "map" version loses because it executes 10_000 statements while the "splice" version only executes 100. While each "splice" statement has to copy a big chunk of the 10_000-element array, Perl's overhead swamps that. With the number of elements to be removed raised to 5_000, the "map" version is faster. - tye (but my friends call me "Tye")	[reply]
RE: RE: Re: Removing certain elements from an array by merlyn (Sage) on Aug 24, 2000 at 22:15 UTC
Nope. Arrays are not "linked lists". They do not "splice" well, but they do "shift" and "pop" quickly. And they can "unshift" and "push" quickly if the spare (or leftover) memory happens to be in the right place, otherwise, they must be copied. -- Randal L. Schwartz, Perl hacker	[reply]
RE: RE: RE: Re: Removing certain elements from an array by mikfire (Deacon) on Aug 24, 2000 at 22:30 UTC
RE: RE: Re: Removing certain elements from an array by lhoward (Vicar) on Aug 24, 2000 at 22:18 UTC
merlyn is (as always) correct. You may want to see my Shift, Pop, Unshift and Push with Impunity! post which really explores the internal structure and performance characteristics of perl's primitive list operators.	[reply]
RE (tilly) 3: Removing certain elements from an array by tilly (Archbishop) on Aug 25, 2000 at 00:17 UTC
Your post illustrates the need to benchmark code carefully! The map solutions are better algorithmically, meaning that as you increase the size of the array and the number of elements removed, btrott's solution will slow down much more than they do. But in the real world the worse algorithm may still work out to be better. In this example, clearly that is the case. If you want to play around with it, here is some sample code to mess around with: use Benchmark; my $n = 10000; my $m = 100; my @array = 0..$n; my @idxsToRemove = map {int( rand( 10000 ) ) } 1..$m; my %tests = ( grepping => sub { my @subarray = @array; my %rem; @toRemoveIdx{@idxsToRemove} = (1)x@idxsToRemove; @subarray = @subarray[ grep !$toRemoveIdx{$_}, 0..$#subarray ]; }, mapping => sub { my @subarray = @array; my %rem; @toRemoveIdx{@idxsToRemove} = (1)x@idxsToRemove; @subarray = map { $toRemoveIdx{$_} ? () : $subarray[$_] } 0..$#s +ubarray; }, sorting => sub { my @subarray = @array; splice @subarray, $_, 1 for sort { $b <=> $a } @idxsToRemove; @subarray; }, ); timethese( shift @ARGV \|\| -5, \%tests ); [download] Play with it. As you move $m and $n up you will find the first two solutions scaling better. But unless you have some truly impressive numbers, the sort solution by btrott will be faster. (Yes, I know the test could be improved a lot...) NOTE: The sorting solution is both faster and incorrect. If your list of elements to remove has duplicates, you incorrectly remove the same element multiple times! This illustrates my real reason for liking map, I find it easier for me to figure out and validate all possible gotchas using it. Even if it is sometimes a lot slower. :-) EDIT A typo change, noticed that I changed code without touching the description of what the code did. Oops.	[reply] [d/l]
Re: Removing certain elements from an array by chromatic (Archbishop) on Aug 24, 2000 at 21:53 UTC
I like the hash and array slices, but I'm not thrilled with the map. Here's a version that needs a sort, which may not gain you anything, but at least it's a different approach. `my %h; @h{ 0 .. $#array }++; delete @h{@remindexes}; @array = @array[ sort { $a <=> $b } keys %h ];` [download]	[reply] [d/l]
Re: Removing certain elements from an array by xdb19 (Sexton) on Aug 24, 2000 at 23:39 UTC
This is a one liner that works well for me. `splice( @array, $_ - $ind++, 1 ) foreach ( sort @remindexes );` [download] Have Fun - XDB19	[reply] [d/l]
Re: Re: Removing certain elements from an array by Cirollo (Friar) on Jun 26, 2001 at 19:38 UTC
Attack from the rear: `splice( @array, $_, 1) for (reverse sort @remindexes);` This way, you start with the highest array index and move down, so you don't have to keep count of how many elements you have removed.	[reply] [d/l]
Re: Removing certain elements from an array by isotope (Deacon) on Aug 24, 2000 at 20:52 UTC
Try something like this (untested): `foreach(@remindexes) { splice(@array, $_, 1); }` [download] Update: This is going to squeeze the array down as it goes, so the indices in @remindexes will be inaccurate. See btrott's post below for the right way to do it. HTH, --isotope	[reply] [d/l]
Re: Removing certain elements from an array by le (Friar) on Aug 24, 2000 at 20:58 UTC
My solution (clumsy as usual): `my @array = qw( ...); my @remindexes = qw( ... ); my %remindexes; my @newarray; for (@remindexes) { $remindexes{$_}++; } for (my $i = 0; $i <= $#array; $i++) { push @newarray, $array[$i] if !$remindexes{$i}; }` [download] Of course, if you would setup the indexes to remove into a hash from the beginning, you could omit the first step.	[reply] [d/l]
RE: Removing certain elements from an array by BlaisePascal (Monk) on Aug 24, 2000 at 23:00 UTC
This may work as well... `@hugearray = ( ... ); @toremove = ( ... ); # remove from @hugearray elements indexed in @toremove @hugearray[@toremove] = ("RemoveMe") x @toremove; push @hugearray, "SentinelFlag"; for (my $_;;) { $_ = shift @hugearray; last if $_ eq "SentinelFlag"; next if $_ eq "RemoveMe"; push @hugearray, $temp; }` [download] This works by flagging the elements you want to delete, then using @hugearray as a circular shift register, shift-pushing each element in the array in turn, until it gets back to the beginning. As it goes, it doesn't push back in the elements to be deleted. How does it fare efficiency-wise? This routine touches each data element once, and likely does at most a single implicit array-copy. I think it is probably an O(n+m) operation (n=@hugearray, m=@toremove). The splice solution is O(m log m + mO(splice(n)). Since perl arrays are not linked lists, I think splice is an O(n) operation, meaning that the splice solutions are O(m log m + mn). Which is faster? My guess would be the shift-register, but I don't know.	[reply] [d/l]
Re: Removing certain elements from an array by tye (Sage) on Aug 24, 2000 at 22:50 UTC
FYI, Benchmark.pm has proved every guess of mine wrong in this thread: Using blocks in `map` didn't make anything slower (I didn't investigate why). The "map" solution is much slower than the "splice" solution unless nearly half of the elements are removed (due to overhead in executing so many Perl statements). chromatic's solution is slower still (which I thought would be close in speed to the "splice" solution, if not faster). I think this is because the "sort" has to be done on so many more elements. But then that is why Benchmark was written. - tye (but my friends call me "Tye")	[reply] [d/l]
Re: Removing elements from array (use grep!) by Russ (Deacon) on Aug 24, 2000 at 22:31 UTC
Hmmm... Haven't seen a solution using the tailor-made grep. `my @array = (0,1,2,3,4,5,6,7,8,9); # The source array my %DropList = map {$_ => 1} (2,4,6,8); # A hash of the indices to rem +ove my @newarray = @array[grep {not exists $DropList{$_}} (0..$#array)]; print "@newarray\n";` [download] Notes: I am still using 5.005_57 to match a production server, so if you prefer to use exists on arrays in 5.6... Enjoy. Russ Brainbench 'Most Valuable Professional' for Perl	[reply] [d/l]


Keep It Simple, Stupid
	PerlMonks