in reply to How to splice out multiple entries from an array

  1. What you are looking for is called a "set difference"
  2. This is covered in perlfaq4
  3. if you run perldoc -q difference perldoc will search for FAQ entries that refer to that keyword
  4. Here is what the FAQ has to say...

    Found in /usr/share/perl/5.6.1/pod/perlfaq4.pod How do I compute the difference of two arrays? How do I compute the intersection of two arrays? Use a hash. Here's code to do both and more. It assumes that each element is unique in a given array: @union = @intersection = @difference = (); %count = (); foreach $element (@array1, @array2) { $count{$eleme +nt}++ } foreach $element (keys %count) { push @union, $element; push @{ $count{$element} > 1 ? \@intersection : + \@difference }, $element; } Note that this is the symmetric difference, that is, all elements in either A or in B but not in both. Think of it as an xor operation.

  5. "large inputs" tend to take a "long time" to process. Take that as a given in life -- something that's implied by the laws of physics

    The only real caveat, is that sometimes when dealing with large inputs, the nature of the problem lends itself to a solution in which you can get intermediate results as you go, without having to first see all of the results. (ie: find all lines of input tha match a certain pattern).

    Set operations do not fall into that category of problem -- you pretty much have to have both sets, in their entirety, available to you at the same time in order to find the difference (or intersection, or union)

Replies are listed 'Best First'.
Re: Re: How to splice out multiple entries from an array
by demerphq (Chancellor) on Oct 24, 2002 at 13:02 UTC
    Actually, the above algorithm is not particularly suitable for the OP's needs. Its relevent as a indicator on what to do but it doesnt solve his problem in the most efficient way.

    If you read the code, and assume that @arr1 is large, without many duplicates, and that @arr2 is small without many duplicates either, then it would appear that the above will iterate over the values of @arr1 twice. Also the above has the problem that the resulting array will have no ordering relationship to the original one. I think a superior method is the one I posted in my orphaned reply. Or rather a minor variant that doesn use hash slices.

    my %remove; $remove{$_}++ for @arr2; @clean=grep !$remove{$_},@arr1;
    This code will only walk @arr1 and @arr2 once. Note that this is one of the weaknesses in 'big O notation'. Both of these solutions are O(N), however, the worst cases of both are 2(N+M) and N+M, thus the later has a much better worst case even though they are the same order of solution.

    BrowserUk posted a benchmark that I will mod to benchmark the various solutions, including the one above.

    Regards,

    --- demerphq
    my friends call me, usually because I'm late....

Re: Re: How to splice out multiple entries from an array
by demerphq (Chancellor) on Oct 24, 2002 at 08:57 UTC
Re: Re: How to splice out multiple entries from an array
by Ya (Initiate) on Oct 24, 2002 at 11:31 UTC
    Thanks for your reply.

    Best Regards,

    Ya