JimSki has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm relatively new to PERL and have a pretty basic question. I am trying to find the fastest (not necessarily most elegant) way to generate a list which is the difference of two arrays.
For example, if
@goodstuff = (1,2,3,4,5); @badstuff = (1,3);
I'd like @newstuff to equal (2,4,5).
Is GREP usable in this instance?
I know I can do this iteratively but it's pretty slow. Thanks for any advice!

Replies are listed 'Best First'.
Re: Subtracting Lists
by educated_foo (Vicar) on Jun 02, 2008 at 04:54 UTC
    Hashes are your friends:
    sub diff { my ($a, $b) = @_; my %b; undef @b{@$b}; grep !exists $b{$_}, @$a; }
Re: Subtracting Lists
by pc88mxer (Vicar) on Jun 02, 2008 at 00:38 UTC
    Here's a solution using List::MoreUtils:
    use List::MoreUtils qw(any); @diff = map { my $x = $_; (any { $x eq $_ } @badstuff) ? () : $x; } @badstuff;
    Admittedly it isn't very efficient, but it was kinda fun to write.

      I personally believe that

      @y = map { condition($_) ? $_ : () } @x;

      is generally spelt

      @y = grep { condition($_) } @x;

      Of course, you have the condition reversed, but then a ! would do, and I doubt it would impose a considerable penalty in terms of efficience.

      --
      If you can't understand the incipit, then please check the IPB Campaign.
Re: Subtracting Lists
by Herkum (Parson) on Jun 02, 2008 at 00:38 UTC
    my %results; @goodstuff = (1,2,3,4,5); @badstuff = (1,3); for (@goodstuff) { $results{$_}++ } for (@badstuff} { $results{$_}++ } for (keys %results) { if ($results{$_} == 1) { print "Value: " . $_ . " was found in only one array\n"; } }

    I think that is right, I just coded it off the top of my head so no promises...

      Your correct in saying your code finds elements which are only in one of the sets. But the OP asked for an implementation to find the difference of two sets. Your code only finds the difference when @badstuff is a subset of @goodstuff.

      A solution:

      my %result; @result{ @goodstuff } = (); delete @result{ @badstuff }; my @result = keys %result;
        The O.P did not state if he wanted the results in the same order as the original array.

        Even without that requirement, if the original array contained duplicate values, a solution that collapsed the original into a hash would eliminate duplicates, and , IMHO, provide a wrong result.

        However, 3 good solutions which do not collapse the original array have been provided by others, below.

        I would suggest that educated_foo's (++) is the most portable,readable one with no dependencies.

             "A fanatic is one who redoubles his effort when he has forgotten his aim."—George Santayana

      What if there are repetitions in either of the arrays? And the OP asked for a efficient solution: while I have not tested your algorithm, and it is potentially interesting for being linear in the sizes of the arrays, I believe that building a hash and then iterating over its keys will make it slow enough for many reasonable data sets. (I don't think this would matter, but the OP is convinced it does...)

      --
      If you can't understand the incipit, then please check the IPB Campaign.
Re: Subtracting Lists
by jettero (Monsignor) on Jun 02, 2008 at 00:43 UTC
    Maybe overkill depending on what you're working on ... Set::Scalar is one of my favorites for problems like that (but bigger).

    -Paul

Re: Subtracting Lists
by poolpi (Hermit) on Jun 02, 2008 at 08:40 UTC

    See Set::IntSpan

    From the doc : << Sets are stored internally in a run-length coded form. This provides for both compact storage and efficient computation. In particular, set operations can be performed directly on the encoded representation. >>

    #!/usr/bin/perl -w use strict; use Data::Dumper; use Set::IntSpan; my $good_run_list = '1-5'; my $bad_run_list = '1,3'; my ( $good_set, $bad_set, $x_set ); if ( valid Set::IntSpan $good_run_list and valid Set::IntSpan $bad_run +_list ) { $good_set = new Set::IntSpan $good_run_list; $bad_set = new Set::IntSpan $bad_run_list; } else { print "Please verify your run lists\n" } $x_set = $good_set ^ $bad_set; # Or $x_set = $good_set->X($bad_set); finite $x_set and my @elements = elements $x_set; print Dumper \@elements; __END__ Output: $VAR1 = [ 2, 4, 5 ];

    hth,
    PooLpi

    'Ebry haffa hoe hab im tik a bush'. Jamaican proverb
Re: Subtracting Lists
by blazar (Canon) on Jun 02, 2008 at 16:46 UTC
    Hello, I'm relatively new to PERL and have a pretty basic question.

    I personally believe that the very fist thing you should know is that it's not PERL.

    Is GREP usable in this instance?

    It's certainly usable. But it all depends on the code you tell it to execute. One common technique to check whether and item belongs to an array used to involve another grep, but that would be inefficient, since it would iterate over all items of the bad list. Thus a better choice would be List::Util's first() which stops as early as an item is found. With 5.10, however, you have a specialized operator to do this kinda things: ~~:

    #!/usr/bin/perl use strict; use warnings; use 5.010; my @goodstuff=1..5; my @badstuff=(1,3); my @newstuff = grep !($_ ~~ @badstuff) => @goodstuff; say "@newstuff"; __END__
    --
    If you can't understand the incipit, then please check the IPB Campaign.
Re: Subtracting Lists
by Anonymous Monk on Jun 02, 2008 at 13:45 UTC
    make a hash with the keys beeing elements from the first list.
    do the same for the other list.
    you end up with 2 hashes.
    now you can traverse the keys of one hash and see if it's in the other hash.
    that way would be much faster than the usual O(n^2) you'd get from doing traversal of the second list for every element of the first one.