ketema has asked for the wisdom of the Perl Monks concerning the following question:

Would like some efficiency training...
I have two large lists @a and @b...
my goal is to find everything in list @b that is not in list @a, and put it in list @c
currently I do this by:
foreach my $thing(@b){ if(scalar grep(/^$thing/,@a) == 0){ push(@c,$thing); } }
Is there a better way to do this? For example is there a list comparison function that would work like:
@c = compare(@a,@b); where compare finds everything in @b that is not in @a? maybe it even takes a third parameter that finds eveything that is the same between the two lists.
Thanks

Replies are listed 'Best First'.
Re: List Comparison
by davido (Cardinal) on Sep 09, 2004 at 16:08 UTC

    This is a more computation-efficient way that takes advantage of the fact that hash lookups are essentially O(1):

    my %listA; @listA{@a}=(); foreach my $item ( @b ) { next if exists $listA{$item}; push @c, $item; }

    Of course CPAN is your friend for this sort of thing. But I wanted to show the hash technique because it replaces a single slow O(n^2) algorithm with two faster O(n) operations, which is an order of magnitude more computationally efficient (but more of a memory hog too).


    Dave

      You can also turn that loop into another hash operation, which will probably make it even faster.
      my %listA; @listA{@a}=(); delete @listA{@b}; @c = keys %listA;
Re: List Comparison
by ketema (Scribe) on Sep 09, 2004 at 15:57 UTC
    Well I refined my CPAN search and came back with List-Compare, how wonderful! Any comments on the performance of this mod? I'm just going to fire it up and see. PERL is great!
Re: List Comparison
by trammell (Priest) on Sep 09, 2004 at 15:57 UTC
Re: List Comparison
by herveus (Prior) on Sep 09, 2004 at 15:59 UTC
    Howdy!

    What sort of efficiency are you looking for? Time? Space? Coding?

    Your code will make repeated scans of @a, being O(n**2), but it takes up almost no extra space.

    Something like

    my %a = map { ($_, 1) } @a; @c = grep $a{$_}, @b;
    trades space for time, making a single pass over @a and @b, at the expense of making a hash with the elements of @a as keys. If @a is really big, the memory footprint could become an issue.

    Update: I see that you found stuff in CPAN while I was writing this; I was going to hit CPAN and post an update, but...

    yours,
    Michael

      Thanks for replying. yes I did find some stuff on CPAN right after the initial post. Hoenestly this is why I love PERL, everytime I am writing something and I have a idea or wish I could do something easier, I find it! Either here, cpan or some other group. Thanks again for the help, and to answer your question I was looking for code effiency in this instance, and your two lines was better than my 3, so that works. I like the List-Compare Mod, and I also found List-Utils which I am looking through now. we'll see how the scrip goes.


      thanks again