in reply to subtract one array from another

Not really an answer for this OP I think, but rather than building a hash to detect matches (which for large arrays could be expensive), why not use a string?

Would this be a Cheap idiom? Update:Obviously not!:( See below.) Good for golf maybe>

#! perl -sw use strict; my @array1= qw(1 83 90 120 140 300); my @array2= qw(83 140); my @array3 = grep{ local $"="\c1"; -1==index( "\c1@array2\c1", "\c1$_\ +c1") } @array1; print "@array3\n"; __END__ c:\test>205991 1 90 120 300

Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!

Replies are listed 'Best First'.
Re: Re: subtract one array from another
by jsprat (Curate) on Oct 17, 2002 at 18:35 UTC
    I've used the same idiom, BrowserUK.

    One minor nit, however - use a temp variable, and stringify the array before the grep. Stringifying @array2 for each element of @array1 is probably more expensive than building a hash.

    Update: Unless I missed something on my benchmark, the faq is the fastest - by far! I didn't expect it, but I should have ;-)

    I varied the size of the array and the faq consistently came out on top. The larger @array2 got, the better the faq performed.

    #!/usr/bin/perl use strict; use warnings; use Benchmark qw/cmpthese/; use vars qw/@array1 @array2/; sub perlfaq { # lifted from nothingmuch's post above my %hash; $hash{$_} = undef foreach (@array2); @array1 = grep { not exists $hash{$_} } @array1; } sub buk { grep{ local $"="\c1"; -1==index( "\c1@array2\c1", "\c1$_\c1") } @a +rray1; } sub improved { local $" = "\c1"; my $temp = "\c1@array2\c1"; grep { -1==index($temp, "\c1$_\c1") } @array1; } push @array1, int (rand 100) for (1 .. 1000); push @array2, int (rand 100) for (1 .. 100); cmpthese (1_000, { perlfaq => \&perlfaq, buk => \&buk, improved => \&improved, }) __END__ Benchmark: timing 1000 iterations of buk, improved, perlfaq... buk: 58 wallclock secs (50.99 usr + 0.01 sys = 51.00 CPU) @ 19 +.61/s (n=1 000) improved: 4 wallclock secs ( 3.98 usr + 0.00 sys = 3.98 CPU) @ 25 +1.51/s (n= 1000) perlfaq: 1 wallclock secs ( 0.88 usr + 0.00 sys = 0.88 CPU) @ 11 +35.07/s (n =1000) Rate buk improved perlfaq buk 19.6/s -- -92% -98% improved 252/s 1183% -- -78% perlfaq 1135/s 5689% 351% --

      Nice one jsprat++!

      I spent the last hour playing with your benchmark trying to redeem something from whatever brainfart it was that caused me to post without benchmarking first. I failed.

      I tried varying the size of the array and the size of the elements and the only time I saw any benefit from the non-hash version was when the size of array * size of elements pushed the boundaries of my installed memory. The hash takes a fair amount more memory than the composite string, so the non-hash version survives a while longer at the pathelogical extremes.

      I managed a marginal improvement to your improved version by doing away with interpolation all together vis.

      sub improved { my $c = ord(1); my $temp = join $c, @array2; @array1 = grep { -1==index($temp, $c . $_ . $c ) } @array1; }

      I also saw an equally marginal improvement (~2%) in the faq version by using slice assignment rather than a for loop, but I'm pretty sure that this has been noted elsewhere.

      sub perlfaq { my %hash; @hash{@array2} = ((undef) x @array2); @array1 = grep { not exists $hash{$_} } @array1; }

      Either is at best a micro-optomisation. When I thought about the fact that index has to search the whole string every time, it's fairly obvious that the hash will win.


      Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!