
Hmm, it's a little hard to benchmark your approach against the others. (That's why you used Benchmark::Timer, right?) Anyway, I tried a couple of interpretations. Since all of the others had initialization overhead involved, I decided that your approach needed it as well. I realize that this is not entirely fair, since your approach would not incur this overhead under some designs. However, the same argument applies to the other solutions to a certain degree. The auxiliary hashes used by the FAQ solution could be precomputed when reading the array, for example.

Despite the difficulties, I gave it a crack (using two different ways to set the hash-based solution up), and this is what I came up with:

#! perl -w
use strict;
use Benchmark qw(cmpthese);

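# Despite its name, this returns the keys seen exactly once across both
# arrays (the FAQ-style difference), not their intersection.
sub intersect (\@\@) {
    my ($aryRef1, $aryRef2) = @_;
    my %count;
    $count{$_}++ foreach @$aryRef1, @$aryRef2;
    return grep { $count{$_} == 1 } keys %count;
}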

my @array1 = (1 .. 28000);
my @array2 = (1..1000, 1..100); # create some duplicates with duplicates

my %hashBase;
@hashBase{@array1} = undef;

sub faq {
    my @dup = intersect( @array1, @array2 );
}
sub hash {
    my %hashAry;
    @hashAry{@array1} = undef;
    my @dup = delete @hashAry{@array2};
}
sub prehash {
    my %hashAry = %hashBase;
    my @dup = delete @hashAry{@array2};

}
sub hash_lret {
    my %hashAry;
    @hashAry{@array1} = undef;
    my @dup = delete @hashAry{@array2};
    return keys %hashAry;
}

sub prehash_lret {
    my %hashAry = %hashBase;
    my @dup = delete @hashAry{@array2};
    return keys %hashAry;
}

sub grepped {
    my %reject;
    $reject{$_} = 1 for @array2;
    my @clean=grep !$reject{$_}, @array1;
}

cmpthese 100,{
              faq          => \&faq,
              hash         => \&hash,
              prehash      => \&prehash,
              hash_lret    => \&hash_lret,
              prehash_lret => \&prehash_lret,
              grepped      => \&grepped,
             };


__END__
Benchmark: timing 100 iterations of faq, grepped, hash, hash_lret, prehash, prehash_lret...
       faq: 35 wallclock secs (34.09 usr +  0.02 sys = 34.11 CPU) @  2.93/s (n=100)
   grepped: 11 wallclock secs ( 9.91 usr +  0.00 sys =  9.91 CPU) @ 10.09/s (n=100)
      hash: 10 wallclock secs (10.27 usr +  0.00 sys = 10.27 CPU) @  9.74/s (n=100)
 hash_lret: 10 wallclock secs (10.23 usr +  0.00 sys = 10.23 CPU) @  9.77/s (n=100)
   prehash: 16 wallclock secs (15.13 usr +  0.00 sys = 15.13 CPU) @  6.61/s (n=100)
prehash_lret: 16 wallclock secs (15.17 usr +  0.00 sys = 15.17 CPU) @  6.59/s (n=100)
               Rate       faq prehash_lret  prehash      hash hash_lret  grepped
faq          2.93/s        --         -56%     -56%      -70%      -70%     -71%
prehash_lret 6.59/s      125%           --      -0%      -32%      -33%     -35%
prehash      6.61/s      126%           0%       --      -32%      -32%     -35%
hash         9.74/s      232%          48%      47%        --       -0%      -4%
hash_lret    9.77/s      233%          48%      48%        0%        --      -3%
grepped      10.1/s      244%          53%      53%        4%        3%       --

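The _lret variants are the ones that actually return a list, and prehash vs. hash differ in how the hash is initialized: hash builds it from @array1 on every call, while prehash copies the prebuilt %hashBase. I think the results are surprising: copying the prebuilt hash (about 6.6/s) is slower than rebuilding it with a slice (about 9.7/s), and the plain grep comes out marginally ahead of everything else.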
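For anyone who wants to check what the two fastest approaches actually produce, here is a minimal sketch on a small data set. The arrays are made-up stand-ins for the benchmark's @array1 and @array2, not the real data. Note that `delete` on a hash slice returns the deleted values (all undef here), so the surviving elements have to be read back with `keys`, which is what the _lret subs do.

```perl
use strict;
use warnings;

# Small stand-ins for @array1 and @array2 from the benchmark (assumed data).
my @array1 = (1 .. 10);
my @array2 = (2, 4, 4, 9);    # includes a duplicate, as in the benchmark

# Hash-slice approach: the keys that survive the delete are the difference.
my %hashAry;
@hashAry{@array1} = undef;
delete @hashAry{@array2};
my @from_hash = sort { $a <=> $b } keys %hashAry;    # keys come back unordered

# grep approach: keeps the original order of @array1.
my %reject;
$reject{$_} = 1 for @array2;
my @from_grep = grep { !$reject{$_} } @array1;

print "hash: @from_hash\n";
print "grep: @from_grep\n";
```

Both lines print `1 3 5 6 7 8 10`. The hash version needs the extra sort to get there, and it would also collapse any duplicates in the first array, which grep would keep.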

Incidentally, unless Benchmark::Timer is timing _long_ events, it would lose accuracy, perhaps considerably. On most boxes the clock resolution is around 1/100th of a second. That is why Benchmark is written the way it is: it runs the code many times and times the whole batch.
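The batching idea can be sketched roughly like this: time many calls as one block and divide, so that a coarse clock matters less. `mean_seconds` is a hypothetical helper made up for illustration; it is not part of Benchmark or Benchmark::Timer, and Time::HiRes is used only to keep the sketch self-contained. Benchmark's own implementation may well differ.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper (not from Benchmark or Benchmark::Timer):
# run $code $iterations times and return the mean seconds per call.
sub mean_seconds {
    my ($code, $iterations) = @_;
    my $start = time();
    $code->() for 1 .. $iterations;
    return (time() - $start) / $iterations;
}

# Example workload: build a lookup hash from a 1000-element array.
my @data = (1 .. 1000);
my $per_call = mean_seconds(sub { my %seen; @seen{@data} = (); return }, 500);
printf "%.6f s per call\n", $per_call;
```

A single call here is far shorter than 1/100th of a second, so timing it on its own with a coarse clock would mostly measure noise; the batch of 500 gives the clock something long enough to measure.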

This is one of those things that make benchmarking different abstract solutions against each other very difficult. Ultimately, the only real way to benchmark is to use a live solution and time it against a different design under the same environment.

--- demerphq
my friends call me, usually because I'm late....

In reply to Re: Re: How to splice out multiple entries from an array by demerphq
in thread How to splice out multiple entries from an array by Ya
