in reply to Re^2: promoting array to a hash
in thread promoting array to a hash

very important update: Please see Re^4: promoting array to a hash by BrowserUk for why the following code is horrifically wrong. Of course, that said, it requires one very simple s!my!our! to correct.

I know that the map vs. slice benchmark has been done before, but just to do it again as a reminder :)

#!perl -w use strict; use Benchmark ':all'; my @unsorted = map { join('', map { ('a'..'z','A'..'Z',0..9)[rand 62] } 1..50) } 1..5000; sub uniq_dragonchild { my %x; @x{@_} = @_; values %x } sub uniq_BrowserUk { my %x; @x{@_} = (); keys %x } sub uniq_Zaxo { keys %{ { map { $_ => undef } @_ } } } cmpthese( timethese(-60, { uniq_dragonchild => 'my @x = uniq_dragonchild(@unsorted)', uniq_BrowserUk => 'my @x = uniq_BrowserUk(@unsorted)', uniq_Zaxo => 'my @x = uniq_Zaxo(@unsorted)' } ) ); __END__ C:\>uniq.pl Benchmark: running uniq_BrowserUk, uniq_Zaxo, uniq_dragonchild for at +least 60 C PU seconds... uniq_BrowserUk: 64 wallclock secs (63.19 usr + 0.02 sys = 63.20 CPU) +@ 421025.4 1/s (n=26610069) uniq_Zaxo: 59 wallclock secs (60.08 usr + 0.03 sys = 60.11 CPU) @ 18 +6939.31/s (n=11237109) uniq_dragonchild: 64 wallclock secs (63.05 usr + 0.02 sys = 63.06 CPU +) @ 399674 .64/s (n=25204682) Rate uniq_Zaxo uniq_dragonchild uniq_Bro +wserUk uniq_Zaxo 186939/s -- -53% + -56% uniq_dragonchild 399675/s 114% -- + -5% uniq_BrowserUk 421025/s 125% 5% + --

Replies are listed 'Best First'.
Re^4: promoting array to a hash
by Roy Johnson (Monsignor) on Jun 13, 2004 at 13:58 UTC
    Adding this sub:
    sub uniq_Roy { my %x; grep {!$x{$_}++} @_ }
    I got these results
    Rate uniq_Zaxo uniq_dragonchild uniq_BrowserUk + uniq_Roy uniq_Zaxo 167192/s -- -19% -27% + -47% uniq_dragonchild 205487/s 23% -- -11% + -35% uniq_BrowserUk 230383/s 38% 12% -- + -27% uniq_Roy 317586/s 90% 55% 38% + --

    We're not really tightening our belts, it just feels that way because we're getting fatter.
      Excellent! Yours is faster and will DWIM with references. Good deal! :-)

      ------
      We are the carpenters and bricklayers of the Information Age.

      Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

      I shouldn't have to say this, but any code, unless otherwise stated, is untested

      Unfortunately, the benchmark you started with wasn't good. See Re^4: promoting array to a hash.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
Re^4: promoting array to a hash
by BrowserUk (Patriarch) on Jun 14, 2004 at 11:59 UTC

    Unfortunately, your benchmark is wrong. As the array @unsorted is lexical, it isn't visible within the benchmark code, so the subs are uniquifying an empty array. All your benchmark is measuring is how quickly the sub return nothing. Correcting that, I get these results.

    #!perl -w use strict; use Benchmark ':all'; our @unsorted = map { join('', map { ('a'..'z','A'..'Z',0..9)[rand 62] } 1..50) } 1..5000; sub uniq_dragonchild { my %x; @x{@_} = @_; values %x } sub uniq_BrowserUk { my %x; @x{@_} = (); keys %x } sub uniq_Zaxo { keys %{ { map { $_ => undef } @_ } } } sub uniq_Roy { my %x; grep {!$x{$_}++} @_ } our( @a, @b, @c, @d ); cmpthese( -1, { uniq_dragonchild => '@a = uniq_dragonchild( @unsorted )', uniq_BrowserUk => '@b = uniq_BrowserUk( @unsorted )', uniq_Zaxo => '@c = uniq_Zaxo(@unsorted)', uniq_Roy => '@d = uniq_Roy( @unsorted )', }); #my $a='a'; print $/, $a++, ' : ', "@{[ sort @$_ ]}" for ( \@a, \@b, \ +@c, \@d ); __END__ P:\test>363760 Rate uniq_dragonchild uniq_Zaxo uniq_Roy uniq +_BrowserUk uniq_dragonchild 18.1/s -- -21% -31% + -41% uniq_Zaxo 22.8/s 26% -- -13% + -25% uniq_Roy 26.3/s 45% 15% -- + -14% uniq_BrowserUk 30.6/s 69% 34% 16% + --

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail

      Oh ouch, that bit me good. I can't believe I didn't find 420000+ iterations per second too many, considering it is handling a 5000 item list. Thanks BrowserUk (and obviously ++).