Re^4: Find duplicate digits

You piqued my curiosity. It seems that whether we regex or use a hash, they are basically equivalent. I used a dataset that is all numbers 1000..9999, which is the most fair test I could devise.

Please read the reply below. This is what I get for forgetting strict and warnings in my benchmark code. It turns out the hash approach is much faster than the regex (which, honestly, surprised me).

$ test.pl
           Rate regex  hash
regex 1048218/s    --   -0%
hash  1048218/s    0%    --

$ test.pl
           Rate regex  hash
regex 1047449/s    --   -1%
hash  1059547/s    1%    --

$ test.pl
           Rate  hash regex
hash  1043950/s    --   -1%
regex 1054296/s    1%    --
[download]

Benchmark code...

use Time::HiRes;
use Benchmark ':all';

sub regex {  
   my @keep;
   foreach my $num (@data) {
      chomp;
      for ( split //, $num ) {
         my @a = ($num =~ m/$_/g);
         if (@a == 2) { push @keep, $num; last; }
      }
   }
}

sub hash {
   my @keep;     # numbers to keep
   foreach my $line (@data) {
       chomp;
       my %count;
       ++$count{$_} for split '', $line;
       push @keep, $line if grep { $_ == 2 } values %count
   }
}

my @data = <DATA>;

cmpthese ( 1_000_000, {
   hash  => 'hash()',
   regex => 'regex()',
});
[download]

<-radiant.matrix->
A collection of thoughts and links from the minds of geeks
The Code that can be seen is not the true Code
I haven't found a problem yet that can't be solved by a well-placed trebuchet

Comment on Re^4: Find duplicate digits Select or Download Code

Replies are listed 'Best First'.
Re^5: Find duplicate digits by GrandFather (Saint) on Feb 16, 2006 at 03:51 UTC
Now that is a truly interesting result, especially as when I modified it slightly to add my own variant I got a storm of undefined value used warnings from the `chomp;`s in the foreach loops that don't have anything to chomp on. When I fixed that I obtained the following results. (Note that I've not validated the operation of the individual functions) `Rate regex hash gf regex 1.94/s -- -73% -78% hash 7.17/s 269% -- -17% gf 8.67/s 347% 21% --` [download] Read more... Benchmark code (1029 Bytes) DWIM is Perl's answer to Gödel	[reply] [d/l] [select]

Replies are listed 'Best First'.

Re^5: Find duplicate digits
by GrandFather (Saint) on Feb 16, 2006 at 03:51 UTC

Now that is a truly interesting result, especially as when I modified it slightly to add my own variant I got a storm of undefined value used warnings from the chomp;s in the foreach loops that don't have anything to chomp on. When I fixed that I obtained the following results. (Note that I've not validated the operation of the individual functions)

        Rate regex  hash    gf
regex 1.94/s    --  -73%  -78%
hash  7.17/s  269%    --  -17%
gf    8.67/s  347%   21%    --
[download]

Read more... Benchmark code (1029 Bytes)

DWIM is Perl's answer to Gödel

[reply]
[d/l]
[select]