You piqued my curiosity. It seems that the regex and hash approaches are basically equivalent in speed. I used a data set of all the numbers 1000..9999, which is the fairest test I could devise.

Update: please read the reply below. This is what I get for forgetting strict and warnings in my benchmark code. It turns out the hash approach is much faster than the regex (which, honestly, surprised me).

$ test.pl
           Rate regex  hash
regex 1048218/s    --   -0%
hash  1048218/s    0%    --

$ test.pl
           Rate regex  hash
regex 1047449/s    --   -1%
hash  1059547/s    1%    --

$ test.pl
           Rate  hash regex
hash  1043950/s    --   -1%
regex 1054296/s    1%    --

Benchmark code (flawed, as noted above):

use Time::HiRes;
use Benchmark ':all';

sub regex {  
   my @keep;
   foreach my $num (@data) {
      chomp;
      for ( split //, $num ) {
         my @a = ($num =~ m/$_/g);
         if (@a == 2) { push @keep, $num; last; }
      }
   }
}

sub hash {
   my @keep;     # numbers to keep
   foreach my $line (@data) {
       chomp;
       my %count;
       ++$count{$_} for split '', $line;
       push @keep, $line if grep { $_ == 2 } values %count
   }
}

my @data = <DATA>;

cmpthese ( 1_000_000, {
   hash  => 'hash()',
   regex => 'regex()',
});
[download]
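For comparison, here is a minimal sketch of how the benchmark might look with strict and warnings enabled. It is my reading of the problem, not necessarily the fix given in the reply: the subs above are compiled before `my @data` is declared, so they appear to loop over an empty global `@data`, and the bare `chomp` acts on `$_` rather than on the loop variable. The names `regex_keep` and `hash_keep`, the generated data set (instead of `<DATA>`), and the one-second run time are assumptions made for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumption: the data set described in the post, every number 1000..9999,
# generated in memory and declared BEFORE the subs that use it.
my @data = (1000 .. 9999);

# Regex approach: for each digit of the number, count how often it matches.
sub regex_keep {
    my @keep;
    for my $num (@data) {
        for my $digit (split //, $num) {
            my @hits = $num =~ /$digit/g;
            if (@hits == 2) { push @keep, $num; last }
        }
    }
    return scalar @keep;    # number of values kept
}

# Hash approach: tally every digit once, then look for a tally of exactly 2.
sub hash_keep {
    my @keep;
    for my $num (@data) {
        my %count;
        ++$count{$_} for split //, $num;
        push @keep, $num if grep { $_ == 2 } values %count;
    }
    return scalar @keep;    # number of values kept
}

# A negative count asks Benchmark to run each sub for at least that many
# CPU seconds instead of a fixed number of iterations.
cmpthese(-1, {
    hash  => \&hash_keep,
    regex => \&regex_keep,
});
```

Passing code references instead of strings lets strict catch a misspelled sub name at compile time, and having both subs return a count makes it easy to check that they agree before trusting the timings.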

<-radiant.matrix->
A collection of thoughts and links from the minds of geeks
The Code that can be seen is not the true Code
I haven't found a problem yet that can't be solved by a well-placed trebuchet

In reply to Re^4: Find duplicate digits by radiantmatrix
in thread Find duplicate digits by Anonymous Monk
