in reply to Re^3: Find duplicate digits
in thread Find duplicate digits
You piqued my curiosity. It seems that whether we use a regex or a hash, the two are basically equivalent. I used a dataset of all the numbers 1000..9999, which is the fairest test I could devise.
Update: Please read the reply below. This is what I get for forgetting strict and warnings in my benchmark code. It turns out the hash approach is much faster than the regex (which, honestly, surprised me).
    $ test.pl
               Rate regex  hash
    regex 1048218/s    --   -0%
    hash  1048218/s    0%    --

    $ test.pl
               Rate regex  hash
    regex 1047449/s    --   -1%
    hash  1059547/s    1%    --

    $ test.pl
               Rate  hash regex
    hash  1043950/s    --   -1%
    regex 1054296/s    1%    --
Benchmark code...
    use Time::HiRes;
    use Benchmark ':all';

    sub regex {
        my @keep;
        foreach my $num (@data) {
            chomp;
            for ( split //, $num ) {
                my @a = ($num =~ m/$_/g);
                if (@a == 2) {
                    push @keep, $num;
                    last;
                }
            }
        }
    }

    sub hash {
        my @keep;    # numbers to keep
        foreach my $line (@data) {
            chomp;
            my %count;
            ++$count{$_} for split '', $line;
            push @keep, $line if grep { $_ == 2 } values %count
        }
    }

    my @data = <DATA>;

    cmpthese ( 1_000_000, {
        hash  => 'hash()',
        regex => 'regex()',
    });
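For comparison, here is a minimal sketch of how the same benchmark might look with strict and warnings enabled. It is not the code that produced the numbers above. It assumes the dataset can be generated inline as 1000..9999 instead of being read from DATA, and the sub names regex_keep and hash_keep are made up for illustration. The lexical @data is declared before the subs so that both actually see the populated array, and the subs are passed to cmpthese as code references rather than strings.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumption: the dataset is generated inline instead of read from DATA.
# Declared before the subs so they refer to this populated lexical.
my @data = (1000 .. 9999);

# Keep numbers in which some digit occurs exactly twice, counting with a regex.
sub regex_keep {
    my @keep;
    for my $num (@data) {
        for my $digit (split //, $num) {
            # "= () =" forces list context so we get the match count
            my $hits = () = $num =~ /$digit/g;
            if ($hits == 2) {
                push @keep, $num;
                last;
            }
        }
    }
    return scalar @keep;
}

# Same selection, counting digits in a hash.
sub hash_keep {
    my @keep;
    for my $num (@data) {
        my %count;
        ++$count{$_} for split //, $num;
        push @keep, $num if grep { $_ == 2 } values %count;
    }
    return scalar @keep;
}

# Sanity check: both approaches should select the same number of entries.
die "regex and hash disagree\n" unless regex_keep() == hash_keep();

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    regex => \&regex_keep,
    hash  => \&hash_keep,
});
```

Each call here processes the whole dataset, so the reported rates are per pass over all 9000 numbers, not per number.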
Re^5: Find duplicate digits
by GrandFather (Saint) on Feb 16, 2006 at 03:51 UTC