Re^2: Find duplicate digits

Wow, that's a lot of excessive work; why use the hash, when a regex can count matches?

my @keep;
while ( defined(my $num = <DATA>) ) {
   chomp $num;
   for ( split //, $num ) {
      my @a = ($num =~ m/$_/g);             # count occurrences of this digit
      if (@a>1) { push @keep, $num; last; } # keep if more than one
   }
   }
}

<-radiant.matrix->
A collection of thoughts and links from the minds of geeks
The Code that can be seen is not the true Code
I haven't found a problem yet that can't be solved by a well-placed trebuchet
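
The "let the regex do the counting" idea above can be pushed one step further with a backreference, which asks the regex engine directly whether any digit shows up again later in the string. This is only a sketch: `has_repeated_digit` is a hypothetical helper name, not something from the thread, and it implements the "more than one occurrence" rule of the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): true if any digit
# occurs more than once anywhere in the string.
sub has_repeated_digit {
    my ($num) = @_;
    # (\d) captures one digit; .* skips ahead; \1 requires the same digit again.
    return $num =~ /(\d).*\1/ ? 1 : 0;
}

my @keep = grep { has_repeated_digit($_) } qw(1234 1123 1213 9999);
print "@keep\n";    # prints: 1123 1213 9999
```

Note that this keeps a number when a digit appears two or more times; it does not distinguish "exactly twice" from "three times".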

Re^3: Find duplicate digits by friedo (Prior) on Feb 15, 2006 at 16:25 UTC
`:shrug:`, seems like roughly the same amount of work to me. Unless by "work" you mean lines of code, in which case I was being rather more verbose than usual. My method could easily be written:

my @keep; # numbers to keep
while(chomp( my $line = <NUMBERS> ) ) {
   my %count;
   ++$count{$_} for split '', $line;
   push @keep, $line if grep { $_ == 2 } values %count
}

I also wouldn't be surprised if the regex method was slower, but I'm too lazy to do a benchmark right now. :) (And for only 10,000 numbers, it probably does not matter much.)
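
Before timing the two approaches it may be worth checking that they select the same numbers. As posted, the regex loop keeps a number when a digit occurs more than once (`@a>1`), while the hash version keeps it when a digit occurs exactly twice (`== 2`). The sketch below wraps both tests in hypothetical helpers (`keep_regex` and `keep_hash` are names invented here) and prints where they differ.

```perl
use strict;
use warnings;

# Regex variant as posted: keep if any digit occurs more than once.
sub keep_regex {
    my ($num) = @_;
    for my $digit ( split //, $num ) {
        my @hits = ( $num =~ /$digit/g );    # all occurrences of this digit
        return 1 if @hits > 1;
    }
    return 0;
}

# Hash variant as posted: keep if some digit occurs exactly twice.
sub keep_hash {
    my ($num) = @_;
    my %count;
    ++$count{$_} for split //, $num;
    return ( grep { $_ == 2 } values %count ) ? 1 : 0;
}

for my $num (qw(1234 1123 1112)) {
    printf "%s regex=%d hash=%d\n", $num, keep_regex($num), keep_hash($num);
}
# prints:
# 1234 regex=0 hash=0
# 1123 regex=1 hash=1
# 1112 regex=1 hash=0
```

Which rule is the right one depends on the original question; the point is only that the comparison should be the same in both before the timings are compared.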
Re^4: Find duplicate digits by radiantmatrix (Parson) on Feb 15, 2006 at 18:46 UTC
You piqued my curiosity. At first it seemed that whether we regex or use a hash, they are basically equivalent. I used a dataset of all numbers 1000..9999, which is the fairest test I could devise.

Update: Please read the reply below. This is what I get for forgetting strict and warnings in my benchmark code. It turns out the hash approach is much faster than the regex (which, honestly, surprised me).

$ test.pl
           Rate regex  hash
regex 1048218/s    --   -0%
hash  1048218/s    0%    --

$ test.pl
           Rate regex  hash
regex 1047449/s    --   -1%
hash  1059547/s    1%    --

$ test.pl
           Rate  hash regex
hash  1043950/s    --   -1%
regex 1054296/s    1%    --

Benchmark code...

use Time::HiRes;
use Benchmark ':all';

sub regex {
   my @keep;
   foreach my $num (@data) {
      chomp;
      for ( split //, $num ) {
         my @a = ($num =~ m/$_/g);
         if (@a == 2) { push @keep, $num; last; }
      }
   }
}

sub hash {
   my @keep; # numbers to keep
   foreach my $line (@data) {
      chomp;
      my %count;
      ++$count{$_} for split '', $line;
      push @keep, $line if grep { $_ == 2 } values %count
   }
}

my @data = <DATA>;

cmpthese ( 1_000_000, {
   hash  => 'hash()',
   regex => 'regex()',
});
Re^5: Find duplicate digits by GrandFather (Saint) on Feb 16, 2006 at 03:51 UTC
Now that is a truly interesting result, especially as when I modified it slightly to add my own variant I got a storm of undefined value used warnings from the `chomp;`s in the foreach loops that don't have anything to chomp on. When I fixed that I obtained the following results. (Note that I've not validated the operation of the individual functions.)

        Rate regex  hash    gf
regex 1.94/s    --  -73%  -78%
hash  7.17/s  269%    --  -17%
gf    8.67/s  347%   21%    --

DWIM is Perl's answer to Gödel
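
GrandFather's benchmark code is not shown here, so the following is only a sketch of what a repaired version of the earlier benchmark might look like, not his actual code. It assumes the dataset is the list 1000..9999 mentioned in the thread. A plausible reading of the near-identical million-per-second rates is that `my @data` was declared after the subs, so without strict they looped over an empty package variable; here `@data` is declared first, the bare `chomp;` calls are dropped, and code references replace the string form.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed dataset: every number from 1000 to 9999, as described in the thread.
# Declared before the subs so that they see this lexical array.
my @data = ( 1000 .. 9999 );

sub regex {
    my @keep;
    for my $num (@data) {
        for my $digit ( split //, $num ) {
            my @hits = ( $num =~ /$digit/g );    # occurrences of this digit
            if ( @hits == 2 ) { push @keep, $num; last; }
        }
    }
    return scalar @keep;
}

sub hash {
    my @keep;
    for my $num (@data) {
        my %count;
        ++$count{$_} for split //, $num;
        push @keep, $num if grep { $_ == 2 } values %count;
    }
    return scalar @keep;
}

# Sanity check before timing: both variants should keep the same amount.
die "variants disagree\n" unless regex() == hash();

# A negative count means "run each for at least that many CPU seconds".
cmpthese( -1, { regex => \&regex, hash => \&hash } );
```

Returning the size of `@keep` is an addition made here so the two variants can be compared before timing; the absolute rates will of course vary by machine.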