in reply to Re: Finding string for patterns
in thread Finding string for patterns

That may be fast for short strings, but given longer strings, it is much faster to precompute and then compare.
use Benchmark qw/:all/; use strict; my $SQUARE; my $MIN_NUMBER = 4; my $MAX_NUMBER = 1000; foreach my $number ($MIN_NUMBER..$MAX_NUMBER){ my $square = $number * $number; $SQUARE->{$square} = $number; } my $string = q(5973821902497150366459738219024971503664); sub m_re { my $total_length; foreach my $square (keys %{$SQUARE}){ if($string =~ /$square/) { my $length = length($square); $total_length += $length; } } return ($total_length); } sub m_index { my $total_length; foreach my $square (keys %{$SQUARE}){ if(index($string, $square) > 0) { my $length = length($square); $total_length += $length; } } return ($total_length); } sub m_sqrt { my $total_length; my %seen; for my $start (0..length $string) { next if substr($string, $start, 1) eq '0'; for my $length (1..(length($string)-$start)) { my $test = substr($string, $start, $length); my $sqrt = sqrt($test); if (!$seen{$test}++ and $sqrt == int $sqrt and $sqrt >= $MIN_NUMBER and $sqrt <= $MAX_NUMBER) + { $total_length += $length; } } } return ($total_length); } cmpthese(1000, { "Regexp", \&m_re, "Index", \&m_index, "Sqrt", \&m_sqrt, });
Gives:
Benchmark: timing 1000 iterations of Index, Regexp, Sqrt...
     Index:  1 wallclock secs ( 0.98 usr +  0.01 sys =  0.99 CPU) @ 1010.10/s (n=1000)
    Regexp: 13 wallclock secs (12.20 usr +  0.02 sys = 12.22 CPU) @ 81.83/s (n=1000)
      Sqrt:  2 wallclock secs ( 2.27 usr +  0.01 sys =  2.28 CPU) @ 438.60/s (n=1000)
         Rate Regexp   Sqrt  Index
Regexp 81.8/s     --   -81%   -92%
Sqrt    439/s   436%     --   -57%
Index  1010/s  1134%   130%     --
(update: useless use of sub)

Replies are listed 'Best First'.
Re^3: Finding string for patterns
by Roy Johnson (Monsignor) on Apr 05, 2005 at 22:33 UTC
    It's a tradeoff. As the OP noted, if you increase the range of squares you're checking, precomputing gets slower (and it's time you don't count in your benchmark). Stick another 0 on MAX_NUMBER and run your bench again (change the first arg for cmpthese to -10), and you might get something like I did:
    Rate Regexp Index Sqrt Regexp 3.82/s -- -88% -98% Index 32.4/s 748% -- -81% Sqrt 167/s 4281% 417% --
    One relatively easy optimization that I didn't do is to limit the length to what can actually be matched:
    sub m_sqrt { my $total_length; my %seen; my $max_length = length($MAX_NUMBER * $MAX_NUMBER); for my $start (0..length $string) { next if substr($string, $start, 1) eq '0'; my $this_max = length($string)-$start; $this_max = $max_length if $max_length < $this_max; for my $length (1..$this_max) { my $test = substr($string, $start, $length); my $sqrt = sqrt($test); if (!$seen{$test}++ and $sqrt == int $sqrt and $sqrt >= $MIN_NUMBER and $sqrt <= $MAX_NUMBER) + { $total_length += $length; } } } return ($total_length); }
    Now the bench gives me
    Rate Regexp Index Sqrt Regexp 3.64/s -- -88% -99% Index 31.2/s 758% -- -94% Sqrt 497/s 13542% 1489% --

    Caution: Contents may have been coded under pressure.