in reply to Finding string for patterns

use strict; use warnings; my $SQUARE; my $MIN_NUMBER = 4; my $MAX_NUMBER = 1000; my $string = q(59738219024971503664); my $total_length; my %seen; for my $start (0..length $string) { next if substr($string, $start, 1) eq '0'; for my $length (1..(length($string)-$start)) { my $test = substr($string, $start, $length); my $sqrt = sqrt($test); if (!$seen{$test}++ and $sqrt == int $sqrt and $sqrt >= $MIN_NUMBER and $sqrt <= $MAX_NUMBER) { $total_length += $length; print "$test\t$length\t$total_length\n"; } } }

Caution: Contents may have been coded under pressure.

Replies are listed 'Best First'.
Re^2: Finding string for patterns
by cazz (Pilgrim) on Apr 05, 2005 at 21:58 UTC
    That may be fast for short strings, but given longer strings, it is much faster to precompute and then compare.
    use Benchmark qw/:all/; use strict; my $SQUARE; my $MIN_NUMBER = 4; my $MAX_NUMBER = 1000; foreach my $number ($MIN_NUMBER..$MAX_NUMBER){ my $square = $number * $number; $SQUARE->{$square} = $number; } my $string = q(5973821902497150366459738219024971503664); sub m_re { my $total_length; foreach my $square (keys %{$SQUARE}){ if($string =~ /$square/) { my $length = length($square); $total_length += $length; } } return ($total_length); } sub m_index { my $total_length; foreach my $square (keys %{$SQUARE}){ if(index($string, $square) > 0) { my $length = length($square); $total_length += $length; } } return ($total_length); } sub m_sqrt { my $total_length; my %seen; for my $start (0..length $string) { next if substr($string, $start, 1) eq '0'; for my $length (1..(length($string)-$start)) { my $test = substr($string, $start, $length); my $sqrt = sqrt($test); if (!$seen{$test}++ and $sqrt == int $sqrt and $sqrt >= $MIN_NUMBER and $sqrt <= $MAX_NUMBER) + { $total_length += $length; } } } return ($total_length); } cmpthese(1000, { "Regexp", \&m_re, "Index", \&m_index, "Sqrt", \&m_sqrt, });
    Gives:
    Benchmark: timing 1000 iterations of Index, Regexp, Sqrt...
         Index:  1 wallclock secs ( 0.98 usr +  0.01 sys =  0.99 CPU) @ 1010.10/s (n=1000)
        Regexp: 13 wallclock secs (12.20 usr +  0.02 sys = 12.22 CPU) @ 81.83/s (n=1000)
          Sqrt:  2 wallclock secs ( 2.27 usr +  0.01 sys =  2.28 CPU) @ 438.60/s (n=1000)
             Rate Regexp   Sqrt  Index
    Regexp 81.8/s     --   -81%   -92%
    Sqrt    439/s   436%     --   -57%
    Index  1010/s  1134%   130%     --
    
    (update: useless use of sub)
      It's a tradeoff. As the OP noted, if you increase the range of squares you're checking, precomputing gets slower (and it's time you don't count in your benchmark). Stick another 0 on MAX_NUMBER and run your bench again (change the first arg for cmpthese to -10), and you might get something like I did:
      Rate Regexp Index Sqrt Regexp 3.82/s -- -88% -98% Index 32.4/s 748% -- -81% Sqrt 167/s 4281% 417% --
      One relatively easy optimization that I didn't do is to limit the length to what can actually be matched:
      sub m_sqrt { my $total_length; my %seen; my $max_length = length($MAX_NUMBER * $MAX_NUMBER); for my $start (0..length $string) { next if substr($string, $start, 1) eq '0'; my $this_max = length($string)-$start; $this_max = $max_length if $max_length < $this_max; for my $length (1..$this_max) { my $test = substr($string, $start, $length); my $sqrt = sqrt($test); if (!$seen{$test}++ and $sqrt == int $sqrt and $sqrt >= $MIN_NUMBER and $sqrt <= $MAX_NUMBER) + { $total_length += $length; } } } return ($total_length); }
      Now the bench gives me
      Rate Regexp Index Sqrt Regexp 3.64/s -- -88% -99% Index 31.2/s 758% -- -94% Sqrt 497/s 13542% 1489% --

      Caution: Contents may have been coded under pressure.