artist has asked for the wisdom of the Perl Monks concerning the following question:

I like to find all possible unique 'integer squares' from a given number. (starting with 0 doesn't count). Increasing the value of MAX_NUMBER in following code makes program slower. Looking for faster ways to do it.

use strict; my $SQUARE; my $MAX_NUMBER = 1000; foreach my $number (4..$MAX_NUMBER){ my $square = $number * $number; $SQUARE->{$square} = 1; } my $string = q(59738219024971503664); my $total_length; foreach my $square (keys %{$SQUARE}){ if($string =~ /$square/){ my $length = length($square); $total_length += $length; print "$square\t$length\t$total_length\n"; } }
The whole idea is to find the $string of n characters to maximize the $total_length.

Replies are listed 'Best First'.
Re: Finding string for patterns
by Roy Johnson (Monsignor) on Apr 05, 2005 at 21:48 UTC
    use strict; use warnings; my $SQUARE; my $MIN_NUMBER = 4; my $MAX_NUMBER = 1000; my $string = q(59738219024971503664); my $total_length; my %seen; for my $start (0..length $string) { next if substr($string, $start, 1) eq '0'; for my $length (1..(length($string)-$start)) { my $test = substr($string, $start, $length); my $sqrt = sqrt($test); if (!$seen{$test}++ and $sqrt == int $sqrt and $sqrt >= $MIN_NUMBER and $sqrt <= $MAX_NUMBER) { $total_length += $length; print "$test\t$length\t$total_length\n"; } } }

    Caution: Contents may have been coded under pressure.
      That may be fast for short strings, but given longer strings, it is much faster to precompute and then compare.
      use Benchmark qw/:all/; use strict; my $SQUARE; my $MIN_NUMBER = 4; my $MAX_NUMBER = 1000; foreach my $number ($MIN_NUMBER..$MAX_NUMBER){ my $square = $number * $number; $SQUARE->{$square} = $number; } my $string = q(5973821902497150366459738219024971503664); sub m_re { my $total_length; foreach my $square (keys %{$SQUARE}){ if($string =~ /$square/) { my $length = length($square); $total_length += $length; } } return ($total_length); } sub m_index { my $total_length; foreach my $square (keys %{$SQUARE}){ if(index($string, $square) > 0) { my $length = length($square); $total_length += $length; } } return ($total_length); } sub m_sqrt { my $total_length; my %seen; for my $start (0..length $string) { next if substr($string, $start, 1) eq '0'; for my $length (1..(length($string)-$start)) { my $test = substr($string, $start, $length); my $sqrt = sqrt($test); if (!$seen{$test}++ and $sqrt == int $sqrt and $sqrt >= $MIN_NUMBER and $sqrt <= $MAX_NUMBER) + { $total_length += $length; } } } return ($total_length); } cmpthese(1000, { "Regexp", \&m_re, "Index", \&m_index, "Sqrt", \&m_sqrt, });
      Gives:
      Benchmark: timing 1000 iterations of Index, Regexp, Sqrt...
           Index:  1 wallclock secs ( 0.98 usr +  0.01 sys =  0.99 CPU) @ 1010.10/s (n=1000)
          Regexp: 13 wallclock secs (12.20 usr +  0.02 sys = 12.22 CPU) @ 81.83/s (n=1000)
            Sqrt:  2 wallclock secs ( 2.27 usr +  0.01 sys =  2.28 CPU) @ 438.60/s (n=1000)
               Rate Regexp   Sqrt  Index
      Regexp 81.8/s     --   -81%   -92%
      Sqrt    439/s   436%     --   -57%
      Index  1010/s  1134%   130%     --
      
      (update: useless use of sub)
        It's a tradeoff. As the OP noted, if you increase the range of squares you're checking, precomputing gets slower (and it's time you don't count in your benchmark). Stick another 0 on MAX_NUMBER and run your bench again (change the first arg for cmpthese to -10), and you might get something like I did:
        Rate Regexp Index Sqrt Regexp 3.82/s -- -88% -98% Index 32.4/s 748% -- -81% Sqrt 167/s 4281% 417% --
        One relatively easy optimization that I didn't do is to limit the length to what can actually be matched:
        sub m_sqrt { my $total_length; my %seen; my $max_length = length($MAX_NUMBER * $MAX_NUMBER); for my $start (0..length $string) { next if substr($string, $start, 1) eq '0'; my $this_max = length($string)-$start; $this_max = $max_length if $max_length < $this_max; for my $length (1..$this_max) { my $test = substr($string, $start, $length); my $sqrt = sqrt($test); if (!$seen{$test}++ and $sqrt == int $sqrt and $sqrt >= $MIN_NUMBER and $sqrt <= $MAX_NUMBER) + { $total_length += $length; } } } return ($total_length); }
        Now the bench gives me
        Rate Regexp Index Sqrt Regexp 3.64/s -- -88% -99% Index 31.2/s 758% -- -94% Sqrt 497/s 13542% 1489% --

        Caution: Contents may have been coded under pressure.
Re: Finding string for patterns
by tlm (Prior) on Apr 05, 2005 at 21:57 UTC

    OK, right off the bat, when searching for constant strings, use index, not regexps:

    use strict; use Benchmark 'cmpthese'; my $SQUARE; my $MAX_NUMBER = 1000; foreach my $number (4..$MAX_NUMBER) { my $square = $number * $number; $SQUARE->{$square} = 1; } my $string = q(59738219024971503664); cmpthese ( -1, { orig => \&orig, indx => \&windex } ); sub orig { my $total_length = 0; foreach my $square (keys %{$SQUARE}) { if ($string =~ /$square/) { my $length = length($square); $total_length += $length; } } return $total_length; } sub windex { my $string = q(59738219024971503664); my $total_length = 0; foreach my $square (keys %{$SQUARE}) { if ( index( $string, $square ) > -1 ) { my $length = length($square); $total_length += $length; } } return $total_length; } __END__ % perl sqs.pl Rate orig indx orig 100.0/s -- -91% indx 1067/s 967% --
    That's 10x faster with a very simple change.

    the lowliest monk