Re^2: Finding string for patterns

That may be fast for short strings, but given longer strings, it is much faster to precompute and then compare.

use Benchmark qw/:all/;
use strict;
my $SQUARE;
my $MIN_NUMBER = 4;
my $MAX_NUMBER = 1000;
foreach my $number ($MIN_NUMBER..$MAX_NUMBER){
    my $square  = $number * $number;
    $SQUARE->{$square} = $number;
}
my $string = q(5973821902497150366459738219024971503664);

sub m_re {
    my $total_length;
    foreach my $square (keys %{$SQUARE}){
        if($string =~ /$square/) {
            my $length =  length($square);
            $total_length  += $length;
        }
    }
    return ($total_length);
}
sub m_index {
    my $total_length;
    foreach my $square (keys %{$SQUARE}){
        if(index($string, $square) > 0) {
            my $length =  length($square);
            $total_length  += $length;
        }
    }
    return ($total_length);
}

sub m_sqrt {
    my $total_length;
    my %seen;
    for my $start (0..length $string) {
        next if substr($string, $start, 1) eq '0';
        for my $length (1..(length($string)-$start)) {
            my $test = substr($string, $start, $length);
            my $sqrt = sqrt($test);
            if (!$seen{$test}++ and $sqrt == int $sqrt
                    and $sqrt >= $MIN_NUMBER and $sqrt <= $MAX_NUMBER)
+ {
                $total_length += $length;
            }
        }
    }
    return ($total_length);
}

cmpthese(1000, {
    "Regexp", \&m_re,
    "Index", \&m_index,
    "Sqrt", \&m_sqrt,
});
[download]

Gives:

Benchmark: timing 1000 iterations of Index, Regexp, Sqrt...
     Index:  1 wallclock secs ( 0.98 usr +  0.01 sys =  0.99 CPU) @ 1010.10/s (n=1000)
    Regexp: 13 wallclock secs (12.20 usr +  0.02 sys = 12.22 CPU) @ 81.83/s (n=1000)
      Sqrt:  2 wallclock secs ( 2.27 usr +  0.01 sys =  2.28 CPU) @ 438.60/s (n=1000)
         Rate Regexp   Sqrt  Index
Regexp 81.8/s     --   -81%   -92%
Sqrt    439/s   436%     --   -57%
Index  1010/s  1134%   130%     --

(update: useless use of sub)

Comment on Re^2: Finding string for patterns Download Code

Replies are listed 'Best First'.
Re^3: Finding string for patterns by Roy Johnson (Monsignor) on Apr 05, 2005 at 22:33 UTC
It's a tradeoff. As the OP noted, if you increase the range of squares you're checking, precomputing gets slower (and it's time you don't count in your benchmark). Stick another 0 on MAX_NUMBER and run your bench again (change the first arg for cmpthese to -10), and you might get something like I did: `Rate Regexp Index Sqrt Regexp 3.82/s -- -88% -98% Index 32.4/s 748% -- -81% Sqrt 167/s 4281% 417% --` [download] One relatively easy optimization that I didn't do is to limit the length to what can actually be matched: sub m_sqrt { my $total_length; my %seen; my $max_length = length($MAX_NUMBER * $MAX_NUMBER); for my $start (0..length $string) { next if substr($string, $start, 1) eq '0'; my $this_max = length($string)-$start; $this_max = $max_length if $max_length < $this_max; for my $length (1..$this_max) { my $test = substr($string, $start, $length); my $sqrt = sqrt($test); if (!$seen{$test}++ and $sqrt == int $sqrt and $sqrt >= $MIN_NUMBER and $sqrt <= $MAX_NUMBER) + { $total_length += $length; } } } return ($total_length); } [download] Now the bench gives me `Rate Regexp Index Sqrt Regexp 3.64/s -- -88% -99% Index 31.2/s 758% -- -94% Sqrt 497/s 13542% 1489% --` [download] Caution: Contents may have been coded under pressure.	[reply] [d/l] [select]

Replies are listed 'Best First'.

Re^3: Finding string for patterns
by Roy Johnson (Monsignor) on Apr 05, 2005 at 22:33 UTC

         Rate Regexp  Index   Sqrt
Regexp 3.82/s     --   -88%   -98%
Index  32.4/s   748%     --   -81%
Sqrt    167/s  4281%   417%     --
[download]

sub m_sqrt {
    my $total_length;
    my %seen;
    my $max_length = length($MAX_NUMBER * $MAX_NUMBER);
    for my $start (0..length $string) {
        next if substr($string, $start, 1) eq '0';
        my $this_max = length($string)-$start;
        $this_max = $max_length if $max_length < $this_max;
        for my $length (1..$this_max) {
            my $test = substr($string, $start, $length);
            my $sqrt = sqrt($test);
            if (!$seen{$test}++ and $sqrt == int $sqrt
                    and $sqrt >= $MIN_NUMBER and $sqrt <= $MAX_NUMBER)
+ {
                $total_length += $length;
            }
        }
    }
    return ($total_length);
}
[download]

         Rate Regexp  Index   Sqrt
Regexp 3.64/s     --   -88%   -99%
Index  31.2/s   758%     --   -94%
Sqrt    497/s 13542%  1489%     --
[download]

Caution: Contents may have been coded under pressure.

[reply]
[d/l]
[select]