Finding string for patterns

artist has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Finding string for patterns by Roy Johnson (Monsignor) on Apr 05, 2005 at 21:48 UTC
use strict; use warnings; my $SQUARE; my $MIN_NUMBER = 4; my $MAX_NUMBER = 1000; my $string = q(59738219024971503664); my $total_length; my %seen; for my $start (0..length $string) { next if substr($string, $start, 1) eq '0'; for my $length (1..(length($string)-$start)) { my $test = substr($string, $start, $length); my $sqrt = sqrt($test); if (!$seen{$test}++ and $sqrt == int $sqrt and $sqrt >= $MIN_NUMBER and $sqrt <= $MAX_NUMBER) { $total_length += $length; print "$test\t$length\t$total_length\n"; } } } [download] Caution: Contents may have been coded under pressure.	[reply] [d/l]
Re^2: Finding string for patterns by cazz (Pilgrim) on Apr 05, 2005 at 21:58 UTC
That may be fast for short strings, but given longer strings, it is much faster to precompute and then compare. use Benchmark qw/:all/; use strict; my $SQUARE; my $MIN_NUMBER = 4; my $MAX_NUMBER = 1000; foreach my $number ($MIN_NUMBER..$MAX_NUMBER){ my $square = $number * $number; $SQUARE->{$square} = $number; } my $string = q(5973821902497150366459738219024971503664); sub m_re { my $total_length; foreach my $square (keys %{$SQUARE}){ if($string =~ /$square/) { my $length = length($square); $total_length += $length; } } return ($total_length); } sub m_index { my $total_length; foreach my $square (keys %{$SQUARE}){ if(index($string, $square) > 0) { my $length = length($square); $total_length += $length; } } return ($total_length); } sub m_sqrt { my $total_length; my %seen; for my $start (0..length $string) { next if substr($string, $start, 1) eq '0'; for my $length (1..(length($string)-$start)) { my $test = substr($string, $start, $length); my $sqrt = sqrt($test); if (!$seen{$test}++ and $sqrt == int $sqrt and $sqrt >= $MIN_NUMBER and $sqrt <= $MAX_NUMBER) + { $total_length += $length; } } } return ($total_length); } cmpthese(1000, { "Regexp", \&m_re, "Index", \&m_index, "Sqrt", \&m_sqrt, }); [download] Gives: Benchmark: timing 1000 iterations of Index, Regexp, Sqrt... Index: 1 wallclock secs ( 0.98 usr + 0.01 sys = 0.99 CPU) @ 1010.10/s (n=1000) Regexp: 13 wallclock secs (12.20 usr + 0.02 sys = 12.22 CPU) @ 81.83/s (n=1000) Sqrt: 2 wallclock secs ( 2.27 usr + 0.01 sys = 2.28 CPU) @ 438.60/s (n=1000) Rate Regexp Sqrt Index Regexp 81.8/s -- -81% -92% Sqrt 439/s 436% -- -57% Index 1010/s 1134% 130% -- (update: useless use of sub)	[reply] [d/l]
Re^3: Finding string for patterns by Roy Johnson (Monsignor) on Apr 05, 2005 at 22:33 UTC
It's a tradeoff. As the OP noted, if you increase the range of squares you're checking, precomputing gets slower (and it's time you don't count in your benchmark). Stick another 0 on MAX_NUMBER and run your bench again (change the first arg for cmpthese to -10), and you might get something like I did: `Rate Regexp Index Sqrt Regexp 3.82/s -- -88% -98% Index 32.4/s 748% -- -81% Sqrt 167/s 4281% 417% --` [download] One relatively easy optimization that I didn't do is to limit the length to what can actually be matched: sub m_sqrt { my $total_length; my %seen; my $max_length = length($MAX_NUMBER * $MAX_NUMBER); for my $start (0..length $string) { next if substr($string, $start, 1) eq '0'; my $this_max = length($string)-$start; $this_max = $max_length if $max_length < $this_max; for my $length (1..$this_max) { my $test = substr($string, $start, $length); my $sqrt = sqrt($test); if (!$seen{$test}++ and $sqrt == int $sqrt and $sqrt >= $MIN_NUMBER and $sqrt <= $MAX_NUMBER) + { $total_length += $length; } } } return ($total_length); } [download] Now the bench gives me `Rate Regexp Index Sqrt Regexp 3.64/s -- -88% -99% Index 31.2/s 758% -- -94% Sqrt 497/s 13542% 1489% --` [download] Caution: Contents may have been coded under pressure.	[reply] [d/l] [select]
Re: Finding string for patterns by tlm (Prior) on Apr 05, 2005 at 21:57 UTC
OK, right off the bat, when searching for constant strings, use `index`, not regexps: use strict; use Benchmark 'cmpthese'; my $SQUARE; my $MAX_NUMBER = 1000; foreach my $number (4..$MAX_NUMBER) { my $square = $number * $number; $SQUARE->{$square} = 1; } my $string = q(59738219024971503664); cmpthese ( -1, { orig => \&orig, indx => \&windex } ); sub orig { my $total_length = 0; foreach my $square (keys %{$SQUARE}) { if ($string =~ /$square/) { my $length = length($square); $total_length += $length; } } return $total_length; } sub windex { my $string = q(59738219024971503664); my $total_length = 0; foreach my $square (keys %{$SQUARE}) { if ( index( $string, $square ) > -1 ) { my $length = length($square); $total_length += $length; } } return $total_length; } __END__ % perl sqs.pl Rate orig indx orig 100.0/s -- -91% indx 1067/s 967% -- [download] That's 10x faster with a very simple change. the lowliest monk	[reply] [d/l] [select]