
Before you start looking for areas that you can optimize, you should look for areas that need to be optimized. You can do this with the Benchmark module: split the function into its working pieces and time each piece separately.
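As a rough sketch of that approach, each piece can be timed on its own with Benchmark's `timeit` and `timestr`. `split_piece` and `count_piece` are hypothetical stand-ins for whatever pieces the real function breaks into, and the 1000 iterations and 100-character sequence are arbitrary choices.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Hypothetical piece 1: split the sequence around every position.
sub split_piece {
    my ($s) = @_;
    my @parts;
    for my $i (0 .. length($s) - 1) {
        push @parts, [ substr($s, 0, $i), substr($s, $i, 1), substr($s, $i + 1) ];
    }
    return @parts;
}

# Hypothetical piece 2: count how often each letter occurs.
sub count_piece {
    my ($s) = @_;
    my %count;
    $count{$_}++ for split //, $s;
    return \%count;
}

my $seq = join '', map { chr(65 + int rand 26) } 1 .. 100;

# Time each piece separately so the expensive one stands out.
for my $piece ([ split_piece => \&split_piece ], [ count_piece => \&count_piece ]) {
    my ($name, $code) = @$piece;
    my $t = timeit(1000, sub { my @r = $code->($seq) });
    print "$name: ", timestr($t), "\n";
}
```

Once the slow piece is identified, that is the one worth comparing alternatives for with `cmpthese`.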

This line strikes me as possibly inefficient because it interpolates the loop variable into the pattern, so Perl has to compile a new regex on each iteration: 1000 regexes for a 1000-character sequence.

($left,$middle,$right) = $sequence =~ m/(\w{$_})?(\w{1})?(\w*)/;
So I wrote a benchmark to try a few strategies:
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $iterations    = shift @ARGV || 5000;
my $sequence_size = shift @ARGV || 10;

# Simplistic random sequence
my $sequence = join '', map {chr(65 + rand(26))} 1..$sequence_size;

cmpthese($iterations, {
    re_orig => \&re_orig,
    re_set  => \&re_set,
    bsubstr => \&bsubstr,
});

# Test using the original regex
sub re_orig {
    for my $i (0..length($sequence)-1) {
        my ($left,$middle,$right) = $sequence =~ m/(\w{$i})?(\w{1})?(\w*)/;
    }
}

# Test using the character set [A-Z]
sub re_set {
    for my $i (0..length($sequence)-1) {
        my ($left,$middle,$right) = $sequence =~ m/([A-Z]{$i})([A-Z])([A-Z]*)/;
    }
}

# Test using substr
sub bsubstr {
    for my $i (0..length($sequence)-1) {
        my ($left)   = substr($sequence,0,$i);
        my ($middle) = substr($sequence,$i,1);
        my ($right)  = substr($sequence,$i+1);
    }
}
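A speed comparison only means something if the strategies return the same split, so a quick equivalence check is worth running first. This is a sketch under the assumption that the sequence is uppercase A-Z, as the benchmark generates; the helpers `split_re_orig`, `split_re_set`, `split_substr` and `strategies_agree` are hypothetical names that wrap the three expressions from the benchmark.

```perl
use strict;
use warnings;

# Each helper returns (left, middle, right) for split position $i.
sub split_re_orig { my ($s, $i) = @_; return $s =~ m/(\w{$i})?(\w{1})?(\w*)/ }
sub split_re_set  { my ($s, $i) = @_; return $s =~ m/([A-Z]{$i})([A-Z])([A-Z]*)/ }
sub split_substr  {
    my ($s, $i) = @_;
    return (substr($s, 0, $i), substr($s, $i, 1), substr($s, $i + 1));
}

# True if both regex strategies match the substr result at every position.
sub strategies_agree {
    my ($s) = @_;
    for my $i (0 .. length($s) - 1) {
        my $want = join '|', split_substr($s, $i);
        for my $alt (\&split_re_orig, \&split_re_set) {
            # Treat an unset optional capture as an empty string.
            my $got = join '|', map { defined $_ ? $_ : '' } $alt->($s, $i);
            return 0 unless $got eq $want;
        }
    }
    return 1;
}

my $seq = join '', map { chr(65 + int rand 26) } 1 .. 50;
print strategies_agree($seq) ? "all strategies agree\n" : "MISMATCH\n";
```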
Results for 500 iterations with a sequence length of 7:
          Rate re_orig  re_set bsubstr
re_orig  681/s      --    -12%    -92%
re_set   774/s     14%      --    -91%
bsubstr 8475/s   1144%    995%      --
Results for 500 iterations with a sequence length of 100:
          Rate re_orig  re_set bsubstr
re_orig 48.1/s      --    -12%    -93%
re_set  54.9/s     14%      --    -92%
bsubstr  718/s   1392%   1209%      --
Results for 100 iterations with a sequence length of 1000:
          Rate  re_set bsubstr
re_set  4.00/s      --    -93%
bsubstr 59.5/s   1387%      --
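These numbers compare regexes against `substr`, but they don't separate the cost of recompiling the pattern from the cost of running it. One way to probe that, assuming it's acceptable to build one pattern per split position up front, is to compile each pattern once with `qr//` and reuse it. `make_splitters` and the `re_qr` entry are hypothetical additions; the 200 iterations and 100-character length are arbitrary.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helper: compile one pattern per split position, once.
sub make_splitters {
    my ($len) = @_;
    return [ map { qr/([A-Z]{$_})([A-Z])([A-Z]*)/ } 0 .. $len - 1 ];
}

my $seq       = join '', map { chr(65 + int rand 26) } 1 .. 100;
my $splitters = make_splitters(length $seq);

cmpthese(200, {
    # Same as re_set: the pattern is rebuilt from an interpolated string.
    re_set => sub {
        for my $i (0 .. length($seq) - 1) {
            my @p = $seq =~ m/([A-Z]{$i})([A-Z])([A-Z]*)/;
        }
    },
    # Reuses the precompiled qr// objects, so no per-iteration compile.
    re_qr => sub {
        for my $i (0 .. length($seq) - 1) {
            my @p = $seq =~ $splitters->[$i];
        }
    },
    bsubstr => sub {
        for my $i (0 .. length($seq) - 1) {
            my @p = (substr($seq, 0, $i), substr($seq, $i, 1), substr($seq, $i + 1));
        }
    },
});
```

Whether `re_qr` closes much of the gap to `bsubstr` is something to measure rather than assume; if it doesn't, the matching itself, not the compilation, is the main cost.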