Re^5: eval string possibilities

An important part of writing benchmark code is ensuring that the alternatives do the same thing. Your do_eval subroutine had a minor problem (you only had $genome =~ on the first match, instead of all of them), which I corrected. But then when I compared the output of the two subroutines, I noticed that it differed: do_eval ANDs the matches together and so increments $count at most once, while do_qr increments it once per matching regex:

sub do_eval {
  my $genome  = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
  my @regexes = ('AGT', 'ATC'); # don't care
  my $count   = 0;
  my $code    = 'if ('
              . join(' && ',
                     map { "\$genome =~ /$_/" } @regexes)
              . ') { $count++; }';
  eval $code;
  die "Error: $@\n Code:\n$code\n" if ($@);
  return $count; # returns 1
}

sub do_qr {
  my $string  = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
  my @regexes = ("AGT", "ATC");  # don't care
  my $count   = 0;
  my @compiled  = map qr/$_/, @regexes;
  for(my $i=0; $i<@regexes; $i++) {
      if($string =~ /$compiled[$i]/){
          $count++;
      }
  }
  return $count; # returns 2
}

To fix that, I changed your do_eval sub to the following:

sub do_eval {
  my $genome  = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
  my @regexes = ('AGT', 'ATC'); # don't care
  my $count   = 0;
  my $code    = join ";",
                map { "\$count++ if \$genome =~ /$_/" } @regexes;
  eval $code;
  die "Error: $@\n Code:\n$code\n" if ($@);
  return $count; # returns 2
}
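
To see why the two do_eval variants return different counts, it can help to print the generated code instead of evaluating it. This is only a sketch; the variable names $anded and $separate are mine, not from the thread:

```perl
use strict;
use warnings;

my @regexes = ('AGT', 'ATC');

# Original approach: one if statement with every match ANDed together,
# so $count can be incremented at most once.
my $anded = 'if ('
          . join(' && ', map { "\$genome =~ /$_/" } @regexes)
          . ') { $count++; }';

# Revised approach: one independent statement per regex,
# so $count is incremented once for each regex that matches.
my $separate = join ";",
               map { "\$count++ if \$genome =~ /$_/" } @regexes;

print "$anded\n";    # if ($genome =~ /AGT/ && $genome =~ /ATC/) { $count++; }
print "$separate\n"; # $count++ if $genome =~ /AGT/;$count++ if $genome =~ /ATC/
```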

And I decided to add my own take on the matter, which generates One Big Regex, rather than a bunch of them:

sub do_genre {
  my $genome  = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
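  my @regexes = ("AGT", "ATC");  # don't care
  my $regex   = join "|", map "($_)", @regexes;
  # Caution: without /g, a list-context match returns one element per
  # capture group, so this counts groups, not patterns that matched.
  my $count   = () = $genome =~ /$regex/;
  return $count; # returns 2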
}
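
One thing worth checking with the One Big Regex: in list context and without /g, a successful match returns one element per capture group (undef for groups that did not take part), so the "() =" count reflects the number of groups rather than the number of patterns that matched. A small sketch comparing it with a per-pattern count, using shorter input and two hypothetical helpers (count_big_regex and count_per_pattern are my names, not from the thread):

```perl
use strict;
use warnings;

# Hypothetical helper: the "() =" idiom applied to the joined regex,
# built the same way as in do_genre.
sub count_big_regex {
    my ($genome, @regexes) = @_;
    my $regex = join "|", map "($_)", @regexes;
    my $count = () = $genome =~ /$regex/;
    return $count;
}

# Hypothetical helper: test each pattern separately and count the hits.
sub count_per_pattern {
    my ($genome, @regexes) = @_;
    return scalar grep { $genome =~ /$_/ } @regexes;
}

my $genome = "AGTATCGATCG";
print count_big_regex($genome, 'AGT', 'ATC'), "\n";   # 2
print count_per_pattern($genome, 'AGT', 'ATC'), "\n"; # 2
print count_big_regex($genome, 'AGT', 'TTT'), "\n";   # 2, from ("AGT", undef)
print count_per_pattern($genome, 'AGT', 'TTT'), "\n"; # 1
```

The two agree on the sample patterns because both patterns happen to match; they diverge as soon as one of them does not.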

When I run the benchmark, I get the following results:

             Rate  do_eval    do_qr do_genre
do_eval   15531/s       --     -65%     -90%
do_qr     44671/s     188%       --     -72%
do_genre 157893/s     917%     253%       --
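
The harness that produced this table isn't shown here. A minimal sketch using cmpthese from the core Benchmark module, with a sanity check that the candidates agree before they are timed, might look like the following. The two candidates (loop_qr and grep_count) are illustrative stand-ins of my own; substitute the subs under test.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $genome  = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
my @regexes = ('AGT', 'ATC');

# Illustrative candidates; replace with the subs you want to compare.
my %candidates = (
    loop_qr => sub {
        my @compiled = map qr/$_/, @regexes;
        my $count = 0;
        for my $re (@compiled) { $count++ if $genome =~ $re }
        return $count;
    },
    grep_count => sub {
        return scalar grep { $genome =~ /$_/ } @regexes;
    },
);

# Run each candidate once and refuse to benchmark if the results differ.
my %results;
$results{$_} = $candidates{$_}->() for keys %candidates;
my %distinct;
$distinct{$_}++ for values %results;
die "Candidates disagree\n" if keys(%distinct) > 1;

# A negative count means "run each for at least that many CPU seconds";
# cmpthese then prints a rate comparison table.
cmpthese(-1, \%candidates);
```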

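Which just goes to show that the string eval is the slowest option here, the qr// loop is nearly three times faster, and the single combined regex is about three and a half times faster again. A different algorithm makes a big difference.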
