Re^4: eval string possibilities

Thanks for that advice. I'll follow it up right away with the script below, which I wrote after the OP and which is probably bad code indeed, although I think it does show the timing difference I was looking for (qr// much faster than eval). I suppose the main thing I had missed all along (though I had been told as much in the CB, the Chatterbox, before) is how slow string eval is. Well, now I know :)

#!/usr/bin/perl -w

use strict;
use warnings;

use Benchmark qw /cmpthese/;

cmpthese 10_000 => {
  'do_eval' =>  sub { do_eval() },
  'do_qr'   =>  sub { do_qr() },
};

sub do_eval {
  my $genome  = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
  my @regexes = ('abc', '^qwerty'); # don't care
  my $count   = 0;
  my $code    = 'if ($genome =~ /' . join ('/ && /', @regexes) . '/) { $count++; }';
  eval $code;
  die "Error: $@\n Code:\n$code\n" if ($@);
}

sub do_qr {
  my $string  = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
  my @regexes = ("abc", "^qwerty");  # don't care
  my $count   = 0;
  my @compiled  = map qr/$_/, @regexes;
  for(my $i=0; $i<@regexes; $i++) {
      if($string =~ /$compiled[$i]/){
          $count++;
      }
  }
}

__END__

          Rate do_eval   do_qr
do_eval 2162/s      --    -77%
do_qr   9242/s    328%      --
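For comparison, here is a minimal sketch of the qr// approach written a bit more idiomatically (not from the original script): the patterns are compiled once instead of on every call, and grep in scalar context counts the matches instead of a C-style index loop. The helper name `count_matches` and the extra pattern `XYZ` (which never occurs in the string) are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many precompiled patterns match $string.
# grep in scalar context returns the number of elements for which the
# block was true, replacing the explicit index loop.
sub count_matches {
    my ($string, @compiled) = @_;
    return scalar grep { $string =~ $_ } @compiled;
}

my $genome = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";

# Compile the patterns once, outside any timed code.
# 'XYZ' is a made-up pattern that does not occur in $genome.
my @compiled = map { qr/$_/ } ('AGT', 'ATC', 'XYZ');

print count_matches($genome, @compiled), "\n";   # prints 2
```

If this were benchmarked, only the count_matches call would go inside the timed sub, so the qr// compilation cost is paid once rather than on every iteration.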

Re^5: eval string possibilities
by revdiablo (Prior) on Nov 23, 2004 at 01:38 UTC

An important part of writing benchmark code is making sure the alternatives do the same thing. Your do_eval subroutine had a minor problem: only the first pattern was bound to $genome with =~, so the remaining patterns were matched against $_ instead. I corrected that, but when I then compared the return values of the two subroutines, they differed:

sub do_eval {
  my $genome  = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
  my @regexes = ('AGT', 'ATC'); # don't care
  my $count   = 0;
  my $code    = 'if ('
              . join(' && ',
                     map { "\$genome =~ /$_/" } @regexes)
              . ') { $count++; }';
  eval $code;
  die "Error: $@\n Code:\n$code\n" if ($@);
  return $count; # returns 1
}

sub do_qr {
  my $string  = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
  my @regexes = ("AGT", "ATC");  # don't care
  my $count   = 0;
  my @compiled  = map qr/$_/, @regexes;
  for(my $i=0; $i<@regexes; $i++) {
      if($string =~ /$compiled[$i]/){
          $count++;
      }
  }
  return $count; # returns 2
}

To fix that, I changed your do_eval sub to the following:

sub do_eval {
  my $genome  = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
  my @regexes = ('AGT', 'ATC'); # don't care
  my $count   = 0;
  my $code    = join ";",
                map { "\$count++ if \$genome =~ /$_/" } @regexes;
  eval $code;
  die "Error: $@\n Code:\n$code\n" if ($@);
  return $count; # returns 2
}
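To see exactly what each version hands to eval, one can simply print the generated code strings. In this sketch (the variable names `$original` and `$fixed` are illustrative), the first string is built the way the original do_eval built it and the second the way the corrected one does. In the first, `/^qwerty/` has no `$genome =~` in front of it, so it would match against `$_`.

```perl
use strict;
use warnings;

my @regexes = ('abc', '^qwerty');

# Built the way the original do_eval built it: only the first
# pattern is bound to $genome; /^qwerty/ would match against $_.
my $original = 'if ($genome =~ /' . join('/ && /', @regexes) . '/) { $count++; }';

# Built the way the corrected do_eval builds it: one statement
# per pattern, each explicitly bound to $genome.
my $fixed = join ";", map { "\$count++ if \$genome =~ /$_/" } @regexes;

print "$original\n$fixed\n";
```

Printing the string before eval-ing it (as the die message in do_eval already does on error) is a cheap way to catch this kind of mistake.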

And I decided to add my own take on the matter, which generates One Big Regex, rather than a bunch of them:

sub do_genre {
  my $genome  = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
  my @regexes = ("AGT", "ATC");  # don't care
  my $regex   = join "|", map "($_)", @regexes;
  my $count   = () = $genome =~ /$regex/;
  return $count; # returns 2
}
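A caveat about the `my $count = () = ...` idiom in do_genre: without /g, a match in list context returns one element per capture group ($1, $2, ...), with undef for groups that did not take part, and an empty list if the match fails. So the count is the number of capture groups whenever anything matches, not the number of patterns that matched; it would also be 2 if only one of the two patterns occurred. The sketch below shows this with a made-up pattern `XYZ` that never occurs, plus one way to count the patterns that do occur (a separate match per pattern).

```perl
use strict;
use warnings;

my $genome  = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
my @regexes = ('AGT', 'XYZ');   # 'XYZ' is made up and never occurs

# Same construction as do_genre: "(AGT)|(XYZ)".
my $regex = join "|", map "($_)", @regexes;

# No /g: the list-context match returns ($1, $2), here ("AGT", undef),
# so the count is the number of capture groups, not of matching patterns.
my $count = () = $genome =~ /$regex/;

# One way to count the patterns that occur: a separate match per pattern.
my $distinct = grep { $genome =~ /$_/ } @regexes;

print "$count $distinct\n";   # prints "2 1"
```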

When I run the benchmark, I get the following results:

             Rate  do_eval    do_qr do_genre
do_eval   15531/s       --     -65%     -90%
do_qr     44671/s     188%       --     -72%
do_genre 157893/s     917%     253%       --

That just goes to show that string eval is slow, but looping over separate regexes is also slow compared with matching one combined regex. A different algorithm makes a big difference.
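Following the point at the top of this reply that the alternatives must do the same thing, here is a minimal sketch of a hypothetical helper, `check_and_compare`, that runs each candidate once, dies if their return values disagree, and only then hands them to cmpthese from the core Benchmark module. The two candidate subs are illustrative, not taken from the thread, and the sketch assumes every candidate returns a defined scalar.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helper: run each candidate once, die if the return values
# disagree, and only then benchmark them with cmpthese.
sub check_and_compare {
    my ($count, %subs) = @_;
    my %result;
    for my $name (keys %subs) {
        $result{$name} = $subs{$name}->();
    }
    my %distinct = map { $_ => 1 } values %result;
    die "Alternatives disagree: ",
        join(", ", map { $_ . '=' . $result{$_} } sort keys %result), "\n"
        if scalar(keys %distinct) > 1;
    cmpthese($count, \%subs);
    return \%result;
}

# Illustrative candidates: both count how many precompiled patterns match.
my $genome   = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
my @compiled = map { qr/$_/ } ('AGT', 'ATC');

my $results = check_and_compare(10_000,
    loop => sub { my $n = 0; for my $re (@compiled) { $n++ if $genome =~ $re } return $n },
    grep => sub { return scalar grep { $genome =~ $_ } @compiled },
);
```

Checking the results up front means a table like the one above can only be produced for subs that agree on their answer.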
