in reply to Re: How to speed up multiple regex in a loop for a big data?
in thread How to speed up multiple regex in a loop for a big data?

Isn't the cost of calling a sub significantly higher than running the same code in the loop? It's always been my policy to eliminate subs where possible when absolute speed is required. (But I could be wrong.)

Re^3: How to speed up multiple regex in a loop for a big data?
by salva (Canon) on May 25, 2006 at 16:36 UTC
    well, calling subs in Perl is not as expensive as people usually think...
    use Benchmark 'cmpthese';
    my $a = 'foo bar doz' x 100;
    $a .= ' hello '.$a;
    my $sub = sub {
        /\bhello\b/; /\bhello\b/; /\bhello\b/; /\bhello\b/;
        /\bhello\b/; /\bhello\b/; /\bhello\b/; /\bhello\b/;
    };
    cmpthese(-3, {
        # eight matches per string via an inner loop
        loop => sub {
            for (($a) x 10) {
                for my $i (1..8) {
                    /\bhello\b/;
                }
            }
        },
        # eight matches per string via one sub call
        sub => sub {
            for (($a) x 10) {
                $sub->()
            }
        },
        # eight matches per string written out in place
        inline => sub {
            for (($a) x 10) {
                /\bhello\b/; /\bhello\b/; /\bhello\b/; /\bhello\b/;
                /\bhello\b/; /\bhello\b/; /\bhello\b/; /\bhello\b/;
            }
        }
    });
    outputs...
             Rate   loop    sub inline
    loop   4157/s     --    -5%   -16%
    sub    4363/s     5%     --   -12%
    inline 4943/s    19%    13%     --
    and anyway, it's easy to modify my code to remove the subroutine call from the loop, just by moving the loop inside the sub:
    open(MAP, "<$new_name_map_file");
    while (<MAP>) {
        chomp;
        tr/A-Z/a-z/;
        @map_line = split(/\t/);
        $mapper{$map_line[0]} = $map_line[1];
    }
    close(MAP);
    # build the source of a sub holding one literal s/// per map entry
    my $sub = <<'EOS';
    sub {
        while (<IN>) {
            print "%";
            tr/A-Z/a-z/;
    EOS
    for my $name (sort keys %mapper) {
        my $qname = quotemeta $name;
        my $qrepl = quotemeta $mapper{$name};
        # \\b so the generated code gets \b (word boundary), not a backspace
        $sub .= "s{\\b$qname\\b}{$qrepl}g;\n";
    }
    $sub .= <<'EOS';
            print OUT $_;
        }
    }
    EOS
    # compile the generated source once; every regexp in it is a constant
    $sub = eval $sub;
    die if $@;
    open(IN, "<input_file");
    open(OUT, ">input_file.new");
    $sub->();
    close(IN);
    close(OUT);
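    To make the generated-code idea concrete, here is a minimal self-contained sketch of the same technique that works on an in-memory list instead of files. The contents of %mapper and @lines are made up for illustration, and the generated sub returns the rewritten lines rather than printing to a filehandle, so it is a variation on the code above, not a drop-in replacement.

```perl
use strict;
use warnings;

# Hypothetical mapping; in the thread it is read from the map file.
my %mapper = ('foo' => 'bar', 'a.b' => 'c');

# Build the source of one sub containing a literal s/// per key.
my $src = "sub {\n    my \@out;\n    for (\@_) {\n        my \$line = lc;\n";
for my $name (sort keys %mapper) {
    my $qname = quotemeta $name;
    my $qrepl = quotemeta $mapper{$name};
    # "\\b" in a double-quoted string becomes \b (word boundary) in the generated code
    $src .= "        \$line =~ s{\\b$qname\\b}{$qrepl}g;\n";
}
$src .= "        push \@out, \$line;\n    }\n    return \@out;\n}\n";

# Compile the generated source once; each pattern in it is now a constant.
my $sub = eval $src or die "eval failed: $@";

my @lines  = ("Foo and FOO", "a.b but not axb");
my @result = $sub->(@lines);
print "$_\n" for @result;
```

    Printing $src before the eval is a quick way to check what was actually generated.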
      I guess I am naive. However, I am not sure how your code can speed things up much. I have a wild thought though. Is there any way I can compile Perl code into a binary executable that could be dramatically faster to run...?
        I am not sure how your code can speed up things much

        well, the bottleneck in the OP's code was probably the regular expression being recompiled over and over, because it had a variable ($key) inside. In my code, the regexps are compiled just once, so it should be much faster.

        You could also use the qr operator to precompile them, but expanding the loop should give some additional speed as you can see from the benchmarks.
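        As a rough sketch of the qr alternative (not the code from the thread), each pattern can be compiled once up front and the compiled objects reused for every line. The %mapper and @lines values below are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical mapping; in the thread it is loaded from a tab-separated file.
my %mapper = (hello => 'goodbye', world => 'planet');

# Compile each pattern once, outside the per-line loop.
# \Q...\E quotes any regex metacharacters in the key.
my @rules = map { [ qr/\b\Q$_\E\b/, $mapper{$_} ] } sort keys %mapper;

my @lines = ("Hello world", "worldly hello");
my @out;
for my $line (@lines) {
    my $copy = lc $line;
    for my $rule (@rules) {
        my ($re, $repl) = @$rule;
        $copy =~ s/$re/$repl/g;    # $re is an already compiled regexp object
    }
    push @out, $copy;
}
print "$_\n" for @out;
```

        This still pays for the inner loop over @rules on every line, which is the part the generated sub avoids.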

        Is there any way I can compile...

        No.

      Thanks. Those benchmarks are food for thought.