Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl Monks,

As I try to improve the performance of my little script, and even Google can't answer my question, I come here to ask for a bit of your wisdom, since you have always been able to help me in the past :)

In my script I have some regular expressions that get executed several million times within a few minutes. Currently they get compiled every time, and this compiling eats several percent of the total execution time, so it would be great to precompile them and avoid compiling on each and every use.

example:

$string1 =~ s/$string2//g;

where $string1 changes often, but $string2 is used a few million times. After reading some texts on the net, I would replace it with

$string2 = qr/$string2/g; $string1 =~ s/$string2//;

Problem: this does not compile because of the /g option.

after rewriting it to

$string2 = qr/$string2/; $string1 =~ s/$string2//g;

it now does compile, but the profiler shows me that the precompilation doesn't work and the regexp still gets compiled each and every time :(
Is there anything I can do to avoid that compilation and speed up my script?

Thanks in advance!

Replies are listed 'Best First'.
Re: precompiling regular expressions
by ikegami (Patriarch) on Oct 21, 2010 at 18:51 UTC
    • First, s/// doesn't compile the pattern more than once per pass of /g.

      Compiles at most once per call to f, even if it replaces twice:

      sub f { my $pat = $_[0]; (my $str = 'abba') =~ s/$pat//g; }

      Same:

      sub f { my $re = qr/$_[0]/; (my $str = 'abba') =~ s/$re//g; }
    • Now to explain the "at most" bit.

      Compiles twice:

      sub f { my $pat = $_[0]; (my $str = 'abba') =~ s/$pat//g; } f('a'); f('b');

      Compiles once (since Perl notices the pattern doesn't change):

      sub f { my $pat = $_[0]; (my $str = 'abba') =~ s/$pat//g; } f('b'); f('b'); f('b');
    • Finally, how can you avoid excessive recompiling?

      Compiles four times (since the pattern keeps changing):

      sub f { my $pat = $_[0]; (my $str = 'abba') =~ s/$pat//g; } f('a'); f('b'); f('a'); f('b');

      Compiles two times:

      my $re_a = qr/a/; my $re_b = qr/b/; sub f { my $re = $_[0]; (my $str = 'abba') =~ s/$re//g; } f($re_a); f($re_b); f($re_a); f($re_b);

      Compiles once (since Perl notices the pattern doesn't change):

      sub f { my $pat = $_[0]; (my $str = 'abba') =~ s/$pat//g; } f('b'); f('b'); f('b');

      are you sure about that? basically that is what I do, but the profiler claims that CORE:regcomp is called as often as f is executed...

      I am using a rather ancient edition of perl (5.8.6), has this behaviour changed since then?

        are you sure about that?

        Yes.

        >perl -Mre=debug -e"sub f { my $pat = $_[0]; (my $str = 'abba') =~ s/$pat//g; } f('b'); f('b'); f('b');" 2>&1 | find /c "Compiling"
        1

        Before compiling, it checks if the pattern is the same as the one from the last time the operator was evaluated. If so, it reuses the compiled pattern from the last evaluation.

        I am using a rather ancient edition of perl (5.8.6), has this behaviour changed since then?

        It's not new, but I don't know how old.
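
        For anyone who wants to repeat that check on their own perl without the Windows-only find /c, here is a rough, portable sketch of the same idea. The helper name count_compiles is made up for this example, and it assumes list-form pipe open works on your platform. The counts can differ between perl versions (which is exactly the open question for 5.8.6), so treat the numbers as something to observe rather than a guarantee; going by the explanation above one would expect fewer compiles for the repeated pattern than for the alternating one.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): run $code in a child perl with
# -Mre=debug and count the "Compiling REx" lines it emits.
sub count_compiles {
    my ($code) = @_;

    # The debug output goes to STDERR, so the child first points STDERR
    # at STDOUT, which is the pipe we read from.
    my $prefix = q{BEGIN { open STDERR, '>&', \*STDOUT or die "dup: $!" } };

    # List-form pipe open: no shell involved, so no quoting problems.
    open(my $fh, '-|', $^X, '-Mre=debug', '-e', $prefix . $code)
        or die "cannot run $^X: $!";
    my $count = grep { /^Compiling REx/ } <$fh>;
    close $fh;
    return $count;
}

# The same sub f as in the examples above, as source text for the child.
my $sub = q{sub f { my $pat = $_[0]; (my $str = 'abba') =~ s/$pat//g; } };

my $same        = count_compiles($sub . q{f('b'); f('b'); f('b');});
my $alternating = count_compiles($sub . q{f('a'); f('b'); f('a'); f('b');});

print "same pattern, 3 calls:        $same compile(s)\n";
print "alternating pattern, 4 calls: $alternating compile(s)\n";
```

        If the first number equals the number of calls on your perl, then that perl really does recompile every time, and caching qr// objects yourself is the safer route.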

Re: precompiling regular expressions
by CountZero (Bishop) on Oct 21, 2010 at 18:02 UTC
    If $string2 does not change during an individual run of the script, then adding the o (mnemonic: once) modifier is advisable. The regex will then be compiled only once.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      sadly, $string2 does change after some thousands or millions of uses, so I can't use /o in most cases
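
      To illustrate why /o does not fit a pattern that changes mid-run, here is a small sketch. The sub names strip_o and strip_plain are invented for this example. With /o, the s/// operator keeps the first pattern it compiled and silently ignores later values of the variable.

```perl
use strict;
use warnings;

# With /o the pattern is compiled on first use and then frozen for this
# particular s/// operator, even when $pat later holds something else.
sub strip_o {
    my ($str, $pat) = @_;
    $str =~ s/$pat//go;
    return $str;
}

# Without /o the current value of $pat is always honoured.
sub strip_plain {
    my ($str, $pat) = @_;
    $str =~ s/$pat//g;
    return $str;
}

print strip_o('abba', 'a'), "\n";       # bb
print strip_o('abba', 'b'), "\n";       # bb (still removing 'a')
print strip_plain('abba', 'a'), "\n";   # bb
print strip_plain('abba', 'b'), "\n";   # aa
```

      So /o is only safe when the pattern is fixed for the entire run of the program.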
Re: precompiling regular expressions
by ambrus (Abbot) on Oct 22, 2010 at 09:03 UTC

    Try doing $re2 = qr/$string2/; right after each assignment to $string2, and only then, and then do the substitution as $string1 =~ s/$re2//g;.

    Update: also try just doing the $string1 =~ s/$string2//g; substitution without explicitly adding any precompilation, for perl is usually smart enough to figure out what you want to do and will probably not recompile the regular expression if it doesn't change.
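
    A minimal sketch of the first suggestion, assuming the pattern changes only occasionally. The helper compiled and the hash %re_for are hypothetical names, not anything from the original script; they keep one qr// object per distinct pattern string so that switching back to an earlier pattern costs nothing.

```perl
use strict;
use warnings;

# Hypothetical helper: one qr// object per distinct pattern string.
my %re_for;
sub compiled {
    my ($pat) = @_;
    return $re_for{$pat} ||= qr/$pat/;
}

my $string2 = 'b';
my $re2     = compiled($string2);        # compile when $string2 is assigned ...

my @out;
for my $string1 (qw(abba bob cab)) {
    (my $copy = $string1) =~ s/$re2//g;  # ... and reuse it in the hot loop
    push @out, $copy;
}

$string2 = 'a';                          # pattern changes: look it up again
$re2     = compiled($string2);
(my $copy = 'abba') =~ s/$re2//g;
push @out, $copy;

print "@out\n";   # aa o ca bb
```

    Whether this beats perl's own "pattern unchanged" check is something only a profile of the real script can tell.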

Re: precompiling regular expressions
by Anonymous Monk on Oct 21, 2010 at 19:07 UTC
    try:
    $string2 = qr/$string2/; $string1 =~ $string2;
      I need s(ubstitute) and I need the /g option :/
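
      For reference, a short sketch of how the pieces fit together: the qr// object carries the compiled pattern, while /g stays on the s/// operator that uses it. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $string1 = 'abba';
my $re      = qr/b/;                      # no /g here: qr// does not accept it

my $matched = $string1 =~ $re;            # plain match: $string1 is untouched
(my $stripped = $string1) =~ s/$re//g;    # substitution, /g on s/// itself

# qr/b/g is rejected when the code is compiled, so it is tried via string eval.
my $ok = eval q{ my $bad = qr/b/g; 1 };

print "matched:  ", ($matched ? 'yes' : 'no'), "\n";   # yes
print "original: $string1\n";                          # abba
print "stripped: $stripped\n";                         # aa
print "qr//g ok: ", ($ok ? 'yes' : 'no'), "\n";        # no
```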