http://qs1969.pair.com?node_id=315891

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

I'm researching some performance problems with one of our Web sites and stumbled across some legacy code that's getting pulled into our system. This code uses $` and $' variables with regular expression matches. From the perlre docs (emphasis mine):

WARNING: Once Perl sees that you need one of "$&", "$`", or "$'" anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program.

In fact, in tests that I've run, it appears that the mere existence of these variables will trigger this behavior, regardless of whether or not the code that accesses them will be executed. Removing these variables seems to consistently get me a ten percent bump in regex performance.
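A minimal sketch of how such a test could be set up, assuming two separate runs of the same script (one plain, one with the hypothetical argument `taint`). The sub names `never_called` and `time_matches`, the subject string, and the iteration count are illustrative and not from the original tests. Note that perlvar reports the penalty as largely gone on Perl 5.20 and later thanks to copy-on-write, so any gap is more likely to show on older perls.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run once with no arguments and once with "taint", then compare timings.
my $arg     = shift @ARGV;
my $mention = defined $arg && $arg eq 'taint';

if ($mention) {
    # Compile (but never call) a sub that mentions the match variable.
    # The q{} string keeps the variable out of this file's own parse.
    eval q{ sub never_called { my $m = $&; return $m } 1 } or die $@;
}

# Time $count matches against a longish string; returns (seconds, hits).
sub time_matches {
    my ($count) = @_;
    my $text  = ('x' x 10_000) . 'needle';
    my $hits  = 0;
    my $start = time;
    for (1 .. $count) {
        $hits++ if $text =~ /needle/;
    }
    return (time - $start, $hits);
}

my ($elapsed, $hits) = time_matches(50_000);
printf "%s: %d matches in %.3f s\n",
    ($mention ? 'match variable mentioned' : 'clean'), $hits, $elapsed;
```

Because the flag is global to the interpreter, the two cases cannot be compared fairly inside one process, which is why this sketch relies on separate runs.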

Since mod_perl is a persistent Perl interpreter embedded in the Web server, it seems to me that any reference to these variables will impact all regex matches in a mod_perl environment, regardless of whether or not my code actually uses those variables. Is this the case? I don't know enough about benchmarking snippets in a mod_perl environment to easily verify this.

Cheers,
Ovid

New address of my CGI Course.

Replies are listed 'Best First'.
Re: Naughty Regular Expressions and mod_perl
by liz (Monsignor) on Dec 19, 2003 at 20:36 UTC
    Depends on which version of Apache and mod_perl you're using. Personally, I only have experience with the Apache 1.X series and Apache 2.X with the prefork MPM.

    The way I understand it, Apache 1.X and 2.X (prefork MPM) start a Perl interpreter at server load time (well, actually two times, but that doesn't matter here right now). Any modules loaded at server startup time, either through directives in the Apache configuration or in a PerlRequire file, become part of that interpreter. If any of the special regex variables is seen at that time, then all processes forked off the initial process (the Apache children that do the actual handling of the requests) will have the regex execution speed penalty.

    If the special regex variables have not been seen, each Apache child starts out with fast regexes. But as soon as any child loads code that uses the special regex variables, that request and all future requests handled by that child will have the slower regex performance.

    One way to get around that would be to have the child die after handling a request that loaded a module with the special regex variables (see $r->child_terminate). Forking nowadays is pretty fast. On the other hand, the loading of that module might occur often enough to make it worthwhile to keep the child anyway. YMMV.

    Now, with Apache 2 with MPMs other than prefork, I understand there's actually a pool of Perl interpreters, each with its own characteristics. So it should be possible to have one Perl interpreter with the magic regex variables enabled in it, and one without. On the other hand, you seem to need a threaded Perl in those situations (someone please correct me if I'm wrong), which has its own drawbacks execution-speed-wise.

    I think in conclusion I would have to say: don't use (modules that use) "$&", "$`" or "$'". Or don't worry about the execution speed penalty.

    In general, I would worry less about execution speed in mod_perl, but more about shared memory usage. If your server goes into swap, who cares about slower regexes?

    Hope this helps.

    Liz

Re: Naughty Regular Expressions and mod_perl
by simonm (Vicar) on Dec 19, 2003 at 20:28 UTC
    Since mod_perl is a persistent Perl interpreter embedded in the Web server, it seems to me that any reference to these variables will impact all regex matches in a mod_perl environment, regardless of whether or not my code actually uses those variables. Is this the case?

    Yes, that's what I've been told.

    You could try checking with Devel::SawAmpersand if you wanted to prove it to yourself.

Re: Naughty Regular Expressions and mod_perl
by iburrell (Chaplain) on Dec 19, 2003 at 20:29 UTC
    A quick web search showed that the answer is yes, any usage of the bad variables slows down regex matching for everybody.

    A module, Devel::SawAmpersand, was mentioned that tells you whether the sawampersand flag was set. And Devel::FindAmpersand tells you where it was set.
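A hedged sketch of the check suggested above. Devel::SawAmpersand is a CPAN module rather than part of core Perl, so the example loads it defensively. The helper name `sawampersand_status` is made up for illustration, and how the flag behaves on newer perls is not something this sketch tries to establish.

```perl
use strict;
use warnings;

# Report whether the interpreter has seen any of the costly match variables.
sub sawampersand_status {
    # Load defensively: the module may not be installed.
    return 'Devel::SawAmpersand not installed'
        unless eval { require Devel::SawAmpersand; 1 };
    return Devel::SawAmpersand::sawampersand()
        ? 'match variables have been seen'
        : 'match variables have not been seen';
}

print sawampersand_status(), "\n";
```

Under mod_perl, something like this could be called from a handler after the suspect modules have loaded, to see what state that child's interpreter is in.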

Re: Naughty Regular Expressions and mod_perl
by davido (Cardinal) on Dec 20, 2003 at 06:31 UTC
    Modern Perl implementations (v5.6.1 and later, I think) provide the @+ and @- arrays that give information about the position of the last match in strings.

    The RegExps, Prematch and Postmatch without efficiency penalty discussion thread provides a solution for using @+ and @- along with either substr or unpack as a means of accomplishing the same thing as $`, $', and $&, but without the performance penalties. I happen to like that node, but I'm biased because I wrote it a few months back. I hope you find it helpful.

    You will also find information on using these special arrays instead of the $`, $', and $& special variables in perlvar.
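As a rough illustration of the @- and @+ approach described above, here is a small sketch using substr, following the equivalences given in perlvar. The helper name `match_parts` and the sample strings are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;

# Return (prematch, match, postmatch) for the first match of $re in $text,
# or an empty list if nothing matches. Only @- and @+ are consulted, so the
# global match variables are never mentioned.
sub match_parts {
    my ($text, $re) = @_;
    return unless $text =~ $re;
    my ($start, $end) = ($-[0], $+[0]);    # offsets of the whole match
    return (
        substr($text, 0, $start),               # what the prematch variable would hold
        substr($text, $start, $end - $start),   # what the match variable would hold
        substr($text, $end),                    # what the postmatch variable would hold
    );
}

my ($pre, $match, $post) = match_parts('The quick brown fox', qr/quick/);
print "pre=[$pre] match=[$match] post=[$post]\n";
# prints: pre=[The ] match=[quick] post=[ brown fox]
```

The offsets are copied into lexicals immediately after the match, since any later successful match would overwrite @- and @+.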

    Good luck! There is a good workaround.


    Dave

      Hello?

      In your rush to show how much you know about Regexes, you forgot what the original question was.

      And the question was "do I have to suffer for the effects of naughty regexes if somebody else in the same environment is using them?"

      The question WAS NOT "show me how smart you are with @+ and @-"

      Ask around. Everybody in the Monastery believes that Ovid knows a few things about regexes and he wasn't waiting for your self-promoting remarks to come out of the darkness.

        The original question did provide context, including the fact that there was legacy code involved using the match vars. Tips for changing that code without having to substantially change the logic (always a good thing with legacy code) are certainly appropriate.

        I think a disclaimer about authorship is also appropriate; I would call it modesty, not self-promotion.

Re: Naughty Regular Expressions and mod_perl
by oha (Friar) on Dec 19, 2003 at 20:31 UTC
    I recently looked into how to embed a Perl interpreter. It's quite simple to build a pool of interpreters, and I think it's also useful to destroy and recreate some of them when they are idle. I don't know whether mod_perl does this, but if it does, that could partially fix your problem, couldn't it?

Re: Naughty Regular Expressions and mod_perl
by SavannahLion (Pilgrim) on Dec 20, 2003 at 08:07 UTC
    Regex is still a bit of an alien beast to me, so forgive me if this sounds really simple-minded. According to the Camel, Perl uses a similar mechanism to produce $1, $2, and so on, so you also pay a price for each pattern that contains capturing parentheses.

    Is the performance penalty the same as for $&, $`, and $'? That is, once I use capturing parentheses to produce $1, $2, etc., does that penalty then apply to all regexes, whether or not they use capturing parentheses, because I used them elsewhere?

    Edit: You know what? My poor addled brain finally put three and two together after I read each word on that page in the Camel over and realized that they make whole paragraphs. The performance hit occurs for capturing parentheses, but it's limited in scope to the specific regex in question. It doesn't incur the same global penalty as $&, $`, and $' do. I think I'll go sleep on the floor now.

    ----
    Is it fair to stick a link to my site here?

    Thanks for your patience.

Re: Naughty Regular Expressions and mod_perl
by Juerd (Abbot) on Dec 21, 2003 at 16:39 UTC

    sub killperformance { eval q{ use re qw(debug); $&; }; }

    The re pragma is not lexical :( This is a terrible performance killer, most of the time even worse than $&. And since the output goes to stderr, which is the error log under mod_perl, it can take a while to discover that it's still in effect globally.

    Three very well known and widely used modules that use $& are Parse::RecDescent, Text::Balanced and XML::Twig.

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }