in reply to Impact of special variables on regex match performance

Once Perl sees $`, $' or $& anywhere in your program, it copies the target string on every successful match so that the pre- and postmatch parts stay available. Since you have a very large string and do hardly anything else in your program, that additional copying dominates the runtime.

Considering you aren't using $`, $& and $', it seems the obvious thing to do is to keep not using them.

Re^2: Impact of special variables on regex match performance
by roubi (Hermit) on Dec 09, 2010 at 20:43 UTC
    Oh, that explains the difference in behavior between the code posted above and the multi-line approach I describe at the bottom of my post.
    Considering you aren't using $`, $& and $', it seems the obvious thing to do is to keep not using them
    As per the rules of this forum, I posted the smallest amount of code reproducing the issue. Not using those special variables (or rather, not having them seen by perl) isn't an option in the real project...
      Not using those special variables (or rather, not having them seen by perl) isn't an option in the real project
      Of course it is.

      It doesn't mean you can do better. If you are using $` and $' on every match you make, it doesn't matter: any alternative will make you pay the price.

      But if you're using $` and friends on only some matches, then /p, @- and @+, and adding more captures to your patterns are the classical alternatives.
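A minimal sketch of two of those alternatives, assuming Perl 5.10 or later for /p. The helper names split_with_p and split_with_offsets and the sample string are made up for illustration; they are not from the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: split around the first "=word" using the /p flag.
# With /p, only this match pays for preserving the matched string.
sub split_with_p {
    my ($text) = @_;
    return unless $text =~ /=\w+/p;
    return (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

# Hypothetical helper: the same three pieces, rebuilt from the match
# offsets in @- and @+ with substr.
sub split_with_offsets {
    my ($text) = @_;
    return unless $text =~ /=\w+/;
    my ($start, $end) = ($-[0], $+[0]);
    return (
        substr($text, 0, $start),              # what $` would hold
        substr($text, $start, $end - $start),  # what $& would hold
        substr($text, $end),                   # what $' would hold
    );
}

print join('|', split_with_p('foo=bar;baz')), "\n";        # foo|=bar|;baz
print join('|', split_with_offsets('foo=bar;baz')), "\n";  # foo|=bar|;baz
```

The offset version only builds the substrings you actually ask for, which matters when you need just one of the three pieces.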

        Yes, I didn't mean to say that there are no technical alternatives to my predicament. The obstacles are more organizational in nature. Thanks for the help.
      I'm sure it's possible to pose a challenge ("real project") in which those variables would be needed... but it's hard to come up with one. Consider:
      $`  The text before matching -- In your code, that's $in
      $'  The text which comes after the match in an input string -- Alt: Lookarounds
      $&  The text of the match itself -- Use captures instead
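As a rough illustration of the last two rows (the helper names and sample strings are invented for this sketch, not taken from the posted code): a capture group stands in for $&, and a lookahead checks what follows the match without consulting $'.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every number in angle brackets.
# The capture $1 does the job that $& would do in s/\d+/<$&>/g.
sub bracket_numbers {
    my ($text) = @_;
    (my $copy = $text) =~ s/(\d+)/<$1>/g;
    return $copy;
}

# Hypothetical helper: is 'foo' directly followed by 'bar'?
# The lookahead (?=bar) replaces matching /foo/ and then inspecting $'.
sub foo_before_bar {
    my ($text) = @_;
    return $text =~ /foo(?=bar)/ ? 1 : 0;
}

print bracket_numbers('a1b22'), "\n";   # a<1>b<22>
print foo_before_bar('xfoobar'), "\n";  # 1
print foo_before_bar('xfoobaz'), "\n";  # 0
```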

      As mentioned, the cost (= slowdown) is well documented in the standard regex docs, and in many books, tutorials and nodes devoted to regular expressions.

      Concerning your code: given that sub uncomment_one is never explicitly called in what you posted, you may have over-reached in your diligence to follow the guidance (not exactly "rules") of this forum. /me suspects that profiling what you show would be informative; certainly, as is, the slowdown using your code is largely caused by the copying, which is expensive, as JavaFan points out above.

        Concerning your code: given that sub uncomment_one is never explicitly called in what you posted, you may have over-reached in your diligence to follow the guidance (not exactly "rules") of this forum.
        I put that code in a sub on purpose. The performance impact is triggered by the mere presence of those variables in the code, not their actual use as part of the execution of the script. And so there is no need to imagine a situation where those variables would be actually needed, only one where you'd need to load a module that contains those variables in a sub your code doesn't actually call.
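One way to observe this, sketched under assumptions: the script below times the same matching loop before and after compiling a sub that mentions $& but is never called. The names time_matches and never_called and the input size are invented for the sketch. As far as I know, the penalty applies to perls before 5.20; newer perls use copy-on-write and should show little or no difference.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: count all \d+ matches and report the elapsed time.
sub time_matches {
    my ($text) = @_;
    my $start = time;
    my $count = 0;
    $count++ while $text =~ /\d+/g;
    return (time - $start, $count);
}

# Assumed test input: the numbers 1..20000 joined by spaces.
my $text = join ' ', 1 .. 20_000;

# First run: perl has not yet seen $`, $& or $' anywhere.
my ($before, $n1) = time_matches($text);

# Compile, but never call, a sub that mentions $&.
eval q{ sub never_called { return $& } 1 } or die $@;

# Second run: same loop, same data, but perl has now seen $&.
my ($after, $n2) = time_matches($text);

printf "matches: %d / %d\n", $n1, $n2;
printf "before: %.4fs  after: %.4fs\n", $before, $after;
```

The string eval is only there so that both measurements fit in one process; loading a module that contains such a sub would have the same effect.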