stabu has asked for the wisdom of the Perl Monks concerning the following question:

I program mainly in C. Recently I have been revisiting Perl, looking at other people's scripts, and have started to notice how common it is to have many (10 or so) instances of the
$acertainvariable =~ s/regex/replacement/g;
idiom all following each other in the code. The $acertainvariable doesn't change.

Is there a performance issue here? I mean, shouldn't an if or switch (or Perl equivalent) statement be used? The number of match attempts could clearly be reduced that way.

Now many of these are simple utility scripts, so it mightn't matter ... dunno, it just struck me.

Thank you JavaFan, roboticus and cdarke for your replies. I'm clear about =~ now.

Re: performance of repeated "$samevareachtime=~ s/etc/etc" idiom
by roboticus (Chancellor) on Feb 25, 2010 at 15:15 UTC

    stabu:

    They're not just matches ... they're substitutions. It's quite likely that several of them modify the string.

    my $s = " the quick red fox jumped over the lazy brown dog ";
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    $s =~ s/(brown|red)/color/g;
    print "<$s>\n";

    should give you:

    <the quick color fox jumped over the lazy color dog>

    ...roboticus

Re: performance of repeated "$samevareachtime=~ s/etc/etc" idiom
by JavaFan (Canon) on Feb 25, 2010 at 15:08 UTC
    I wonder how good your understanding of Perl is. Assuming regex matches somewhere in $acertainvariable, and replacement differs from what regex matched, the line in question will change $acertainvariable.
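
    A minimal sketch of that point, using a made-up string and pattern (neither comes from the original question): s///g modifies the variable in place and returns the number of substitutions it made.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $acertainvariable = "a-b-c";

# s///g edits the string in place; its return value is the number
# of substitutions performed (false if nothing matched).
my $count = ($acertainvariable =~ s/-/+/g);

print "$count substitutions, result: $acertainvariable\n";
```

    So every statement in such a chain operates on whatever the previous statement left behind.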

    I cannot imagine how you intend to replace a sequence of substitutions with an if or switch (the latter is called given in Perl). Perhaps you can enlighten us with an example?

    As for performance, micro-optimizations aren't as useful as you may think. Getting your program to work correctly is far more important.

Re: performance of repeated "$samevareachtime=~ s/etc/etc" idiom
by cdarke (Prior) on Feb 25, 2010 at 16:28 UTC
    It might be that the number of match attempts could be reduced by using such exotics as look-arounds in the regular expression, but surprisingly few people really understand how to use those. The more complex the RE, the more work the engine has to do to interrogate it, so several simple REs might not be significantly slower than one complex RE, not to mention the greater chance of making a mistake that a complex expression brings.
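
    One way to check this rather than guess is to time both forms with the core Benchmark module. The sketch below is only an illustration under assumptions: the input string, the replacement table and the sub names are invented, and the combined form uses a plain alternation with a hash lookup rather than look-arounds. No timings are shown because they depend on the Perl version and the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input and replacement table, for illustration only.
my $input = " the quick red fox jumped over the lazy brown dog ";
my %replacement = (red => 'color', brown => 'color', lazy => 'sleepy');

# Several simple substitutions, one after another.
sub several_simple {
    my $s = $input;
    $s =~ s/red/color/g;
    $s =~ s/brown/color/g;
    $s =~ s/lazy/sleepy/g;
    return $s;
}

# One combined pattern: an alternation of the keys, looked up in the hash.
my $alternation = join '|', map { quotemeta } sort keys %replacement;
my $combined_re = qr/($alternation)/;

sub one_combined {
    my $s = $input;
    $s =~ s/$combined_re/$replacement{$1}/g;
    return $s;
}

# Confirm both forms agree before comparing their speed.
print several_simple() eq one_combined() ? "same result\n" : "results differ\n";

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    several_simple => \&several_simple,
    one_combined   => \&one_combined,
});
```

    The two forms agree here only because no replacement text can be matched by a later pattern; when one substitution feeds the next, a single pass is not a drop-in replacement for the chain.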

    Also note that the \G anchor might be in use, which makes a pattern start matching where the previous global match on that string left off.
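
    A small sketch of how \G ties successive matches together, assuming a toy input and made-up token names: each m//gc match resumes at the position where the previous one stopped, so the statements cannot be reordered or skipped independently.

```perl
use strict;
use warnings;

# Hypothetical input for a toy tokenizer.
my $input = "x=42;";
my @tokens;

while (1) {
    # \G anchors each attempt at pos($input); the /c flag keeps that
    # position when an attempt fails, so the next branch can try there.
    if    ($input =~ /\G([a-z]+)/gc) { push @tokens, "IDENT:$1" }
    elsif ($input =~ /\G(\d+)/gc)    { push @tokens, "NUM:$1" }
    elsif ($input =~ /\G([=;])/gc)   { push @tokens, "PUNCT:$1" }
    else                             { last }
}

print join(" ", @tokens), "\n";   # IDENT:x PUNCT:= NUM:42 PUNCT:;
```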

    A switch statement can be problematic with REs because the target can often match more than one 'case' (this can also be an issue with Korn shell case statements).
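
    To illustrate with an invented input: a string that matches several patterns is fully processed by a chain of substitutions, whereas an if/elsif ladder (standing in for a switch here) stops after the first branch that matches.

```perl
use strict;
use warnings;

# Hypothetical input that matches more than one of the patterns.
my $chained  = "  red fox  ";
my $switched = $chained;

# Chain of substitutions: every statement gets its turn.
$chained =~ s/^\s+//;
$chained =~ s/\s+$//;
$chained =~ s/red/color/g;

# Switch-like ladder: only the first matching branch runs.
if    ($switched =~ /^\s/) { $switched =~ s/^\s+//; }
elsif ($switched =~ /\s$/) { $switched =~ s/\s+$//; }
elsif ($switched =~ /red/) { $switched =~ s/red/color/g; }

print "<$chained>\n";    # <color fox>
print "<$switched>\n";   # <red fox  >
```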