kyle has asked for the wisdom of the Perl Monks concerning the following question:

I sometimes find that long-time Perl programmers sprinkle /o over every regular expression they ever write. I just had occasion to see some very recently written code that even had the option applied to qr, which I thought was the new and improved way to do what /o used to do.

I had this hazy (and I now think faulty) memory that /o doesn't even do anything anymore—that perl now figures out whether a pattern needs to be recompiled and skips it anyway. To test this, I tried using Benchmark.

I used Perl 5.10.0.

use Benchmark qw( cmpthese timethese );

my @alphabet = ( 0 .. 9, 'a' .. 'z', 'A' .. 'Z' );

my $h     = horrid_rx(1_000);
my $qr_h  = qr/$h/;
my $qr_ho = qr/$h/o;

my $s = join q{}, map { $alphabet[ rand @alphabet ] } 1 .. 1_000;

my $matchiness = ( $s =~ /$h/ ) ? 'matches' : 'does not match';
print "horrid rx $matchiness string\n";

my $loops = 1_000;

cmpthese(
    -2,
    {
        '//'   => sub { $s =~ /$h/   for (1..$loops) },
        '/o'   => sub { $s =~ /$h/o  for (1..$loops) },
        'qr'   => sub { $s =~ $qr_h  for (1..$loops) },
        'qr/o' => sub { $s =~ $qr_ho for (1..$loops) },
    }
);

sub horrid_rx {
    my ($n) = @_;

    my @quant = ( '*', '?', '+', '{0,1}', );

    my $out;
    for ( 1 .. $n ) {
        $out .= $alphabet[ rand @alphabet ];
        $out .= $quant[ rand @quant ];
    }

    return $out;
}

Typical results look like this:

horrid rx does not match string
       Rate    //  qr/o    qr    /o
//    129/s    --  -83%  -83%  -88%
qr/o  752/s  483%    --   -0%  -27%
qr    753/s  483%    0%    --  -27%
/o   1037/s  704%   38%   38%    --

That seems to show that using /o—even on a pattern based on a variable that never changes—does actually help. It even beats having to go through the interface of a Regexp object.

Then on one run, I got this:

horrid rx does not match string
       Rate    //    qr  qr/o    /o
//   44.4/s    --  -29%  -29%  -32%
qr   62.7/s   41%    --   -1%   -4%
qr/o 63.0/s   42%    1%    --   -4%
/o   65.4/s   47%    4%    4%    --

Now I'm confused. All of these ran significantly slower, so I'm guessing that the pattern and string combination it picked are unusually expensive to fail. In light of that, I'd expect it to somewhat hide the overhead of the compilation, but this seems like more than I'd expect.

So what's going on here? And what of my original question? How useful is it to put /o on a match?

Re: How useful is the /o regexp modifier?
by Limbic~Region (Chancellor) on Feb 03, 2009 at 18:29 UTC

      Thank you for the link! Having now read that, I've learned a few things.

      • An expression without /o does indeed not recompile if it hasn't changed, but there's still an overhead of comparing the "new" string to the old string to decide whether to recompile. In my tests, my strings are kind of long (1_000 characters), so this overhead is more apparent.
      • The /o on a qr actually does do something—something just like it does on any other match.
      • A literal qr actually gets compiled at compile time. That shouldn't be a surprise, but I didn't know it.
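
The points above suggest compiling a run-time pattern once, explicitly, and reusing the resulting object. A minimal sketch of that idea follows; the pattern text and the input lines are made up for illustration and do not come from the benchmark.

```perl
use strict;
use warnings;

# Hypothetical input: pattern text that is only known at run time.
my $word = 'foo|bar';

# Compile once, explicitly. $rx is a Regexp object that can be reused.
my $rx = qr/\b(?:$word)\b/;

# Hypothetical data to search.
my @lines = ( 'a foo here', 'nothing', 'bar at start? no: bar' );

# Reuse the precompiled object directly; no /o is involved.
my @hits = grep { $_ =~ $rx } @lines;

# A qr object can also be interpolated into a larger pattern.
my @anchored = grep { /^$rx/ } @lines;

print scalar(@hits),     " line(s) matched\n";               # 2
print scalar(@anchored), " line(s) matched at the start\n";  # 1
```

Whether this is faster than m//o for a given pattern is exactly what the benchmark is probing; the sketch only shows where the compilation happens.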

      Looking at my test results some more, I guess my anomalous case was one where matching took an extra long time, and the overheads got washed out of the results. Since I told Benchmark to run for a fixed number of seconds rather than a fixed number of iterations, there were fewer iterations and so less total overhead. That makes the "compiled once" qr constructs look about the same as the "compiled once" m//o.

        While I admit it's possible to construct a regexp that is executed more than once that is somewhat faster with /o than without, I strongly urge people not to use /o.

        Basically, there are two cases where you may want to use /o:

        1. /...${var}.../o, where you think ${var} doesn't change.
        2. /...${var}.../o, where you think ${var} changes.
        Note that in case 1), leaving off /o is never wrong. However, if you keep /o, and it turns out that $var does actually change between executions (either because you were mistaken about $var not changing, or because the code was changed and $var now changes), the code is wrong, as it behaves as if $var retained its old value.

        In case 2), where you want the regexp to act as if $var hasn't changed, I'd pity the programmer (even if it's you) who has to maintain that code. It's quite obfuscated.

        So in short, IMO, the performance gain doesn't outweigh the drawback of (possible) "action over time" (akin to "action at a distance").

        I never use /o, and /o is a red flag in my book.
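
The failure mode in case 1 can be sketched as follows. The helper names count_with_o and count_without_o and the word list are hypothetical, chosen only to make the stale pattern visible.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the words matching $pat, using /o.
# The pattern is compiled on first use and never looked at again.
sub count_with_o {
    my ( $pat, @words ) = @_;
    return scalar grep { /$pat/o } @words;
}

# Same helper without /o: the pattern follows $pat on every call.
sub count_without_o {
    my ( $pat, @words ) = @_;
    return scalar grep { /$pat/ } @words;
}

my @words = qw( apple banana cherry );

print count_with_o( 'a',  @words ), "\n";     # 2 (apple, banana)
print count_with_o( 'rr', @words ), "\n";     # 2 again: still matching 'a'
print count_without_o( 'a',  @words ), "\n";  # 2
print count_without_o( 'rr', @words ), "\n";  # 1 (cherry)
```

The second call to count_with_o is silently wrong, and nothing at the call site hints at why.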

Re: How useful is the /o regexp modifier?
by moritz (Cardinal) on Feb 03, 2009 at 20:27 UTC
    /o can be rather confusing, so I try to avoid it.

    Your benchmark basically shows that on perl-5.8.8, the qr, qr/o and /o variants have the same run time, and when I run it multiple times, it actually fluctuates enough to call the differences "noise".

    On perl-5.10.0 the /o variant is consistently ~33% faster than qr and qr/o.

    But remember that your cases are pretty pathologic in that the regex is unusually long, and the string is shorter than the text representation of the regex. If you increase the length of your strings by a factor of 10, you're again in a regime where the run time differences are smaller than 5% (and this time all of them), so I'd conclude that for any practical purpose the possible small efficiency gain is negligible.
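
One way to try the longer-string experiment is to parameterize the original benchmark by subject length. This is only a sketch: the pattern size of 200 atoms, the iteration count of 500, and the helper random_string are assumptions made to keep the run short, and the timings will vary by perl version and by the random pattern drawn.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

my @alphabet = ( 0 .. 9, 'a' .. 'z', 'A' .. 'Z' );

# Same idea as horrid_rx in the original post: $n random quantified atoms.
sub horrid_rx {
    my ($n) = @_;
    my @quant = ( '*', '?', '+', '{0,1}' );
    return join q{},
        map { $alphabet[ rand @alphabet ] . $quant[ rand @quant ] } 1 .. $n;
}

# Hypothetical helper: random subject string of $len characters.
sub random_string {
    my ($len) = @_;
    return join q{}, map { $alphabet[ rand @alphabet ] } 1 .. $len;
}

my $h = horrid_rx(200);    # assumed size, smaller than the original 1_000

for my $len ( 1_000, 10_000 ) {    # original length, then ten times longer
    my $s  = random_string($len);
    my $qr = qr/$h/;

    print "string length $len\n";
    cmpthese(
        500,                        # fixed iteration count (assumption)
        {
            '//' => sub { $s =~ /$h/ },
            '/o' => sub { $s =~ /$h/o },
            'qr' => sub { $s =~ $qr },
        }
    );
}
```

With so few iterations Benchmark may warn that the counts are unreliable; raising the count or using a negative (time-based) first argument gives steadier numbers at the cost of a longer run.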

      your cases are pretty pathologic

      I think that's my problem. What I'd like is a regex that's hard to compile, but what I made instead are ones that are hard to match. Not knowing anything about the compilation process, I don't know how to design something that will take a long time to compile (and match quickly) so as to show the compilation step.

        How about these:

        m{.|<long pattern>}
        m{\!<long pattern>}   # with ! not anywhere in the string you match

        Both will run very fast with any long pattern; the first one should be the fastest match possible, as it matches just the first char of the string to match.

        It could be that the long patterns are optimized away; that should be tested.

        UPDATE: On second thought, whether they are optimized away doesn't really matter as long as the compilation takes more time, and it does, at least with perl 5.8.8

        #with short pattern m{.|U*e?Z+}
        horrid rx matches string
               Rate    //  qr/o    qr    /o
        //   1254/s    --  -29%  -30%  -34%
        qr/o 1771/s   41%    --   -1%   -7%
        qr   1781/s   42%    1%    --   -7%
        /o   1912/s   52%    8%    7%    --

        #with long pattern m{.|U*e?Z+J{0,1}z*4{0,1}7?d*H+k*l+N+d{0,1}4+9{0,1}... 1000 more subpattern
        horrid rx matches string
               Rate    //    qr  qr/o    /o
        //    235/s    --  -87%  -87%  -88%
        qr   1770/s  654%    --   -1%   -7%
        qr/o 1781/s  659%    1%    --   -7%
        /o   1912/s  715%    8%    7%    --
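
A rough way to separate compile time from match time for the two suggested pattern shapes is to time the first qr// and the first match separately. This is a sketch under assumptions: the stand-in for the long pattern (5_000 optional letters) and the subject string are invented here, and wall-clock timing of a single run is noisy.

```perl
use strict;
use warnings;
use Time::HiRes qw( time );

# Stand-in for <long pattern>: 5_000 optional letters (assumption).
my $long = join q{}, map { ( 'a' .. 'z' )[ $_ % 26 ] . '?' } 1 .. 5_000;

# Hypothetical subject string; note that it contains no '!'.
my $s = 'some string without an exclamation mark';

for my $pat ( ".|$long", "\\!$long", '.|x?' ) {
    my $t0      = time;
    my $rx      = qr/$pat/;     # pattern text differs each time, so it compiles
    my $compile = time - $t0;

    $t0 = time;
    my $matched = $s =~ $rx ? 'match' : 'no match';
    my $run     = time - $t0;

    printf "%-12.12s... compile %.6fs, run %.6fs, %s\n",
        $pat, $compile, $run, $matched;
}
```

The third, short pattern is there only as a baseline for the compile column.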
Re: How useful is the /o regexp modifier?
by jethro (Monsignor) on Feb 03, 2009 at 20:11 UTC

    I checked a pathological case and with binary search found a slow combination:

    $h='U*e?Z+';
    $s='0Yh7rrt1d22BCpt5j62OMASaLznTPG947Ucl9pbtBq7Ab7U26cgrSKUOlKSRLABnre6nyw4IglTZW';

    Substituting 'U' or 'e' doesn't matter much, but 'Z' is important since it appears in the string only near the very end. Other values like 'M' would produce somewhat faster results, and 'F' (which is not in the string at all) is really fast again.
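
The comparison between final letters could be reproduced along these lines. It is a sketch only: the subject string is the one from the post, written here as one unbroken alphanumeric literal (an assumption about how the original was wrapped), and the iteration count is arbitrary.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

my $s = '0Yh7rrt1d22BCpt5j62OMASaLznTPG947Ucl9pbtBq7Ab7U26cgrSKUOlKSRLABnre6nyw4IglTZW';

# One precompiled pattern per candidate final letter: U*e?Z+, U*e?M+, U*e?F+.
my %rx = map { $_ => qr/U*e?$_+/ } qw( Z M F );

# Show which variants match at all before timing them.
for my $c ( sort keys %rx ) {
    printf "%s: %s\n", $c, ( $s =~ $rx{$c} ? 'matches' : 'does not match' );
}

# Time each variant over a fixed number of iterations (assumed count).
cmpthese(
    200_000,
    {
        map { my $rx = $rx{$_}; ( "last=$_" => sub { $s =~ $rx } ) } keys %rx
    }
);
```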

    Sadly I don't have a perl with debugging compiled in and don't see why this pattern should be so slow.

    Playing with a fast example showed that eliminating + subpatterns at the start of the pattern got to pathological cases relatively fast. So it is all the more surprising that the majority of random patterns do NOT exhibit the slow matching.