http://qs1969.pair.com?node_id=218056


in reply to meaning of /o in regexes

My understanding, which is confirmed by your tests, is that when you use the /o modifier, any variables within the regex will only be interpolated the first time the regex is seen.
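A minimal sketch of that behaviour, assuming nothing beyond core Perl. The helper name `match_flags` and the sample patterns are made up for illustration; the point is only that the `/o` match keeps using whatever pattern was interpolated the first time that op ran.

```perl
use strict;
use warnings;

# Match the string 'b' against each pattern in turn, with or without /o.
# match_flags is a hypothetical helper, not something from the thread.
sub match_flags {
    my ($use_o, @patterns) = @_;
    my @hits;
    for my $pat (@patterns) {
        if ($use_o) {
            # /o: $pat is interpolated the first time this op runs, then frozen.
            push @hits, ('b' =~ /$pat/o ? 1 : 0);
        }
        else {
            # No /o: $pat is interpolated again on every pass.
            push @hits, ('b' =~ /$pat/ ? 1 : 0);
        }
    }
    return @hits;
}

my @without_o = match_flags(0, 'a', 'b');
my @with_o    = match_flags(1, 'a', 'b');

print "without /o: @without_o\n";
print "with /o:    @with_o\n";
```

Without `/o` the second pass sees the new pattern and matches (`0 1`); with `/o` both passes use the first pattern, so neither matches (`0 0`).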

This appears to be similar in effect to using the qr// op to create your regexes in advance. However, using qr// has the advantage that you can pre-compile your regexes in sections and then combine them in the m// and s/// operators in different combinations.
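As a rough illustration of building patterns in sections, here is a sketch that compiles two small pieces with qr// and combines them in two different ways. The names `$re_sign`, `$re_digits` and `classify` are invented for this example, and it says nothing about whether the combination is faster than writing the pattern out in full.

```perl
use strict;
use warnings;

# Building blocks, each created once with qr//.
my $re_sign   = qr/[+-]?/;
my $re_digits = qr/\d+/;

# The same parts combined in two different ways.
my $re_int   = qr/$re_sign$re_digits/;
my $re_float = qr/$re_sign$re_digits\.$re_digits/;

# classify is a hypothetical helper used only for this demonstration.
sub classify {
    my ($str) = @_;
    return 'float'   if $str =~ /^$re_float$/;
    return 'int'     if $str =~ /^$re_int$/;
    return 'neither';
}

for my $str ('42', '-3.14', 'abc') {
    print "$str: ", classify($str), "\n";
}
```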

A few things I haven't seen an explanation for (they may exist, I just haven't seen them):

  1. Why does the qr// operator accept the /o modifier?
  2. If you combine one or more parts defined with qr// in a regex with some non-precompiled stuff, do you still get the advantage of precompilation?

    Eg.

    my $re_int = qr/[+-]?\d+/;
    my $re_exp = qr/[Ee][+-]?$re_int/;
    if ($str =~ m/^(?:$re_int\.)?$re_int$re_exp?$/ ) {
        print "I think I got a valid int or float\n";
    }
  3. If there was a non-compiled var reference in the above m//, do I still get any benefit from pre-compiling the other parts?
  4. What happens if I add a /o modifier to the m// above?
  5. If one or more of the pre-compiled parts (and/or the non-precompiled parts) contains capture brackets, is there any benefit (in performance terms) from pre-compiling some parts?

I did once attempt to benchmark these systematically, to try to determine which options and combinations of options gave the greatest performance benefit, but the process is fraught with gotchas.
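For anyone wanting to repeat that kind of comparison, here is a small sketch using the core Benchmark module. The test data, the patterns and the four candidate names are all assumptions chosen for illustration, and the relative timings will vary with the Perl version, the patterns and the data, so treat any single run with suspicion.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample data: 200 lines that all end in a number.
my @lines = map { "line $_: value " . ($_ * 7) } 1 .. 200;

my $pat_str = '\d+$';     # plain string, interpolated into the match
my $pat_qr  = qr/\d+$/;   # qr object used as the entire regex
my $part_qr = qr/\d+/;    # qr object interpolated into a larger regex

# Each candidate returns how many lines matched, so they can be sanity-checked.
my %candidates = (
    string   => sub { my $n = grep { /$pat_str/ }         @lines; $n },
    string_o => sub { my $n = grep { /$pat_str/o }        @lines; $n },
    qr_whole => sub { my $n = grep { $_ =~ $pat_qr }      @lines; $n },
    qr_part  => sub { my $n = grep { /value $part_qr$/ }  @lines; $n },
);

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, \%candidates);
```

Checking that every candidate returns the same count before timing them guards against one of the easier gotchas: accidentally benchmarking patterns that do not match the same things.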


Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
Pick up your cloud down the end and "Yes" if you get allocated a grey one they are a bit damp under foot, but someone has to get them.
Get used to the wings fast cos its an 8 hour day...unless the Govenor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory.
Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.

Re^2: meaning of /o in regexes
by diotalevi (Canon) on Dec 06, 2002 at 16:45 UTC

    Just a note: none of this applies if you are using qr the way it's meant - as the entire regex for m// or s/// as in $qr = qr/./; $_ = 'abc'; m/$qr/; s/$qr//; $_ =~ $qr. All of these more normal uses of expressions benefit from the precompilation. This note is about interpolating qr objects into other regular expressions which is different.


    Starting from the top: I created the short sample program and then dumped its opcode tree to see what it actually does. From this I can say that interpolating qr objects into another regular expression saves nothing. The objects are all concatenated (meaning stringification) and then compiled for the regex. If you add the /o modifier to any m// or s/// operation then it binds the compiled form to that location in the opcode tree. There is no reason for that to change just because you used a qr in the regex or not. If you read Dominus' remarks on that at Dirty Secrets of the Perl Regex Engine then that will be clear.

    The answers to your questions (in order):

    1. I don't know
    2. no (you are penalized)
    3. no (you are penalized)
    4. the same thing it always does
    5. no (you are penalized)
    The penalizing is from having to do a magic_get on the qr ops instead of just reading it as a string and then the overall penalty of doing work more than once (compile the regex for qr, mg_get the stringified form, then compile the larger regex). Or at least that's how I read it. Please correct me if I'm wrong - I am still quite a novice at this.

    $qr = qr/./;
    'a' =~ /$qr$qr/;
    __DATA__
    C:\>perl -MO=Concise qr.pl
    e  <@> leave[t1] vKP/REFC ->(end)
    1     <0> enter ->2
    2     <;> nextstate(main 5 qr.pl:1) v ->3
    5     <2> sassign vKS/2 ->6
    3        </> qr(/./) s ->4
    -        <1> ex-rv2sv sKRM*/1 ->5
    4           <> gvsv s ->5
    6     <;> nextstate(main 5 qr.pl:3) v ->7
    d     </> match() vKS ->e
    7        <$> const(SPECIAL Null)[t5] s ->8
    c        <|> regcomp(other->d) sK/1 ->d
    8           <1> regcreset sK/1 ->9
    >> This is where you see the two [qr] expressions
    >> being fetched as global scalar values,
    >> concatenated and *then* just above this the
    >> regex is compiled.
    b              <2> concat[t4] sK/2 ->c
    -                 <1> ex-rv2sv sK/1 ->a
    9                    <> gvsv s ->a
    -                 <1> ex-rv2sv sK/1 ->b
    a                    <> gvsv s ->b
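The stringification step can be observed from plain Perl without dumping the opcode tree. This is only a sketch: the variable names are invented, and the exact string form of a qr object differs between Perl versions, so the example prints it rather than assuming what it looks like.

```perl
use strict;
use warnings;

my $qr = qr/./;

# Interpolating a qr object uses its string form; print it to see the text
# that gets pasted into the larger pattern.
my $as_string = "$qr";
print "stringified: $as_string\n";

# Build the combined pattern once from the qr objects and once from the
# plain strings. Both should match the same way.
my $from_qr     = 'ab' =~ /^$qr$qr$/               ? 1 : 0;
my $from_string = 'ab' =~ /^$as_string$as_string$/ ? 1 : 0;

print "from qr: $from_qr, from string: $from_string\n";
```

This shows only that the two forms behave identically when matching; it does not measure the cost of the extra compilation discussed above.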

    I'm working from the references http://perl.plover.com/Rx/ and perlop (the gory quoting part). See also pp_hot.c for pp_concat, which doesn't do anything special for qr magic. It's just strings at that point.

    __SIG__ use B; printf "You are here %08x\n", unpack "L!", unpack "P4", pack "L!", B::svref_2object(sub{})->OUTSIDE;

      Thank you diotalevi++. That is exactly the sort of answer I was looking for and it confirms my suspicions based on some fairly dodgy benchmarking.

      No matter how hard I tried to isolate the benefits of qr//'ing or /o'ing, those benefits always seemed to disappear whenever I attempted to combine one or more pre-compiled regexes with each other or with some non-compiled stuff. In fact, I sometimes detected a penalty from using pre-compiled regexes other than stand-alone, though the differences were too small to quantify with any accuracy.



        I looked even further and the get magic (see sv.c Perl_sv_2pv and seek to the "Regexp" section) associated with stringifying a qr regex is actually pretty cheap. I'd guess any real performance loss is just from having to compile a regex more than once which unless you are doing some monster regex... isn't all that much of an issue.

Re: Re: meaning of /o in regexes
by broquaint (Abbot) on Dec 06, 2002 at 16:13 UTC
    1 - Why does the qr// operator accept the /o modifier?
    Just for compatibility reasons, it doesn't actually have any effect on the resulting regex object.
    2 - If you combine one or more parts defined with qr// in a regex with some non-precompiled stuff, do you still get the advantage of precompilation?
    In your example, you get the advantage of compilation with the regex objects but the match still has to be compiled dynamically since it contains variables (although using regex objects should be slightly faster than plain strings since they're already compiled).
    3 - If there was a non-compiled var reference in the above m//, do I still get any benefit from pre-compiling the other parts?
    Yup, as the regex is already compiled, whereas a plain string has to be compiled first before matching.
    4 - What happens if I add a /o modifier to the m// above?
    I believe the effect would be the same as if you were using plain strings, since it would compile to the same thing, so the result would be that the match regex would only be compiled once and, once compiled, it's always the same. So you'll get a tiny benefit with the pre-compiled regexes on first compilation, but in the long run it's pretty negligible.
    5 - If one or more of the pre-compiled parts (and/or the non-precompiled parts) contains capture brackets, is there any benefit (in performance terms) from pre-compiling some parts?
    There's no reason for capturing to affect the performance of a compiled regex vs a compile'n'do regex, as once compiled a regex will perform the same whether it was pre-compiled or otherwise.
    HTH

    _________
    broquaint

Re: Re: meaning of /o in regexes
by slife (Scribe) on Dec 06, 2002 at 13:37 UTC

    1. Why does the qr// operator accept the /o modifier?

    So that you can create a 'static' compiled regex object that can be interpolated into more complex patterns subsequently in the program.

    2. If you combine one or more parts defined with qr// in a regex with some non-precompiled stuff, do you still get the advantage of precompilation?

    The discussion on p194 of Camel 3rd Ed. states that you can 'chain' qr// operators into one pattern to prevent re-compilation, so the answer would appear to be 'no'.

    3. If there was a non-compiled var reference in the above m//, do I still get any benefit from pre-compiling the other parts?

    No; again, according to the reference above, the pattern would be re-compiled.

    4. What happens if I add a /o modifier to the m// above?

    You'd get a once-only compilation of the pattern.

    5. If one or more of the pre-compiled parts (and/or the non-precompiled parts) contains capture brackets, is there any benefit (in performance terms) from pre-compiling some parts?

    I doubt that the presence or absence of capture brackets makes much difference to whether or not precompilation provides any benefit.

      Que?

      So patterns compiled with qr// are 'dynamic' unless I use the /o modifier? Could you explain your definition of 'static' in this context? Can you give me a reference to this information?

      2) & 3) - I think I would want considerably more factual information regarding what runtime steps are prevented from repetition by the use of qr// than I can derive from your brief quote, before I could draw any conclusions, never mind your definitive statement.

      4) So, did I benefit, in terms of runtime performance from pre-compiling some parts of the final pattern? Or am I in effect forcing the pre-compiled parts of the regex to be re-inspected? Would it actually be better to simply put all the parts together in a single regex with the /o modifier so that the compiler only needs to process everything one time?

      5) From what source do you derive that conclusion?

      It would make sense to me that if I use qr// or possibly /o (which I think amount to pretty much the same thing, but am open to correction), and the regex contains one or more sets of capture brackets, grouping brackets, repetition modifiers etc., it could be possible to pre-build a parsing tree (or somesuch) so that (for example) the size of the @+ and @- arrays could be pre-allocated and pointed to rather than needing to do this at runtime. However, if this was done for 2 separate patterns each containing a set of capture brackets, when they become combined together, that pre-allocation needs to change.

      Whilst there may be some benefit in combining two pre-parsed regexes together by using whatever data-structures are built internally to represent them, when these are further combined with non-precompiled parts, it might simply be quicker to have the regex engine build the internal data-structure to represent the entire pattern in a single pass rather than parsing the non-compiled parts, having to take into account the effects that the embedded pre-compiled parts have on (for example) capture bracket numbering.
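The renumbering of capture brackets can at least be observed from the outside. In this sketch (the pattern names are made up) each qr// has a single capture group of its own, and the combined match reports two groups numbered left to right. It shows the visible effect only and says nothing about how the engine arrives at it internally.

```perl
use strict;
use warnings;

my $re_num  = qr/(\d+)/;      # one capture group on its own
my $re_word = qr/([a-z]+)/;   # one capture group on its own

my ($first, $second, $groups);
if ('12ab' =~ /^$re_num$re_word$/) {
    # In the combined pattern the groups are numbered 1 and 2, left to right.
    ($first, $second) = ($1, $2);
    # $#+ gives the number of capture groups in the last successful match.
    $groups = $#+;
}

print "captures: $first, $second; groups: $groups\n";
```

Here `$1` is `12`, `$2` is `ab`, and the group count is 2, even though neither part had a second group when it was created.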

      I would like to know, without needing to resort to source-diving, which of the two approaches is used, and which has the least impact at runtime?

