in reply to Re: Re: RE performance
in thread RE performance

Thanks!

Isn't it strange that the noCapture case is faster than the noParens??

Here's what use re 'debug' gets me for the 4 regex variants:

    Compiling REx `(est|\d)'
    size 10 first at 3
    1: OPEN1(3)
    3: BRANCH(6)
    4: EXACT <est>(8)
    6: BRANCH(8)
    7: DIGIT(8)
    8: CLOSE1(10)
    10: END(0)
    minlen 1

    Compiling REx `(?:est|\d)'
    size 7
    1: BRANCH(4)
    2: EXACT <est>(7)
    4: BRANCH(6)
    5: DIGIT(7)
    6: TAIL(7)
    7: END(0)
    minlen 1

    Compiling REx `est|\d'
    size 6
    1: BRANCH(4)
    2: EXACT <est>(6)
    4: BRANCH(6)
    5: DIGIT(6)
    6: END(0)
    minlen 1

    Compiling REx `est'
    size 3 first at 1
    1: EXACT <est>(3)
    3: END(0)
    anchored `est' at 0 (checking anchored isall) minlen 3

    Compiling REx `\d'
    size 2 first at 1
    1: DIGIT(2)
    2: END(0)
    stclass `DIGIT' minlen 1

The TAIL(7) on the noCapture case looks redundant to me; maybe that's something they fixed in 5.8.0?

What happens if you look at those regexes under 5.8.0 with -Dr or use re 'debug', please?
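
For anyone who wants to reproduce the dumps, here is a minimal sketch of how they can be generated. The sample string and the variant labels are assumptions for illustration (in particular, I am guessing that the fourth variant is two separate matches, since `est` and `\d` are compiled on their own above). The dump goes to STDERR, and its exact layout differs between perl versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Dumps each literal pattern below to STDERR as it is compiled,
# and traces the matches at run time.
use re 'debug';

my $text = 'test 1';    # hypothetical sample input

# Labels are illustrative; only noCapture/noParens come from the thread.
my @results = (
    [ 'capture',   ( $text =~ /(est|\d)/ )                ? 1 : 0 ],
    [ 'noCapture', ( $text =~ /(?:est|\d)/ )              ? 1 : 0 ],
    [ 'noParens',  ( $text =~ /est|\d/ )                  ? 1 : 0 ],
    [ 'separate',  ( $text =~ /est/ || $text =~ /\d/ )    ? 1 : 0 ],
);

printf "%-10s %s\n", @$_ for @results;
```

Running it as `perl script.pl 2>dump.txt` keeps the match results on STDOUT and collects the regex dumps in a file for comparison across perl versions.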

(Edit: Zaxo did this just above and got the same results, except that his output also shows offsets. No idea what those mean...)
--
Mike

Replies are listed 'Best First'.
Re: Re^3: RE performance
by japhy (Canon) on Sep 06, 2002 at 14:38 UTC
    "TAIL" is the node that defines the end of a non-capturing parenthetical group or an if-then parenthetical group (like (?(1)a|b)). That's about all. It gets optimized out for non-capturing parens if there is no "BRANCH" found (that is, no "|" metacharacter), so that something like /A(?:B)C/ can be optimized to an "EXACT" node matching "ABC".

    Basically, "TAIL" takes the place of having another "indented" layer.
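
A rough way to check the /A(?:B)C/ case yourself is sketched below. The variable names and sample strings are made up for illustration. The dump on STDERR shows how your perl compiled each pattern (the layout varies by version), and the loop only confirms that the two patterns agree on the samples.

```perl
use strict;
use warnings;

my ( $grouped, $plain );
{
    # Lexically scoped: only the patterns compiled in this block are dumped.
    use re 'debug';
    $grouped = qr/A(?:B)C/;    # non-capturing group with no "|" inside
    $plain   = qr/ABC/;        # the literal it is expected to reduce to
}

# Hypothetical sample inputs; both patterns should agree on all of them.
my @samples  = ( 'ABC', 'xxABCxx', 'AB', 'A(B)C', 'abc' );
my @disagree = grep { ( $_ =~ $grouped ? 1 : 0 ) != ( $_ =~ $plain ? 1 : 0 ) } @samples;

print "disagreements: ", scalar(@disagree), "\n";
```

Comparing the two dumps side by side should show whether the group was folded into a single "EXACT" node on the perl you are running.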

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: Re^3: RE performance
by PodMaster (Abbot) on Sep 05, 2002 at 06:44 UTC
    I think I got a line on that one (I finally read perldebguts), and I quote (cause I likes to ;)
    
        # Do nothing
        NOTHING     no      Match empty string.
        # A variant of above which delimits a group, thus stops optimizations
        TAIL        no      Match empty string. Can jump here from outside.
    
    So I guess this is because qr{} will generate patterns like that: you normally wouldn't write /(?:foo|\d)/ by hand, you'd do something like /$pat|$QRed2/. When perl sees
    4: BRANCH(6)
    5:   DIGIT(6)
    it means there are no more nodes after DIGIT, so it can optimize (the number in parentheses is the index of the next node, or something along those lines). That's not necessarily true if you've got (?:). I think it's one of those overlooked optimizations (hopefully japhy will answer concretely ~ I'm not a re => god yet ;D).
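
To illustrate the qr{} point, here is a small sketch (the variable names and sample words are made up). A compiled qr object stringifies as a non-capturing group carrying its flags, so interpolating two of them around "|" produces exactly the kind of grouped alternation discussed above. The precise stringified form depends on the perl version, so the comments below do not quote it.

```perl
use strict;
use warnings;

my $word  = qr/est/;
my $digit = qr/\d/;

# Each qr object stringifies as a "(?...:...)" group; the flag part
# inside it differs between perl versions.
print "word:   $word\n";
print "digit:  $digit\n";

# Interpolation nests those groups inside a new pattern.
my $either = qr/$word|$digit/;
print "either: $either\n";

# Hypothetical sample words: 'abc' contains neither "est" nor a digit.
my @matched = grep { $_ =~ $either } qw(test a1 abc);
print "matched: @matched\n";
```

Putting `use re 'debug';` at the top of the script shows how the combined pattern is compiled on your perl.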

    No idea what this offsets thing is. I don't get it, but then I don't have perl compiled with -DDEBUGGING, so it might relate to that somehow.

    ____________________________________________________
    ** The Third rule of perl club is a statement of fact: pod is sexy.