in reply to Optimization of Regexes

$ perlbrew use 5.8.9 $ perl -Mre=debug -e'qr/a|b|c|d/' Freeing REx: `","' Compiling REx `a|b|c|d' size 13 Got 108 bytes for offset annotations. 1: BRANCH(4) 2: EXACT <a>(13) 4: BRANCH(7) 5: EXACT <b>(13) 7: BRANCH(10) 8: EXACT <c>(13) 10: BRANCH(13) 11: EXACT <d>(13) 13: END(0) minlen 1 Offsets: [13] 0[0] 1[1] 0[0] 2[1] 3[1] 0[0] 4[1] 5[1] 0[0] 6[1] 7[1] 0[0] 8[ +0] Freeing REx: `"a|b|c|d"' $ perlbrew use 5.10.1t $ perl -Mre=debug -e'qr/a|b|c|d/' Compiling REx "a|b|c|d" Final program: 1: TRIEC-EXACT[a-d] (13) <a> <b> <c> <d> 13: END (0) stclass AHOCORASICKC-EXACT[a-d] minlen 1 Freeing REx: "a|b|c|d"

Replies are listed 'Best First'.
Re^2: Optimization of Regexes
by shadowSage (Novice) on Sep 22, 2014 at 13:52 UTC

    I get what your saying. The output of the debug is a bit cryptic to my eyes (I am still new) but clearly a difference between the versions.

    Don't suppose the optimized regex gets stored anywhere accessible? (I don't see anything in perlvar, but you never know)

      but clearly a difference between the versions

      Specifically, a difference in the program into which the regex is compiled. As you mentioned, the difference is that the latter is more efficient to execute. It's slower to compile, though.

      Don't suppose the optimized regex gets stored anywhere accessible?

      What? What is it you want?

        And the difference is that the latter is more efficient to execute, though it's slower to compile

        Yep, I read you loud and clear there. I gather there is some way to override the default (if for some reason I did want to use the pre 5.10 way)? Just idle curiosity.

        What? What is it you want?

        Was just wondering if there was a way to get the optimized regex in a scalar without Regexp::Assemble, like maybe it gets stored in some var somewhere.

        No big deal if it doesn't, I am just playing/experimenting/learning at the moment.

Re^2: Optimization of Regexes
by locked_user sundialsvc4 (Abbot) on Sep 22, 2014 at 14:19 UTC

    Even though the output of the disassembly is cryptic, two things are quite clear:   First, that every regular-expression string is “compiled” into the thing that is actually carried-out by the engine.   Second, that as particular use-cases are identified that can be improved upon, more-efficient pseudoinstructions are devised and the regex-compiler is modified to generate them.

    In the big picture, that ought to be considered, “free sweets.”   Your already-well thought-out algorithm simply runs a few microseconds faster than before.   Generally, disassembly of a complicated regex is most-useful as a way of cofirming that it actually does instruct the computer to do what you meant to say.