
FWIW: I now think that the "proper" way to do this is /(a)\1[1]$/, because I seem to recall that circa 5.8.3 or 5.8.4, code was specifically added to the regex engine so that character classes containing a single character are optimised away to that character. This was done explicitly because the construct is so useful and clear as an escaping mechanism.

I.e. /[f][r][e][d]/ becomes exactly equivalent to /fred/, including its runtime performance.
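A minimal sketch of the difference, assuming the goal is "whatever group 1 captured, repeated, then a literal 1". The test strings are made up for illustration. As far as I read perlre, \11 in a pattern with fewer than 11 groups is taken as an octal escape (a tab), which is why the digit has to be separated from the backref somehow:

```perl
use strict;
use warnings;

# Illustrative input: the captured "a", the same text again, then a digit 1.
my $str = 'aa1';

# Backreference to group 1, then a one-character class holding the digit.
my $class_ok = $str =~ /(a)\1[1]$/ ? 1 : 0;

# With only one group in the pattern, \11 is not "\1 then 1";
# it is read as the octal escape for chr(9), a tab.
my $naive_ok = $str =~ /(a)\11$/ ? 1 : 0;
my $tab_ok   = "a\t" =~ /(a)\11$/ ? 1 : 0;

# Single-character classes match exactly what the bare literals match.
my $fred_class = 'alfred' =~ /[f][r][e][d]/ ? 1 : 0;
my $fred_plain = 'alfred' =~ /fred/         ? 1 : 0;

print "class form matches 'aa1':  $class_ok\n";
print "naive \\11 matches 'aa1':   $naive_ok\n";
print "naive \\11 matches 'a\\t':   $tab_ok\n";
print "[f][r][e][d] vs fred:      $fred_class / $fred_plain\n";
```

This only shows matching behaviour; it says nothing about whether the two forms compile to the same program on a given perl.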

Maybe someone out there has a reference to the change; I looked but could not find it.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^6: How we can separate a backref from a digit?
by JavaFan (Canon) on Mar 18, 2011 at 22:54 UTC
    That optimization came with 5.10.

    From man perl5100delta:

    Single char char-classes treated as literals
    Classes of a single character are now treated the same as if the character had been used as a literal, meaning that code that uses char-classes as an escaping mechanism will see a speedup. (Yves Orton)
      That optimization came with 5.10.

      That explains why I couldn't find it. I was looking in the wrong place.

      I guess that this is another of those features--like defined-OR--that was discussed for a long time prior to actually making it into a release.

Re^6: How we can separate a backref from a digit?
by ikegami (Patriarch) on Mar 18, 2011 at 23:24 UTC

    I still think it's /...(?:\1).../.

    • In math and Perl, when one wants to group operators, one uses parens. Seems natural to use them here too.
    • Enclosing in parens even has a precedent in Perl: /${prefix}_s/, /(?s:.)/
    • It's self-contained instead of requiring unrelated code to change. This solves /...\1$pat.../.
    • It has no run-time effect (as shown below).
    $ perl -Mre=debug -E'qr/(a) \1 1/x'
    Compiling REx "(a) \1 1"
    Final program:
       1: OPEN1 (3)
       3:   EXACT <a> (5)
       5: CLOSE1 (7)
       7: REF1 (9)
       9: EXACT <1> (11)
      11: END (0)
    anchored "a" at 0 floating "1" at 1..2147483647 (checking floating) minlen 2
    Freeing REx: "(a) \1 1"

    $ perl -Mre=debug -E'qr/(a)(?:\1)1/'
    Compiling REx "(a)(?:\1)1"
    Final program:
       1: OPEN1 (3)
       3:   EXACT <a> (5)
       5: CLOSE1 (7)
       7: REF1 (9)
       9: EXACT <1> (11)
      11: END (0)
    anchored "a" at 0 floating "1" at 1..2147483647 (checking floating) minlen 2
    Freeing REx: "(a)(?:\1)1"
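    The self-contained point can be sketched like this, assuming a hypothetical $pat that happens to start with a digit (the variable contents and test string are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical interpolated fragment; the leading digit is the problem case.
my $pat = '1x';

# After interpolation this pattern text is (a)\11x, so the backref
# and the digit run together.
my $bare = qr/(a)\1$pat/;

# Wrapping the backref keeps it intact whatever $pat starts with.
my $grouped = qr/(a)(?:\1)$pat/;

my $str = 'aa1x';
my $bare_ok    = $str =~ $bare    ? 1 : 0;
my $grouped_ok = $str =~ $grouped ? 1 : 0;

print "bare    \\1\$pat     matches '$str': $bare_ok\n";
print "grouped (?:\\1)\$pat matches '$str': $grouped_ok\n";
```

    The pattern containing \1 is the only thing that changes; $pat and whatever code builds it stay as they are.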

    Update: Me tired. Grammar bad. Fixed.

      In math and Perl, when one wants to group operators,

      But you aren't "grouping operators". You're wrapping a single operator in grouping parens in order to isolate it from the next operator.

      On the basis that using grouping parens to group a single operator is more confusing than using the character class escaping mechanism which has many benefits in this role, and is therefore a good thing to promote, I prefer the latter. But I wouldn't try and impose that on anyone else.

      That you will disagree with me will come as no surprise to anyone.



        Sorry, which brackets do you call confusing, the ones around the /\1/ or the ones around the /1/?

        I don't agree that

        (4+5)*6

        and

        (ln x) + y

        are confusing.