ig has asked for the wisdom of the Perl Monks concerning the following question:

Is there documentation of the interactions between \Q, \U, \L and \E escapes in quoted strings?

I have read Quote and Quote-like Operators in perlop but, as far as I can see, it doesn't describe how these escapes nest.

$ perl -e 'print "\Qab,cd\Uef,gh\Lij,kl\Emn,op\Eqr,st\Euv,wx\n";'
ab\,cdEF\,GHij\,klmn\,opqr,stuv,wx

One might expect

ab\,cdEF\,GHij\,klMN\,OPqr\,stuv,wx

update: or ab\,cdEF\,GHij\,klmn,opqr,stuv,wx.

update: or even ab\,cdEF\,GHIJ\,KLMN\,OPqr\,stuv,wx.

and

$ perl -e 'print "\Uab,cd\Qef,gh\Lij,kl\Emn,op\Eqr,st\Euv,wx\n";'
AB,CDEF\,GHij,klmn,opqr,stuv,wx

where one might expect

AB,CDEF\,GHij\,klMN\,OPQR,STuv,wx

Is there a rationale for how they nest and override each other? I find the inconsistency between \Q on the one hand and \U and \L on the other hand to be surprising.

Re: Interactions of \Q, \U, \L and \E
by jethro (Monsignor) on Aug 25, 2009 at 08:33 UTC

    \U and \L are not independent of each other, so \L ends \U just as \E would.

    Note that nesting of \U and \L still does what you expect if the \U\E and the \L\E are in different strings (because they are evaluated at different times), i.e.

    $a="X\LBLA\EX"; $b="\Un${a}n\El"; print $b; #prints NXBLAXNl

    whereas nested \U and \L within a single string can always be rewritten as a string without nesting.

    UPDATE: corrected the output of the script
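
The point about rewriting a nested string without nesting can be checked with a minimal sketch. The variable names ($nested, $flat) are illustrative only, and the example assumes the behaviour described above, where \L ends a preceding \U:

```perl
use strict;
use warnings;

# \L ends the preceding \U, so "cd" is only lower-cased.
my $nested = "\Uab\Lcd\Eef";

# The same text without nesting: each transform gets its own \E.
my $flat = "\Uab\E\Lcd\Eef";

print "nested: $nested\n";
print "flat:   $flat\n";
print $nested eq $flat ? "same\n" : "different\n";
```

If the two strings compare equal, the nested form adds nothing that the flat form cannot express.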

Re: Interactions of \Q, \U, \L and \E
by ikegami (Patriarch) on Aug 25, 2009 at 15:29 UTC

    One might expect ab\,cdEF\,GHij\,klMN\,OPqr\,stuv,wx

    Not if one followed the documentation.

    "\Qab,cd\Uef,gh\Lij,kl\Emn,op\Eqr,st\Euv,wx\n"
      Q--------------------       *      *
             U-------------
                    L------

    * - Ends active transforms. There are none, so no-op.

    Now, the documentation doesn't state what happens when both \U and \L are in effect, but Perl does what you'd expect (last one encountered rules).

    "\Qab,cd\Uef,gh\Lij,kl\Emn,op\Eqr,st\Euv,wx\n"
      Q----------------------     *      *
             U------L--------
                        |
                        v
    "ab\\,cdEF\\,GHij\\,klmn,opqr,stuv,wx\n"

    * - Ends active transforms. There are none, so no-op.

    Update: Wait! That's not what Perl gives. Argh, not my day. There's a bug :( (in the docs if not in the interpreter).

      Wait! That's not what Perl gives. Argh, not my day. There's a bug :( (in the docs if not in the interpreter).

      What you described (\Q, \U and \L all being terminated at the next \E) is one of the more consistent alternatives to what perl actually does. It would be simpler to implement, document and understand. But, as you note, perl doesn't behave this way.

      $ perl -e 'print qq(\Qab,cd\Uef,gh\Lij,kl\Emn,op\Eqr,st\Euv,wx\n)'
      ab\,cdEF\,GHij\,klmn\,opqr,stuv,wx

      There isn't necessarily a bug (i.e. incorrect statement) in the documentation - it may be just an omission. It describes \Q, \U and \L separately. It doesn't say anything about what happens when they are nested, overlapped or concurrent (or however one might choose to describe the situation in the test case above). When used separately, they behave just as the documentation says they do.

        There isn't necessarily a bug (i.e. incorrect statement) in the documentation - it may be just an omission

        The docs say the conversion occurs until \E, but perl keeps going past \E in some circumstances. Call it what you want.
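
A minimal sketch of the case being argued over, assuming a perl that behaves like the one used in this thread: the first \E closes only the innermost \U, and \Q stays in effect until the second \E. The variable name $s is illustrative.

```perl
use strict;
use warnings;

# The first \E ends \U only; \Q keeps quoting until the second \E.
my $s = "\Qa,b\Uc,d\Ee,f\Eg,h";

print "$s\n";    # a\,bC\,De\,fg,h
```

Here "e,f" is still quoted even though an \E precedes it, which is the "keeps going past \E" behaviour.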

Re: Interactions of \Q, \U, \L and \E
by ikegami (Patriarch) on Aug 25, 2009 at 13:42 UTC
    They work at a very low level. In the tokeniser, I believe. That would explain why they fail if you do anything odd with them, such as follow a \Q with a \U:

    >perl -e"print qq{\Q\U,a}"
    \,A

    (\Q doubled \U's slash without realising it was part of an escape.)

    In my opinion, they are primarily meant to be used when followed by a variable.

    Update: Oops, it's actually working right. The "\" is for the comma. I wasn't awake yet :(

      Yes, they are handled in the tokeniser. It was while reading the tokeniser code that I became confused/curious about how they operate and why.

      In the case of qq(\Q\U,a), this is equivalent to (it is converted to) quotemeta(uc(',a')), as can be seen from the following:
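
One way to check this equivalence, as a minimal sketch (the variable names are illustrative, and this compares results rather than showing what the tokeniser generates):

```perl
use strict;
use warnings;

# The interpolated form with stacked escapes ...
my $interpolated = qq(\Q\U,a);

# ... and the explicit function calls it is said to be equivalent to.
my $explicit = quotemeta(uc(',a'));

print "$interpolated\n";
print "$explicit\n";
print $interpolated eq $explicit ? "equal\n" : "not equal\n";
```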

      Having further reviewed the tokeniser code, I am reasonably certain how these escapes are handled but still wondering if there is documentation other than that in perlop. I would be particularly interested to understand the cases/situations in which the inconsistencies are necessary or beneficial. I am guessing there are such cases as it would have been significantly simpler to implement, document and understand without the inconsistencies.
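
The observed behaviour can be modelled with a small stack, sketched below. This is an assumption-laden model that reproduces the outputs reported in this thread, not a transcription of the tokeniser. The function name apply_case_escapes is hypothetical, and it handles only \Q, \U, \L and \E in a single-quoted input string.

```perl
use strict;
use warnings;

# Hypothetical model: \Q, \U and \L push onto a stack and \E pops one entry.
# A new \U or \L first unwinds the stack until no \U or \L is left on it,
# which also discards any \Q sitting above the old \U or \L.
sub apply_case_escapes {
    my ($src) = @_;
    my @stack;       # active transforms, innermost last
    my $out = '';

    # Split into escape tokens and literal runs, keeping the tokens.
    for my $tok (split /(\\[QULE])/, $src) {
        next if $tok eq '';
        if ($tok =~ /^\\([QUL])$/) {
            my $mod = $1;
            if ($mod ne 'Q') {
                pop @stack while grep { $_ ne 'Q' } @stack;
            }
            push @stack, $mod;
        }
        elsif ($tok eq '\\E') {
            pop @stack;    # no-op when nothing is active
        }
        else {
            my $text = $tok;
            # Apply the innermost transform first, e.g. quotemeta(uc(...)).
            for my $mod (reverse @stack) {
                $text = $mod eq 'Q' ? quotemeta($text)
                      : $mod eq 'U' ? uc($text)
                      :               lc($text);
            }
            $out .= $text;
        }
    }
    return $out;
}

# The two test strings from the original question, single-quoted so the
# escapes reach the model unprocessed.
my $first  = apply_case_escapes('\Qab,cd\Uef,gh\Lij,kl\Emn,op\Eqr,st\Euv,wx');
my $second = apply_case_escapes('\Uab,cd\Qef,gh\Lij,kl\Emn,op\Eqr,st\Euv,wx');
print "$first\n$second\n";
```

Under this model the asymmetry is that \Q stacks with the case escapes, while \U and \L replace each other, and replacing one discards whatever was pushed after it.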