Interactions of \Q, \U, \L and \E

ig has asked for the wisdom of the Perl Monks concerning the following question:

Is there documentation of the interactions between \Q, \U, \L and \E escapes in quoted strings?

I have seen Quote and Quote-like Operators in perlop, but as far as I can see it doesn't describe how these escapes nest.

$ perl -e 'print "\Qab,cd\Uef,gh\Lij,kl\Emn,op\Eqr,st\Euv,wx\n";'
ab\,cdEF\,GHij\,klmn\,opqr,stuv,wx

One might expect

ab\,cdEF\,GHij\,klMN\,OPqr\,stuv,wx

update: or ab\,cdEF\,GHij\,klmn,opqr,stuv,wx.

update: or even ab\,cdEF\,GHIJ\,KLMN\,OPqr\,stuv,wx.

and

$ perl -e 'print "\Uab,cd\Qef,gh\Lij,kl\Emn,op\Eqr,st\Euv,wx\n";'
AB,CDEF\,GHij,klmn,opqr,stuv,wx

where one might expect

AB,CDEF\,GHij\,klMN\,OPQR,STuv,wx

Is there a rationale for how they nest and override each other? I find the inconsistency between \Q on the one hand and \U and \L on the other hand to be surprising.


Replies are listed 'Best First'.
Re: Interactions of \Q, \U, \L and \E by jethro (Monsignor) on Aug 25, 2009 at 08:33 UTC
\U and \L are not independent of each other, so \L will end \U just as \E does.

Note that nesting of \U and \L still does what you expect if the \U\E and the \L\E are in different strings (because they are evaluated at different times), i.e.

$a="X\LBLA\EX";
$b="\Un${a}n\El";
print $b; #prints NXBLAXNl

while nesting \U and \L in a single string can always be replaced by a string without nesting.

UPDATE: corrected the output of the script
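A minimal sketch of the same point, for anyone who wants to run it: the two-string version next to a version that spells the case changes out as function calls, so nothing depends on how the escapes nest. The sub names two_step and explicit are made up for illustration.

```perl
use strict;
use warnings;

# Two strings: each one is interpolated on its own, so the inner \L...\E
# is already finished before the outer \U...\E is applied.
sub two_step {
    my $inner = "X\LBLA\EX";       # "XblaX"
    return "\Un${inner}n\El";      # uc("nXblaXn") . "l"
}

# Same result written with uc/lc calls instead of escapes.
sub explicit {
    my $inner = 'X' . lc('BLA') . 'X';
    return uc('n' . $inner . 'n') . 'l';
}

print two_step(), "\n";
print explicit(), "\n";
```

Both lines should come out as NXBLAXNl, matching the output given above.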
Re: Interactions of \Q, \U, \L and \E by ikegami (Patriarch) on Aug 25, 2009 at 15:29 UTC
> One might expect ab\,cdEF\,GHij\,klMN\,OPqr\,stuv,wx

Not if one followed the documentation.

"\Qab,cd\Uef,gh\Lij,kl\Emn,op\Eqr,st\Euv,wx\n"
  Q--------------------       *      *
         U-------------
                L------

* - Ends active transforms. There are none, so no-op.

Now, the documentation doesn't state what happens when both \U and \L are in effect, but Perl does what you'd expect (last one encountered rules).

"\Qab,cd\Uef,gh\Lij,kl\Emn,op\Eqr,st\Euv,wx\n"
  Q----------------------     *      *
         U------L--------
                    |
                    v
"ab\\,cdEF\\,GHij\\,klmn,opqr,stuv,wx\n"

* - Ends active transforms. There are none, so no-op.

Update: Wait! That's not what Perl gives. Argh, not my day. There's a bug :( (in the docs if not in the interpreter).
Re^2: Interactions of \Q, \U, \L and \E by ig (Vicar) on Aug 25, 2009 at 15:56 UTC
> Wait! That's not what Perl gives. Argh, not my day. There's a bug :( (in the docs if not in the interpreter).

What you described (\Q, \U and \L all being terminated at the next \E) is one of the more consistent alternatives to what perl actually does. It would be simpler to implement, document and understand. But, as you note, perl doesn't behave this way.

$ perl -e 'print qq(\Qab,cd\Uef,gh\Lij,kl\Emn,op\Eqr,st\Euv,wx\n)'
ab\,cdEF\,GHij\,klmn\,opqr,stuv,wx

There isn't necessarily a bug (i.e. incorrect statement) in the documentation - it may be just an omission. It describes \Q, \U and \L separately. It doesn't say anything about what happens when they are nested or overlapped or concurrent (or however one might choose to describe the situation in the test case above). When used separately, they behave just as the documentation says they do.
Re^3: Interactions of \Q, \U, \L and \E by ikegami (Patriarch) on Aug 25, 2009 at 17:15 UTC
> There isn't necessarily a bug (i.e. incorrect statement) in the documentation - it may be just an omission

The docs say the conversion occurs until `\E`, but `perl` keeps going past `\E` in some circumstances. Call it what you want.
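For what it's worth, here is a toy model that reproduces both outputs from the original question. The stack rule in it is an assumption inferred from those two outputs, not something taken from the documentation or from the tokeniser source: \Q pushes a level, a new \U or \L first unwinds the stack until no \U or \L is left (dropping any \Q sitting above them on the way), and each \E pops one level. The name expand_toy is made up, and the model only understands \Q, \U, \L and \E.

```perl
use strict;
use warnings;

# Toy model only: a stack of active modifiers, NOT perl's tokeniser code.
sub expand_toy {
    my ($template) = @_;
    my %apply = (
        Q => sub { quotemeta $_[0] },
        U => sub { uc $_[0] },
        L => sub { lc $_[0] },
    );
    my @stack;
    my $out = '';

    # Split into escapes (\Q, \U, \L, \E) and the literal text between them.
    for my $piece (split /(\\[QULE])/, $template) {
        next if $piece eq '';
        if ($piece =~ /^\\([QULE])$/) {
            my $mod = $1;
            if ($mod eq 'E') {
                pop @stack;    # no-op when nothing is active
            }
            else {
                if ($mod ne 'Q') {
                    # Assumed rule: a new case modifier unwinds the stack
                    # until no \U or \L remains on it.
                    pop @stack while grep { $_ ne 'Q' } @stack;
                }
                push @stack, $mod;
            }
            next;
        }
        # Literal text: apply the most recently opened modifier first.
        my $text = $piece;
        $text = $apply{$_}->($text) for reverse @stack;
        $out .= $text;
    }
    return $out;
}

# Single quotes keep the escapes literal so the model sees them.
print expand_toy('\Qab,cd\Uef,gh\Lij,kl\Emn,op\Eqr,st\Euv,wx'), "\n";
print expand_toy('\Uab,cd\Qef,gh\Lij,kl\Emn,op\Eqr,st\Euv,wx'), "\n";
```

Under that assumed rule the model prints the same two lines perl printed in the question. In the second case the \L removes the \Q as well as the \U, which is why the commas after it come out unescaped.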
Re: Interactions of \Q, \U, \L and \E by ikegami (Patriarch) on Aug 25, 2009 at 13:42 UTC
They work at a very low level. In the tokeniser, I believe. ~~That would explain why they fail if you do anything odd with them, such as follow a \Q with a \U~~

>perl -e"print qq{\Q\U,a}"
\,A

~~(\Q doubled \U's slash without realising it was part of an escape.)~~

In my opinion, they are primarily meant to be used when followed by a variable.

Update: Oops, it's actually working right. The "\" is for the comma. I wasn't awake yet :(
Re^2: Interactions of \Q, \U, \L and \E by ig (Vicar) on Aug 25, 2009 at 15:02 UTC
Yes, they are handled in the tokeniser. It was while reading the tokeniser code that I became confused/curious about how they operate and why.

In the case of `qq(\Q\U,a)`, this is equivalent to (is converted to) `quotemeta(uc(',a'))`, as can be seen from the following:

Read more... (8 kB)

Having further reviewed the tokeniser code, I am reasonably certain how these escapes are handled, but I am still wondering whether there is documentation other than that in perlop.

I would be particularly interested to understand the cases/situations in which the inconsistencies are necessary or beneficial. I am guessing there are such cases, as it would have been significantly simpler to implement, document and understand without the inconsistencies.
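A quick way to check that equivalence without reading any tokeniser output is to compare the two results directly. This is only a sketch; the variable names are made up.

```perl
use strict;
use warnings;

# The escape form and the function-call form it is said to be equivalent to.
my $via_escapes   = qq(\Q\U,a);
my $via_functions = quotemeta(uc(',a'));

print "$via_escapes\n";
print "$via_functions\n";
print $via_escapes eq $via_functions ? "same\n" : "different\n";
```

Both values should be \,A, the output shown earlier in the thread for `qq{\Q\U,a}`.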