in reply to Re^5: Unescaped left brace in regex is passed through in regex
in thread Unescaped left brace in regex is passed through in regex

Due to the way the single-quote string constructor handles backslashes (escapes), the \\ will in this case compile to a single literal backslash. See Quote and Quote-like Operators and the discussion of q/STRING/ in Quote-Like Operators.

Thanks for your comment, AM. Your link is a good read and worth reposting. I thought that the collapsing of backslashes was done by the OS in resolving paths. I was unaware that Perl did it.

do you dispute that there is a left-curly (and a right-curly) in the \x{A3f4} string? What else would you call it/them?

I do not dispute that, so this string itself never represents a left curly brace, rather it has a left curly brace in it.

I would call it (or in this case \x{A3f4}) "the string compiled from '\\x{A3f4}'"

Ok. From the above source we have:

    \x{263A}    [1,8] hex char (example shown: SMILEY)
    \x{ 263A }        Same, but shows optional blanks inside and adjoining the braces
    \x1b        [2,8] restricted range hex char (example: ESC)

So, I think "aha, it's a hex representation", but then I can't get there with the REPL:

    DB<1> $str2='\\x{263}'
    DB<2> p $str2
    \x{263}
    DB<3> p hex $str2
    0
    DB<4> print hex $str2
    0

I would expect to see a smiley face rather than zero. This is a head-scratcher:

    DB<6> $str3='\\\\\\\x{aF}'
    DB<7> p $str3
    \\\\x{aF}
    DB<8> p hex $str3
    0
    DB<9> print hex $str3
    0

$str3 goes from 7 to 4 backslashes when compiled(?). But I get zero for a hex value no matter what I try:

    DB<10> $str4='\x{aF}'
    DB<11> p $str4
    \x{aF}
    DB<12> print hex $str4
    0
    DB<13> print hex 'aF'
    175

How do I tease 175 out of $str4?

The \x part has nothing to do with the /x or /xx regex modifiers.

That part is clearer now. I have that backslash/forward-slash dysphoria going on now, where I can hardly see the difference and it looks like a toothpick war. I get the occasional billiken that I read or write the wrong way.

Re^7: Unescaped left brace in regex is passed through in regex
by LanX (Saint) on Jun 08, 2022 at 09:28 UTC
    > So, I think "aha, it's a hex representation", but then I can't get there with the REPL:

    you are still confusing interpolation (double-quotes) with literal strings (single-quotes)

        DB<28> p $str1 = "\x{41}"     # interpolation
        A
        DB<29> p $str2 = '\x{41}'     # literal
        \x{41}
        DB<30> p $str2 = '\\x{41}'    # literal but escaping escaping \
        \x{41}

    now, the double escape in line 30 is playing safe, because there is a difference between \\' and \'
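
    The difference between \\' and \' can be seen in a minimal sketch like the one below. The variable names and sample strings are made up for illustration; the only behaviour relied on is that a single-quoted string recognizes just two escapes, \\ and \'.

    ```perl
    use strict;
    use warnings;

    # In a single-quoted string only two escapes are recognized:
    #   \\  -> one literal backslash
    #   \'  -> one literal single quote
    # Every other backslash is kept as-is.
    my $plain   = '\x{41}';      # the backslash before x is kept literally
    my $doubled = '\\x{41}';     # \\ collapses to one backslash: same string
    my $quote   = 'it\'s';       # \' yields a quote and the string continues
    my $trail   = 'C:\temp\\';   # a trailing backslash has to be doubled,
                                 # otherwise \' would swallow the closing quote

    print $plain eq $doubled ? "same\n" : "different\n";
    print "$quote\n";
    print "$trail\n";
    ```

    So doubling the backslash never hurts in single quotes, and it is required whenever the backslash is the last character before the closing quote.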

    BUT this

    \x{ 263A } Same, but shows optional blanks inside and adjoining the braces

    doesn't work for me! (oO ???)

        DB<31> p " \x{ 41 } "
        ^@

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

Re^7: Unescaped left brace in regex is passed through in regex
by AnomalousMonk (Archbishop) on Jun 08, 2022 at 02:38 UTC

    Some random responses...

    do you dispute that there is a left-curly (and a right-curly) in the \x{A3f4} string? What else would you call it/them?
    I do not dispute that, so this string itself never represents a left curly brace, rather it has a left curly brace in it.

    Oh, so you were thinking that "\x{A3f4}" when compiled double-quotishly into a string and then printed should print a left-curly! I follow you a little better now. My terminal is not configured for Unicode (as I assume this character to be) right now, so I cannot confirm what it will print, and I'm reluctant to launch myself into Unicode-land on-line to find out. However, I agree that the escape sequence \x{A3f4} when compiled double-quotishly (e.g., "ab\x{A3f4}cd") will compile to some character. But the single-quote-compiled string '\x{A3f4}' will always be literally \x{A3f4} and nothing else.

    It's important to understand how backslashes (escapes) are compiled in single- and double-quoted strings. Consider the following:

        Win8 Strawberry 5.8.9.5 (32) Tue 06/07/2022 12:17:53
        C:\@Work\Perl\monks
        >perl
        use strict;
        use warnings;

        print '-\-\\-\\\-\\\\-\\\\\-\\\\\\-\\\\\\\-\\\\\\\\-';
        ^Z
        -\-\-\\-\\-\\\-\\\-\\\\-\\\\-

    Why do '\\\\\\\' and '\\\\\\\\' (7 and 8 backslashes, respectively) both compile to and print as four backslashes? How would this be different if compiled as a double-quoted string?
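
    One way to explore that question is to count the backslashes that survive compilation. The sketch below is only an illustration: count_backslashes is a hypothetical helper, and the sample strings end in x41 so that the double-quoted case with an odd number of backslashes has a valid escape (\x41) left over to interpret.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: count the backslashes a compiled string contains.
    sub count_backslashes {
        my ($s) = @_;
        return $s =~ tr/\\//;    # tr with an empty replacement list only counts
    }

    my $sq7 = '\\\\\\\x41';      # 7 typed: three \\ pairs + one lone \ kept before x
    my $sq8 = '\\\\\\\\x41';     # 8 typed: four \\ pairs
    my $dq7 = "\\\\\\\x41";      # 7 typed: three \\ pairs, then \x41 becomes "A"
    my $dq8 = "\\\\\\\\x41";     # 8 typed: four \\ pairs, then the plain text x41

    for my $case ([sq7 => $sq7], [sq8 => $sq8], [dq7 => $dq7], [dq8 => $dq8]) {
        my ($name, $string) = @$case;
        printf "%s: %d backslashes in %s\n",
            $name, count_backslashes($string), $string;
    }
    ```

    In single quotes the lone seventh backslash is simply kept, so 7 and 8 typed backslashes give the same string. In double quotes the seventh backslash starts an escape sequence instead, so the two strings differ.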

    DB<1> $str2='\\x{263}'

    This compiles to (and prints) the literal string \x{263} or literal-backslash, literal-lowercase-x, literal-left-curly, literal-2, literal-6, literal-3, literal-right-curly. The hex built-in cannot interpret a string in this format (and so returns zero (update: and a warning)), but can in "proper" format:

        Win8 Strawberry 5.8.9.5 (32) Tue 06/07/2022 22:09:02
        C:\@Work\Perl\monks
        >perl
        use strict;
        use warnings;

        my $h1 = 'A3f4';
        my $h2 = 'xA3f4';

        print hex 'A3f4', "\n";
        print hex $h1, "\n";

        print hex 'xA3f4', "\n";
        print hex $h2, "\n";

        print hex '\xA3f4', "\n";
        print hex '\x{A3f4}', "\n";
        ^Z
        41972
        41972
        41972
        41972
        Illegal hexadecimal digit '\' ignored at - line 13.
        0
        Illegal hexadecimal digit '\' ignored at - line 14.
        0

        DB<10> $str4='\x{aF}'
    ...
    How do I tease 175 out of $str4?

    We know that \x{aF} will not be interpreted by hex as a hex number. One way to extract the hex substring:

        Win8 Strawberry 5.8.9.5 (32) Tue 06/07/2022 22:25:09
        C:\@Work\Perl\monks
        >perl
        use strict;
        use warnings;

        my $str = '\x{aF}';

        $str =~ m{ \A \\ x \{ ([[:xdigit:]]+) \} \z }xms;
        my $hex_digits = $1;
        print ">$hex_digits< \n";

        my $hex_number_in_decimal = hex $hex_digits;
        print "$hex_number_in_decimal \n";
        ^Z
        >aF<
        175

    Update: Another approach:

        Win8 Strawberry 5.8.9.5 (32) Sat 06/11/2022 15:18:47
        C:\@Work\Perl\monks
        >perl
        use strict;
        use warnings;

        my $str = '\x{aF}';

        my ($hex_digits) = $str =~ m{ [[:xdigit:]]+ }xmsg;

        my $hex_number_in_decimal = hex $hex_digits;
        print "'$hex_digits' == $hex_number_in_decimal decimal \n";
        ^Z
        'aF' == 175 decimal

    This approach can be useful when a string or record has been "validated" as to its structure and you know that certain substrings or fields are unambiguously present: these substrings/fields can then be easily and quickly extracted. Note the /g modifier on the m// match.
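
    If the goal is the character itself (the smiley expected earlier in the thread) rather than the number, the extracted digits can be passed through hex and then chr. This is a minimal sketch, not anything from the posts above: escape_to_codepoint is a hypothetical helper name, and it assumes the input is exactly the literal text \x{HEX} with nothing around it.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: turn the literal text \x{HEX} into its code point.
    # Returns nothing (undef in scalar context) when the text has another shape.
    sub escape_to_codepoint {
        my ($text) = @_;

        # Match backslash, x, {, one or more hex digits, } and capture the digits.
        my ($digits) = $text =~ m{ \A \\ x \{ ([[:xdigit:]]+) \} \z }xms
            or return;

        return hex $digits;
    }

    my $code = escape_to_codepoint('\x{263A}');    # 9786

    # Encode output as UTF-8 so the character can be printed without a
    # "Wide character" warning; the terminal still has to display UTF-8.
    binmode STDOUT, ':encoding(UTF-8)';
    printf "U+%04X is code point %d: %s\n", $code, $code, chr $code;
    ```

    Whether the smiley actually shows up depends on the terminal; the code point itself is correct either way.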


    Give a man a fish:  <%-{-{-{-<