Unescaped left brace in regex is passed through in regex

gzh has asked for the wisdom of the Perl Monks concerning the following question:

Re: Unescaped left brace in regex is passed through in regex by haukex (Archbishop) on Jun 06, 2022 at 05:44 UTC
> Is there any easy way to maintain compatibility?

The best way is to escape the braces. From perldeprecation, reorganized a bit:

> Forcing literal `{` characters to be escaped will enable the Perl language to be extended in various ways in future releases. ...
>
> Literal uses of `{` were deprecated in Perl 5.20, and some uses of it started to give deprecation warnings since. These cases were made fatal in Perl 5.26. Due to an oversight, not all cases of a use of a literal `{` got a deprecation warning. Some cases started warning in Perl 5.26, and were made fatal in Perl 5.30. Other cases started in Perl 5.28, and were made fatal in 5.32. ...
>
> The simple rule to remember, if you want to match a literal `{` character (`U+007B LEFT CURLY BRACKET`) in a regular expression pattern, is to escape each literal instance of it in some way. Generally easiest is to precede it with a backslash, like `\{` or enclose it in square brackets (`[{]`). If the pattern delimiters are also braces, any matching right brace (`}`) should also be escaped to avoid confusing the parser, for example, `qr{abc\{def\}ghi}` ...

In general, when upgrading Perl versions, two things are important: first, have a test environment where you can try out upgrading Perl before doing it in a live environment; and second, upgrade Perl releases step by step, i.e. 5.10, 5.12, 5.14, and so on (e.g. perlbrew makes this easy), because, as you can tell from the above, the policy is to have breaking changes give deprecation warnings for at least one major release before making them fatal.

(Update: Side note: 5.10.1 to 5.26.3 is a jump of over 9 years of development, with 5.26.3 still being 3.5 years older than the current 5.36.0.)
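A minimal sketch of the three escaping styles quoted above, assuming a reasonably recent Perl (5.26 or later); the sample string and variable names are made up for illustration. Each pattern treats the braces as literal text and matches the sample string.

```perl
use strict;
use warnings;

my $text = 'abc{def}ghi';

# The three escaping styles quoted from perldeprecation above.
my @patterns = (
    qr/c\{d/,             # precede the brace with a backslash
    qr/c[{]d/,            # or put it in a bracketed character class
    qr{abc\{def\}ghi},    # brace delimiters: escape the inner { and } too
);

for my $re (@patterns) {
    my $result = $text =~ $re ? 'match' : 'no match';
    print "$result: $re\n";
}
```

The backslash form is usually the least surprising when editing existing code, because it leaves the rest of the pattern untouched.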
Re^2: Unescaped left brace in regex is passed through in regex by Aldebaran (Curate) on Jun 11, 2022 at 02:59 UTC
> The simple rule to remember, if you want to match a literal `{` character (`U+007B LEFT CURLY BRACKET`) in a regular expression pattern, is to escape each literal instance of it in some way. Generally easiest is to precede it with a backslash, like `\{` or enclose it in square brackets (`[{]`).

There's a lot of disambiguation to do along these lines, and your response helped me piece this all together. I finally found the syntax I was looking for in the REPL:

```
  DB<7> $b="\N{U+007B}"

  DB<8> p $b
{
  DB<9>
```

> In general, when upgrading Perl versions, two things are important: have a test environment where you can try out upgrading Perl before doing it in a live environment, and second, upgrade Perl releases step by step, i.e. 5.10, 5.12, 5.14, and so on (e.g. perlbrew makes this easy), because as you can tell from the above, the policy is to have breaking changes give deprecation warnings for at least one major release before making them fatal. (Update: Side note: 5.10.1 to 5.26.3 is a jump of over 9 years of development, with 5.26.3 still being 3.5 years older than the current 5.36.0.)

Are you suggesting that a person sandbox the application and boil the perl version up?
Re^3: Unescaped left brace in regex is passed through in regex by haukex (Archbishop) on Jun 11, 2022 at 15:47 UTC
> `$b="\N{U+007B}"`

Note that's a double-quoted string, and you don't need to escape `{`'s there, unless of course you mean using that escape in a regex; and while `/\N{U+007B}/` certainly works, it's a whole lot longer than the equivalent `/\{/`...

> Are you suggesting that a person sandbox the application and boil the perl version up?

Yes, that's exactly the suggestion (though "test environment" would be more accurate than "sandbox"). It's of course a bit of work and may seem like overkill; many people do end up not doing this and jumping several years into the future like in this case. But when that then leads to problems (like in this case), one way to debug is to gradually step up the Perl versions so one catches all the deprecation warnings before they become fatal errors or confusing syntax errors.
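To make that difference concrete, a small sketch (the variables `$brace` and `$s` are illustrative): in a double-quoted string `"\N{U+007B}"` is simply a `{`, and in a regex both `\N{U+007B}` and `\{` match one literal left brace.

```perl
use strict;
use warnings;

# In a double-quoted string, "{" needs no escaping at all;
# "\N{U+007B}" is just a longer spelling of the same character.
my $brace = "\N{U+007B}";
print "same character\n" if $brace eq '{';

# In a regex, both spellings match one literal left brace.
my $s = 'x{y';
print "\\N{U+007B} form matched\n" if $s =~ /x\N{U+007B}y/;
print "\\{ form matched\n"         if $s =~ /x\{y/;
```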
Re: Unescaped left brace in regex is passed through in regex by graff (Chancellor) on Jun 06, 2022 at 06:28 UTC
You say

> ... the application is large and escape is a terrible job.

Well, how many separate source code files are there that make up the application? How many of them yield that sort of error message when you run `perl -Tcw` on each file? Among the affected files, how many lines of code actually cause that error message?

The error messages are "machine-readable": they report the line numbers and the affected strings in a consistent way. If it turns out that there are hundreds of lines to fix, you'll probably notice that they fall into a smaller number of patterns, and you can write a perl script to read the error messages and update the affected lines of the affected files (saving the updated code in a separate directory, of course, so that you can run `diff` on the pre- and post-edited versions to confirm that all and only the intended changes have been made).

(If there are just a few dozen lines to fix, stop complaining and just fix them.)
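A rough sketch of the first half of that approach: locating the affected lines from the diagnostics. The helper name `brace_warning_lines` and the sample message are hypothetical; the sketch assumes only that each diagnostic starts with "Unescaped left brace in regex" and ends with Perl's usual `at FILE line N.` suffix, since the wording in between may differ across Perl versions. Rewriting the lines is left out, because the right fix depends on whether each brace is a literal or part of a quantifier.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the "Unescaped left brace" diagnostics out of
# captured STDERR lines (e.g. from running perl -c on each file) and
# group their line numbers by file name.
sub brace_warning_lines {
    my @stderr_lines = @_;
    my %lines_by_file;
    for my $msg (@stderr_lines) {
        next unless $msg =~ /^Unescaped left brace in regex/;
        # Rely only on the trailing "at FILE line N." part of the message.
        next unless $msg =~ / at (\S+) line (\d+)\.?$/;
        push @{ $lines_by_file{$1} }, $2;
    }
    return \%lines_by_file;
}

# Made-up sample input, shaped like what perl -c prints to STDERR.
my @sample = (
    'Unescaped left brace in regex is passed through in regex; '
      . 'marked by <-- HERE in m/x{ <-- HERE 1}/ at lib/Foo.pm line 42.',
    'lib/Foo.pm syntax OK',
);

my $found = brace_warning_lines(@sample);
for my $file (sort keys %$found) {
    print "$file: line(s) @{ $found->{$file} }\n";
}
```

Grouping by file keeps the report short enough to judge whether the fixes fall into a handful of repeating patterns.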
Re^2: Unescaped left brace in regex is passed through in regex by gzh (Initiate) on Jun 06, 2022 at 07:56 UTC
Dear graff,

Thank you very much for your suggestions. We found that not every regex with a brace produces an error, so the real scope may not be so large.

```
if ($str =~ /(\\x{[A-F\d]+})/i){         // error by perl -Tcw
if ($str =~ /^(\d{2,4})[^\d](\d{2})/){   // no error by perl -Tcw
```
Re^3: Unescaped left brace in regex is passed through in regex by LanX (Saint) on Jun 06, 2022 at 16:46 UTC
your code section is hard to decipher in my chrome:

```
if ($str =~ /(\\x{[A-F\d]+})/i){ // error by perl -Tcw
if ($str =~ /^(\d{2,4})[^\d](\d{2})/){ // no error by perl -Tcw
```

some remarks:

- you don't append comments with `//` in Perl, it's `#`
- the "no error" part is legal syntax for a range of repetitions
- the "error" part is not, hence (old) Perl thinks° a literal curly `{` is expected
- so implicitly `{` is translated to `\{` instead of throwing an error
- the deprecation means that literal `{` has to be escaped explicitly now

Please note the difference (debugger demo with `perl -de0`):

```
  DB<4> $str = '\\x{A3f4}'    # literal curly

  DB<5> if ( $str =~ /(\\x{[A-F\d]+})/i ) { print $1 }    # implicit but deprecated
\x{A3f4}
  DB<6> use warnings; if ( $str =~ /(\\x\{[A-F\d]+})/i ) { print $1 }    # explicit and no warning
\x{A3f4}
  DB<7> $str = '123X12'    # no curlies, just repeated numbers

  DB<8> if ( $str =~ /^(\d{2,4})[^\d](\d{2})/) { print "$1;$2" }    # meta curly
123;12
```

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

°) see DWIM
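The debugger check above can also be scripted, which helps when testing a rewritten pattern before touching the source. This is a sketch with a hypothetical helper `pattern_warnings`: it compiles a pattern string at run time inside `eval`, with a local `$SIG{__WARN__}` handler, and returns any warnings or compile error. What it reports for the unescaped form depends on the Perl version, so only the escaped and quantifier forms are expected to come back clean.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a pattern string at run time and collect
# any warnings (or the error, if compilation dies). Patterns written
# literally in the source are compiled before $SIG{__WARN__} is set,
# which is why the pattern is passed in as a string here.
sub pattern_warnings {
    my ($pattern) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $re = eval { qr/$pattern/i };
    push @warnings, "compile error: $@" unless defined $re;
    return @warnings;
}

# In single quotes only \\ and \' are special: \\\\ becomes \\,
# while \{ and \d pass through unchanged.
my $unescaped  = '(\\\\x{[A-F\d]+})';        # gzh's original form
my $escaped    = '(\\\\x\{[A-F\d]+})';       # left brace escaped
my $quantified = '^(\d{2,4})[^\d](\d{2})';   # braces are quantifiers

for my $p ($unescaped, $escaped, $quantified) {
    my @w = pattern_warnings($p);
    # The result for the unescaped form varies between Perl versions.
    print "$p => ", (@w ? "@w" : "no warnings\n");
}
```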
Re^4: Unescaped left brace in regex is passed through in regex by Aldebaran (Curate) on Jun 07, 2022 at 20:25 UTC
Re^5: Unescaped left brace in regex is passed through in regex by AnomalousMonk (Archbishop) on Jun 07, 2022 at 21:24 UTC
Re^5: Unescaped left brace in regex is passed through in regex by LanX (Saint) on Jun 07, 2022 at 23:35 UTC
Re^3: Unescaped left brace in regex is passed through in regex by AnomalousMonk (Archbishop) on Jun 06, 2022 at 14:33 UTC
> `if ($str =~ /^(\d{2,4})[^\d](\d{2})/){ // no error by perl -Tcw`

Please note that in Perl regex syntax, `\d{2,4}` and `\d{2}` are counted quantifiers and are perfectly valid. This is very ancient and very common syntax, so let's hope it's never deprecated! (Pay no attention to the Raku behind the curtain.)

Give a man a fish: `<%-{-{-{-<`
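A short sketch of why those braces must stay unescaped, using gzh's second pattern on a made-up input: unescaped, `{2,4}` and `{2}` are counted quantifiers; escaping the left braces turns them into literal text, so the pattern still compiles but matches something entirely different.

```perl
use strict;
use warnings;

my $date = '123X12';

# Unescaped braces after \d are counted quantifiers:
# two to four digits, one non-digit, then exactly two digits.
if ($date =~ /^(\d{2,4})[^\d](\d{2})/) {
    print "quantifier: $1;$2\n";
}

# Escaping the left braces makes them literal, so the pattern now wants
# a digit followed by the characters "{2,4}", and so on.
my $escaped = qr/^(\d\{2,4})[^\d](\d\{2})/;
print "escaped on '$date': ",
    ($date =~ $escaped ? 'match' : 'no match'), "\n";
print "escaped on '1{2,4}X1{2}': ",
    ('1{2,4}X1{2}' =~ $escaped ? 'match' : 'no match'), "\n";
```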
Re^3: Unescaped left brace in regex is passed through in regex by Anonymous Monk on Jun 06, 2022 at 14:40 UTC
The left curly brace is a meta-character only in certain contexts. Outside those contexts, old versions of Perl would accept it as a literal even if it was not escaped. I believe the intent is to eventually require literal curly braces, but not meta-character curly braces, to be escaped.

In your first example, the left curly is a literal and needs to be escaped. Looked at another way, the intent of the regex is to match something like `"\x{deadbeef}"`.

In your second example, the curly braces are meta-characters specifying numbers of characters to match: two to four digits, followed by a non-digit, followed by two digits. So no escape is required; in fact, escaping the left curly brackets would break the regex functionally, though it would still compile.

Maybe the error message is a bit unclear. It does not say "literal left curly," though that is probably implied by "is passed through ..."

[OT] The first regexp can probably be written `/(\\x\{[[:xdigit:]]+})/`, provided that is really your intent. Note, though, that `\d` matches any digit, not just ASCII digits. That is to say, your character class will accept `"123\N{U+096A}"` (a.k.a. ASCII one, ASCII two, ASCII three, Devanagari digit four) between the braces, whereas mine will not. If you need to match non-ASCII digits, stay with your own regexp.
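A sketch comparing the two character classes on made-up strings (note that the `[[:xdigit:]]` variant needs a `+` to match more than one hex digit). Because the second string contains a character above 255, Perl applies Unicode rules, so `\d` matches DEVANAGARI DIGIT FOUR while `[[:xdigit:]]` does not.

```perl
use strict;
use warnings;

# gzh's pattern with the brace escaped, and the [[:xdigit:]] variant.
my $with_d = qr/(\\x\{[A-F\d]+})/i;
my $xdigit = qr/(\\x\{[[:xdigit:]]+})/;

my $ascii      = '\\x{A3f4}';              # the text \x{A3f4}
my $devanagari = "\\x{123\N{U+096A}}";     # \x{ + 1 2 3 + DEVANAGARI DIGIT FOUR + }

for my $case ([ascii => $ascii], [devanagari => $devanagari]) {
    my ($name, $str) = @$case;
    printf "%-10s with_d=%s xdigit=%s\n", $name,
        ($str =~ $with_d ? 'yes' : 'no'),
        ($str =~ $xdigit ? 'yes' : 'no');
}
```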
Re^4: Unescaped left brace in regex is passed through in regex by Fletch (Bishop) on Jun 06, 2022 at 16:25 UTC