How bad is this idea:
As I understand it, $ in a regex (without the m modifier) is equivalent to (?=\n?\z), i.e. "match at the end of the string, or before a newline at the end of the string". With Unicode, the notion of "newline" can be extended to a generic linebreak, aka \R.
Wouldn't it be nice to make $ behave as (?=\R?\z) under some pragma or flag? (With the m flag it would become (?=\R|\z), mirroring today's (?=\n|\z), of course.)
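Until something like that exists, the behaviour can be approximated by interpolating a precompiled lookahead where $ would normally go. A minimal sketch, assuming the non-/m case only; $EOS and ends_cleanly are names made up for illustration, not anything Perl provides:

```perl
use v5.14;
use warnings;

# Hypothetical stand-in for the proposed $ (without /m):
# end of string, optionally preceded by one generic linebreak.
my $EOS = qr/(?=\R?\z)/;

# Hypothetical helper: true if the string is word characters
# followed by at most one trailing linebreak of any kind.
sub ends_cleanly {
    my ($str) = @_;
    return $str =~ /^\w*$EOS/ ? 1 : 0;
}

for my $s ("noeol", "nl\n", "cr\r", "cr_nl\r\n", "two\n\n") {
    # Make the control characters visible in the report.
    (my $shown = $s) =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    say sprintf "%-12s %s", $shown, ends_cleanly($s) ? "match" : "no match";
}
```

Since \R treats \r\n as a single unit, a CRLF ending counts as one linebreak here, while two consecutive \n characters do not.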
I believe this wouldn't even break much existing code. In the script below I use € to stand for the "new" $; its output is listed after __DATA__.
#!/usr/bin/perl
use v5.14;
use warnings;
use utf8;
use charnames qw(:full :short);
use feature 'say';

for ("noeol", "nl\n", "cr\r", "cr_nl\r\n") {
    my $u_chomped = s/\R//r;
    say "$u_chomped:";
    say 'matches $'      if /^\p{word}*$/;
    say 'matches like $' if /^\p{word}*(?=\n?\z)/;
    say 'matches €'      if /^\p{word}*(?=\R?\z)/;
    say 'matches \r$'    if /^\p{word}*\r$/;
    say 'matches \r€'    if /^\p{word}*\r(?=\R?\z)/;
    say 'matches \r?$'   if /^\p{word}*\r?$/;
    say 'matches \r?€'   if /^\p{word}*\r?(?=\R?\z)/;
    /^(.*)$/;
    say 'captured (.*)$'  if $1 eq $u_chomped;
    /^(.*)(?=\R?\z)/;
    say 'captured (.*)€'  if $1 eq $u_chomped;
    /^(.*).$/;
    say 'captured (.*).$' if $1 eq $u_chomped;
    /^(.*).(?=\R?\z)/;
    say 'captured (.*).€' if $1 eq $u_chomped;
    say "\n";
}

__DATA__
noeol:
matches $
matches like $
matches €
matches \r?$
matches \r?€
captured (.*)$
captured (.*)€

nl:
matches $
matches like $
matches €
matches \r?$
matches \r?€
captured (.*)$
captured (.*)€

cr:
matches €
matches \r$
matches \r€
matches \r?$
matches \r?€
captured (.*).$
captured (.*).€

cr_nl:
matches €
matches \r$
matches \r€
matches \r?$
matches \r?€
captured (.*).$
captured (.*).€

Greetings,
-jo