Re: How to match last character of string, even if it happens to be a newline?
by AnomalousMonk (Archbishop) on May 12, 2019 at 14:56 UTC
By default, the . (dot) regex metacharacter matches everything except a newline. Use the /s modifier to make dot match everything.
c:\@Work\Perl\monks>perl -wMstrict -le
"use Data::Dump qw(pp);
;;
for my $s (qq{yz}, qq{yz\n}) {
$s =~ m{ (.) \z }xms;
printf qq{in %s matched %s \n}, pp($s), pp($1);
}
"
in "yz" matched "z"
in "yz\n" matched "\n"
See also \z, the "absolute end of string" anchor.
Update: You're also running into the behavior of $, which by default matches at "the end of the line (or before newline at the end)". So even with the /s modifier, the first position at which .$ can possibly match (scanning from left to right) is just before the trailing newline, if there is one; remember that the leftmost match wins.
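A quick way to see this for yourself is to list every offset at which each end anchor can match in "yz\n". This is a sketch in core Perl only; anchor_positions is a hypothetical helper written just for the demo.

```perl
use strict;
use warnings;

# Hypothetical helper: return every offset at which the zero-width
# pattern $re matches in $str, using //g in scalar context and pos().
sub anchor_positions {
    my ($str, $re) = @_;
    my @offsets;
    while ($str =~ m/$re/g) {
        push @offsets, pos $str;
    }
    return @offsets;
}

my $s = "yz\n";
for my $pair ([ '$', qr/$/ ], [ '\Z', qr/\Z/ ], [ '\z', qr/\z/ ]) {
    my ($label, $re) = @$pair;
    printf "%-2s matches at offsets: %s\n", $label, join ', ', anchor_positions($s, $re);
}
```

Both $ and \Z accept offset 2 (just before the trailing newline) as well as offset 3 (the true end), so .$ finds its leftmost match at the "z"; \z accepts only offset 3.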
Give a man a fish: <%-{-{-{-<
Yes, the \z anchor does the trick, thanks!
Using \z, I don't seem to need multiline mode; simply m@(.)\z@s works.
Using \z, I don't seem to need multiline ...
That's because \z is always the absolute end-of-string anchor; no modifiers apply. I always use \A \z \Z because they have invariant behavior. For the same reason, I nail down the ^ $ operators by always using the /m modifier. (I then use the ^ $ operators only with newlines embedded within a string.)
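A minimal sketch of that division of labor, using a made-up three-line string: ^ and $ under /m pick out each line, while \A and \z always refer to the ends of the whole string. [^\n]* stands in for .* because the /s in the /xms tail would otherwise let the dot run across newlines.

```perl
use strict;
use warnings;

my $text = "a\nb\nc\n";

# With /m, ^ and $ anchor at each embedded line boundary.
# [^\n]* is used instead of .* because /s lets . match "\n" too.
my @lines = $text =~ m{ ^ ( [^\n]* ) $ }xmsg;

# \A and \z ignore /m entirely: they match only at the absolute
# start and the absolute end of the whole string.
my ($first) = $text =~ m{ \A (.) }xms;
my ($last)  = $text =~ m{ (.) \z }xms;

print join('|', @lines), "\n";
printf "first=%s last=%s\n", $first, $last eq "\n" ? '\n' : $last;
```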
er... I _did_ use the s modifier :-/
Re: How to match last character of string, even if it happens to be a newline?
by jwkrahn (Abbot) on May 12, 2019 at 18:48 UTC
$ perl -e'use Data::Dumper; $Data::Dumper::Useqq = 1; my $text_1 = "a\nb\nc\n"; print Dumper $text_1, "Last character: " . substr $text_1, -1'
$VAR1 = "a\nb\nc\n";
$VAR2 = "Last character: \n";
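The same substr trick side by side with the \z regex, wrapped in two hypothetical helper functions (last_char_substr, last_char_regex) so they can be compared on a few sample strings. This is a sketch; it assumes the input is never empty.

```perl
use strict;
use warnings;

# Hypothetical helpers for this sketch. Both assume a non-empty string:
# for "" substr warns and returns undef, and the regex simply fails.
sub last_char_substr {
    my ($s) = @_;
    return substr $s, -1;    # a negative offset counts back from the end
}

sub last_char_regex {
    my ($s) = @_;
    # /s lets . match "\n"; \z is the absolute end of the string.
    my ($c) = $s =~ m{ (.) \z }xms;
    return $c;
}

# Make "\n" visible when printing.
sub show {
    my ($s) = @_;
    $s =~ s/\n/\\n/g;
    return qq{"$s"};
}

for my $s ("yz", "yz\n", "a\nb\nc\n") {
    printf "%-12s substr=%-5s regex=%s\n",
        show($s), show(last_char_substr($s)), show(last_char_regex($s));
}
```

substr avoids the regex engine entirely; the regex version is handy when the last character has to be matched as part of a larger pattern.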
Re: How to match last character of string, even if it happens to be a newline?
by AnomalousMonk (Archbishop) on May 12, 2019 at 15:27 UTC
My regex Best Practices (lifted whole from TheDamian's Perl Best Practices, which I highly recommend in general) include using an /xms modifier tail on every qr// m// s/// I write. This reduces the degrees of freedom of the ^ $ . operators and clarifies their function, at least for me. Coupled with the use of \A \z \Z as string start/end anchors, I find I can think a bit more clearly about the highly counterintuitive operation of regular expressions.
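As a sketch of what that looks like in practice, here is a hypothetical "every line is all digits" check written with the /xms tail and \A \z anchors. The pattern name and the test strings are made up for illustration.

```perl
use strict;
use warnings;

# /x allows whitespace and comments, /s makes . match anything,
# /m makes ^ and $ line anchors. Whole-string anchoring is done
# with \A and \z, which no modifier affects.
my $all_digit_lines = qr{
    \A               # absolute start of the string
    (?: ^ \d+ \n )+  # one or more complete lines of digits
    \z               # absolute end: nothing may follow the last newline
}xms;

for my $s ("12\n34\n", "12\n34", "12\nab\n") {
    (my $shown = $s) =~ s/\n/\\n/g;
    printf "%-10s %s\n", qq{"$shown"}, $s =~ $all_digit_lines ? 'matches' : 'does not match';
}
```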
This reduces the degrees of freedom of the ^ $ . operators
Maybe that hints at the following inconsistency I see with the rule that $ matches at "the end of the line (or before newline at the end)"?
Example
print('String: a\nb\nc\n' . "\n");
$text_1 = "a\nb\nc\n";
$text_1 =~ s@(\n)$@@s;
print("----------\n>" . $1 . "<\n");
print("----------\n>" . $text_1 . "<\n");
Gives:
String: a\nb\nc\n
----------
>
<
----------
>a
b
c<
In this case, $ behaved like \z. To put it another way: here an explicit \n matches where a dot with the /s modifier doesn't.
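One way to see what is going on: record where each match starts via $-[0]. This sketch (core Perl only) compares the /s dot with the explicit \n on the same string; the labels are just strings for printing.

```perl
use strict;
use warnings;

my $text = "a\nb\nc\n";

# Without /m, $ can match here at offset 5 (just before the final "\n")
# and at offset 6 (the very end). The engine reports the leftmost overall
# match, so the two patterns stop at different places.
my %found;
for my $case ([ 'm/(.)$/s', qr/(.)$/s ], [ 'm/(\n)$/', qr/(\n)$/ ]) {
    my ($label, $re) = @$case;
    next unless $text =~ $re;
    my ($start, $captured) = ($-[0], $1);   # copy before another match resets them
    $found{$label} = [ $start, $captured ];
    (my $shown = $captured) =~ s/\n/\\n/;
    printf "%-9s captured %-5s starting at offset %d\n", $label, qq{"$shown"}, $start;
}
```

The dot version succeeds earlier, at the "c", because $ already accepts the position before the trailing newline; (\n)$ can only succeed with the final newline itself, so there $ ends up at the true end.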
|
|
c:\@Work\Perl\monks>perl -wMstrict -le
"use Data::Dump qw(pp);
;;
for my $s (qq{yz}, qq{yz\n}, qq{\n}) {
$s =~ m{ (.) $ }xms;
printf qq{in %s matched %s \n}, pp($s), pp($1);
}
"
in "yz" matched "z"
in "yz\n" matched "z"
in "\n" matched "\n"
(The /m modifier makes no difference in these example strings.)
The thing to remember about regular expressions is that there are a lot of things to remember about regular expressions. If you have a chance to reduce the amount of stuff to remember, even if only by a little, take it. That's why I advise (per TheDamian's regex PBPs) using \A \z \Z for all your start- and end-of-string anchoring needs, and using ^ $ only for embedded newline matching.
... inconsistency ...
For me, it's not so much inconsistency as mind-boggling complexity. And again, I come back to the point that if you can reduce the complexity of what you're dealing with even a little, you're ahead of the game.
Re: How to match last character of string, even if it happens to be a newline?
by LanX (Saint) on May 12, 2019 at 15:27 UTC
$ perl -e' m/.*(.|\n)/,print "<$1>" for "123","abc\n"'
<3><
>$
Please note that \n is often not one but two characters, like CR LF on Windows
Mac or Windows newlines seldom cause a problem. I think of \n as a perl newline. Perl strings always use it. Translation between it and your OS's representation is done by an I/O "layer". (In Unix, the "translation" does not actually change anything.) The only exception is when we change I/O behavior by specifying non-standard layers or binmode on input.
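A minimal sketch of that translation, assuming an in-memory filehandle stands in for a real file with Windows line endings: reading through the :crlf layer hands the program plain "\n" newlines, so the \z-anchored last-character regex sees "\n" rather than "\r".

```perl
use strict;
use warnings;

# Hypothetical input with Windows-style CR LF line endings.
my $raw = "abc\r\ndef\r\n";

# Reading through the :crlf layer turns each CR LF pair into a single "\n".
open my $in, '<:crlf', \$raw or die "Cannot open in-memory file: $!";
my @lines = <$in>;
close $in;

for my $line (@lines) {
    # The last character is now the Perl newline, not "\r".
    my ($last) = $line =~ m{ (.) \z }xms;
    printf "length %d, last char is %s\n",
        length $line, $last eq "\n" ? 'a newline' : 'something else';
}
```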
$text_1 = "abc\nd";
$text_1 =~ m/.*(.|\n)/;
print("----------\n>" . $1 . "<\n");
Prints:
----------
>
<
Should print d
$text_1 = "abc\nd";
$text_1 =~ m/.*(.|\n)/;
...
Should print d
A narration of m/.*(.|\n)/ might be:
- .* : From the start of the string, grab as much as possible of anything that's not a newline (no /s modifier for dot);
- (.|\n) : Then match and capture the first thing that's either not-a-newline or a newline.
Looked at this way, the only thing that could possibly be captured in the given string would be a newline.
Indeed, if your regex has no operator introduced after Perl version 5.6, this kind of narration is what YAPE::Regex::Explain will give you:
c:\@Work\Perl\monks>perl -wMstrict -le
"use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/.*(.|\n)/)->explain();
"
The regular expression:
(?-imsx:.*(.|\n))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
. any character except \n
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
\n '\n' (newline)
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
(There are newer and better regex parser/explainers around, but I like this one, limited as it is, for its explanatory style.)
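To check that narration, here is a sketch running the same string through the original pattern, a /s version, and a \z-anchored version; the labels are only strings for printing.

```perl
use strict;
use warnings;

my $text = "abc\nd";

# Without /s, .* stops at the first "\n", so (.|\n) can only capture it.
# With /s, .* runs to the end and backtracks one character, leaving "d".
# (.)\z with /s asks for the last character directly.
my %captured;
for my $case ([ 'm/.*(.|\n)/',  qr/.*(.|\n)/  ],
              [ 'm/.*(.|\n)/s', qr/.*(.|\n)/s ],
              [ 'm/(.)\z/s',    qr/(.)\z/s    ]) {
    my ($label, $re) = @$case;
    next unless $text =~ $re;
    $captured{$label} = $1;
    (my $shown = $1) =~ s/\n/\\n/;
    printf "%-13s captures %s\n", $label, qq{"$shown"};
}
```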
DB<11> m/.*(.|\n)/s,print "<$1>" for "123","abc\n","abc\nd"
<3><
><d>
DB<12> m/.*(.)/s,print "<$1>" for "123","abc\n","abc\nd"
<3><
><d>
DB<13>
HTH!