in reply to Re: Regex to detect file name
in thread Regex to detect file name
Use of POSIX character classes (see perlre, perlrecharclass) and /x can make regexes easier on the eye:
my $re = qr{ \A [[:alnum:]] (?: [[:alnum:]_.-]* [[:alnum:]])? \z }xms;
is equivalent.
Give a man a fish: <%-{-{-{-<
Re^3: Regex to detect file name
by hippo (Archbishop) on Jul 06, 2018 at 20:17 UTC
"is equivalent"
No, it really isn't:
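A minimal sketch of the kind of counterexample at issue (not necessarily the code originally posted). It assumes the parent post's regex used explicit ASCII ranges such as [A-Za-z0-9]; the `$ascii` pattern and the sample name are hypothetical:

```perl
use strict;
use warnings;

# Assumed ASCII-only form of the parent post's regex (hypothetical).
my $ascii = qr{ \A [A-Za-z0-9] (?: [A-Za-z0-9_.-]* [A-Za-z0-9] )? \z }x;

# The POSIX-class version proposed above.
my $posix = qr{ \A [[:alnum:]] (?: [[:alnum:]_.-]* [[:alnum:]])? \z }xms;

# Sample name containing U+00E7, LATIN SMALL LETTER C WITH CEDILLA.
my $name = "fa\x{e7}ade.txt";
utf8::upgrade($name);    # make sure Unicode (not byte) semantics apply

for my $pair ( [ ascii => $ascii ], [ posix => $posix ] ) {
    my ( $label, $re ) = @$pair;
    printf "%-5s %s\n", $label, $name =~ $re ? 'match' : 'no match';
}
```

With the string held as characters, the ASCII pattern should reject the name while the POSIX-class pattern accepts it, so the two are not interchangeable.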
by AnomalousMonk (Archbishop) on Jul 06, 2018 at 23:23 UTC
Hmmm... Good point. Well, I think in a case like this, I'd still like to try to take advantage of some degree of factoring:
(Both the /a modifier and the (?a) embedded modifier seem to work with the original qr{ \A [[:alnum:]] (?: [[:alnum:]_.-]* [[:alnum:]])? \z }xms regex to suppress extended Unicode matching, but I don't fully understand the interaction of this and related flags with POSIX character classes. And it's one more modifier to remember!)
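A rough sketch of one way to factor the pattern while staying ASCII-only, shown next to the /a variant mentioned above. This is not the code originally posted; the names `$an`, `$factored` and `$posix_a` are made up for illustration, and /a assumes perl 5.14 or later:

```perl
use strict;
use warnings;

# Factor the repeated alphanumeric class into one piece (hypothetical names).
my $an       = qr{ [A-Za-z0-9] }x;
my $factored = qr{ \A $an (?: (?: $an | [_.-] )* $an )? \z }x;

# The original POSIX-class regex with /a added, which restricts
# [[:alnum:]] to the ASCII range.
my $posix_a = qr{ \A [[:alnum:]] (?: [[:alnum:]_.-]* [[:alnum:]])? \z }xmsa;

my @names = ( 'a', 'file_1.txt', '-bad', 'bad-', "fa\x{e7}ade" );
utf8::upgrade($_) for @names;    # force character (not byte) semantics

for my $name (@names) {
    ( my $shown = $name ) =~ s/[^\x20-\x7e]/?/g;    # keep the output ASCII-only
    printf "%-12s factored=%s posix_a=%s\n", $shown,
        ( $name =~ $factored ? 'yes' : 'no' ),
        ( $name =~ $posix_a  ? 'yes' : 'no' );
}
```

Both patterns should agree on every sample here, including rejecting the name that contains U+00E7.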
Re^3: Regex to detect file name
by kcott (Archbishop) on Jul 07, 2018 at 05:14 UTC
G'day AnomalousMonk,
With regard to the POSIX character class, ++hippo has already pointed out the problem with it. You can certainly be forgiven for the mistake, because the documentation appears to be wrong. From "perlrecharclass: POSIX Character Classes":
I rarely use the POSIX classes and wasn't aware of that discrepancy. Anyway, while possibly "easier on the eye", that's likely to result in a fair amount of frustration for someone attempting to perform debugging and assuming the documentation is correct. The problem could be further exacerbated when input characters may not appear to be ones that should be failing. While hippo's example using "LATIN SMALL LETTER C WITH CEDILLA" (ç) was fairly obvious, the glyphs for some characters (depending on the font) may be identical or so similar that it's difficult to tell them apart. Consider "LATIN CAPITAL LETTER A" (A) and "GREEK CAPITAL LETTER ALPHA" (Α):
$ perl -C -E '
use utf8;
say "$_ (", ord $_, "): ", /\A[A-Za-z0-9]\z/ ? "✓" : "✗"
for qw{A Α}
'
A (65): ✓
Α (913): ✗
$ perl -C -E '
use utf8;
say "$_ (", ord $_, "): ", /\A[[:alnum:]]\z/ ? "✓" : "✗"
for qw{A Α}
'
A (65): ✓
Α (913): ✓
As far as the 'x' modifier goes, I don't disagree that it can improve readability; however, where it's felt necessary to use it — either because the regex is particularly complex or it's code that junior developers will need to deal with — spreading the regex across multiple lines and including comments might be even better:
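A minimal sketch of that layout, not necessarily the code originally posted. It assumes explicit ASCII ranges in place of the POSIX classes, and the variable name `$re` and the sample names are hypothetical:

```perl
use strict;
use warnings;

my $re = qr{
    \A                      # start of string
    [A-Za-z0-9]             # first character: ASCII letter or digit
    (?:                     # optionally followed by:
        [A-Za-z0-9_.-]*     #   any run of letters, digits, '_', '.' or '-'
        [A-Za-z0-9]         #   ending in a letter or digit
    )?
    \z                      # end of string
}x;

for my $name (qw{ a file_1.txt -bad bad- }) {
    printf "%-12s %s\n", $name, $name =~ $re ? 'ok' : 'rejected';
}
```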
And, with 5.26 or later, perhaps even clearer as:
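A sketch of what this could look like, on the assumption that the 5.26 feature meant is the /xx modifier, which also ignores blanks inside bracketed character classes. This is not necessarily the code originally posted, and the variable name `$re` is hypothetical:

```perl
use strict;
use warnings;
use 5.026;    # the /xx modifier needs perl 5.26 or later

my $re = qr{
    \A
    [ A-Z a-z 0-9 ]                # first character
    (?:
        [ A-Z a-z 0-9 _ . \- ]*    # middle characters
        [ A-Z a-z 0-9 ]            # last character
    )?
    \z
}xx;

for my $name (qw{ a file_1.txt -bad bad- }) {
    printf "%-12s %s\n", $name, $name =~ $re ? 'ok' : 'rejected';
}
```

The hyphen is escaped here only so that it cannot be read as a range operator.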
We've already had exhaustive discussions about the 'm' and 's' modifiers. Use them if you want to follow PBP suggestions, but understand that they do absolutely nothing here: there are no '^' or '$' assertions that 'm' might affect, and there is no '.' (outside a bracketed character class) that 's' might affect.
-- Ken