Update: On second thought, this post is really more like a reply to kcott's Re: Recognizing 3 and 4 digit number and probably should have been posted as such originally. Oh, well...
htmanning: My remarks are further to the careful and detailed remarks of kcott here and, I hope, are in the same spirit.
I certainly agree with the recommendation (and its rationale) of doing development and posing questions to your fellow monks in a Test::More framework.
I tend to differ with kcott in the area of regex best practice. All the following are certainly personal best practices in this area, and are based largely on the regex recommendations in Perl Best Practices (PBP) by TheDamian.
kcott implies that one should avoid using the /x /m /s modifiers where they are not necessary. I think they are (almost) always necessary: They clarify intent and make it easier to think about what a regex, that most slippery and counterintuitive of things, is doing. When dealing with regexes, the less you have to think about the better. The result is that almost without exception, every qr// m// s/// operator I write ends up with an /xms tail.
The /x modifier allows comments, saviours of sanity, in regexes. kcott suggests the embedded
qr{(?x: pattern with whitespace )}
usage where comments are needed. This is undesirable IMHO for two reasons: two opportunities for inadvertent literal spaces before and after the (?x: ... ) expression, giving you, e.g.,
qr{ (?x: pattern with whitespace ) }
and potential brain-hurt. The alternate form
qr{(?x) pattern with whitespace }
is better, but still leaves room for a leading literal space to creep in:
qr{ (?x) pattern with whitespace }
Oops. Just write qr{ ... }xms and be done with it.
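Here's a small sketch of how that stray leading space bites. The variable names and the phone-number-ish pattern are made up for illustration; the point is only that whitespace ahead of (?x) is still a literal space.

```perl
use strict;
use warnings;

# Accidental space before (?x): it comes before the modifier takes
# effect, so the string must contain a literal space at that point.
my $rx_embedded = qr{ (?x) \d{3} - \d{4} };

# Trailing /xms: all whitespace in the pattern is insignificant.
my $rx_tail = qr{ \d{3} - \d{4} }xms;

my $phone = '555-1234';

my $embedded_matches = $phone =~ $rx_embedded ? 1 : 0;   # 0
my $tail_matches     = $phone =~ $rx_tail     ? 1 : 0;   # 1

print "embedded: $embedded_matches, tail: $tail_matches\n";
```

The embedded version only matches when a space precedes the digits, which is almost never what was meant.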
What if you want literal space characters in your regex when using the /x modifier? I prefer the "[ ]" usage over the "\ " usage (which is hard to see and has to be explained: "that's a backslash before a space, i.e., an escaped literal space"). A string containing literal spaces can be represented as
qr{ \Qstring with some literal spaces\E }xms
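A quick sketch comparing the options under /x, using a made-up test string. Nothing here is special; it just shows that bare spaces vanish while [ ] and \Q...\E keep them.

```perl
use strict;
use warnings;

my $text = 'a string with some literal spaces in it';

# Bare spaces are ignored under /x, so this really looks for 'withsome'.
my $ignored   = $text =~ m{ with some }xms        ? 1 : 0;   # 0

# A one-character class makes the literal space visible.
my $bracketed = $text =~ m{ with [ ] some }xms    ? 1 : 0;   # 1

# \Q...\E quotes the spaces (and any other metacharacters) for us.
my $quoted    = $text =~ m{ \Qwith some literal\E }xms ? 1 : 0;   # 1

print "ignored: $ignored, bracketed: $bracketed, quoted: $quoted\n";
```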
The justification for always using the /m /s modifiers is a bit different: They reduce the "degrees of freedom" of regex behavior.
What does . (dot) match? "Dot matches everything except a newline except where modified by the /s modifier, in which case it matches everything." That's too much to think about. "Dot matches all" is a lot simpler, and that's what you get with the /s modifier, even if you never use a dot operator. What if you actually want to match "everything but a newline"? Use [^\n] in that case; it does the job and perfectly conveys your intention. I have sometimes seen (?-s:.) and (?s:.) used to invoke the different behaviors of dot. Don't. It's just more potential brain-hurt.
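A minimal illustration with a made-up two-line string: under /s, dot really does match everything, and [^\n] says "not a newline" out loud.

```perl
use strict;
use warnings;

my $text = "one\ntwo";

# With /s, dot matches the newline too, so .+ runs to the end of the string.
my ($dot_all)    = $text =~ m{ \A (.+) }xms;

# [^\n] states the "anything but newline" intent explicitly.
my ($no_newline) = $text =~ m{ \A ([^\n]+) }xms;

printf "dot matched %d chars, [^\\n] matched %d chars\n",
    length $dot_all, length $no_newline;
```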
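Similarly, the behaviors of the ^ $ operators are constrained | expanded by the /m modifier. What if you want only their commonly used start- and end-of-string behaviors? The \A \z \Z operators were invented for this purpose: \A matches only at the very start of the string, \z only at the very end, and \Z at the very end or just before a final newline.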
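A short sketch of the anchors side by side, again with a made-up string. With /m always on, ^ is a line anchor and \A \z \Z are the string anchors.

```perl
use strict;
use warnings;

my $text = "first\nsecond\n";

# With /m, ^ matches at the start of every line.
my @line_starts  = $text =~ m{ ^  (\w+) }xmsg;   # ('first', 'second')

# \A matches only at the very start of the string, /m or not.
my @string_start = $text =~ m{ \A (\w+) }xmsg;   # ('first')

# \Z tolerates a single trailing newline; \z does not.
my $cap_z_matches   = $text =~ m{ second \Z }xms ? 1 : 0;   # 1
my $small_z_matches = $text =~ m{ second \z }xms ? 1 : 0;   # 0

print "line starts: @line_starts; string start: @string_start\n";
print "\\Z: $cap_z_matches, \\z: $small_z_matches\n";
```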
With regard to the use of capture groups in qr// operators: This is something else I try assiduously to avoid.
Say you have two Regexp objects $rx $ry with an embedded capture group in each. They might be used in a substitution:
$string =~ s{ foo $rx bar $ry baz }{$1$2}xmsg;
If you change the pattern match to
$string =~ s{ foo $ry bar $rx baz }{$1$2}xmsg;
do you also have to change the order of the capture variables $1 $2 in the replacement string? The problem, of course, is that capture variables correspond in an absolute way to the order of capture groups in the s/// match. The question is highlighted more sharply if the captures appear explicitly in the s/// match:
$string =~ s{ foo ($rx) bar ($ry) baz }{$1$2}xmsg;
to
$string =~ s{ foo ($ry) bar ($rx) baz }{$1$2}xmsg; # switch $1 $2 also?
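To make the question concrete, here's a sketch with two hypothetical Regexp objects (the patterns inside $rx and $ry are my own invention, as are the test strings and the added \s+ separators). The replacement string is identical in both substitutions, yet the output order flips with the pattern order.

```perl
use strict;
use warnings;

# Hypothetical Regexp objects, each carrying its own capture group.
my $rx = qr{ (\d+)    }xms;   # captures digits
my $ry = qr{ ([a-z]+) }xms;   # captures lowercase letters

my $first = 'foo 42 bar abc baz';
(my $out_first = $first)
    =~ s{ foo \s+ $rx \s+ bar \s+ $ry \s+ baz }{$1$2}xmsg;

my $second = 'foo abc bar 42 baz';
(my $out_second = $second)
    =~ s{ foo \s+ $ry \s+ bar \s+ $rx \s+ baz }{$1$2}xmsg;

# $1 is whatever group opens first in the final pattern, not "the group
# that belongs to $rx".
print "$out_first\n";    # 42abc
print "$out_second\n";   # abc42
```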
The \g{-n} relative back-reference extension of Perl release 5.10 eases the problem of capture group numbering somewhat, but capture group variables are still staunchly absolutist! (The (?|alternation|pattern) branch-reset construct of 5.10 also eases the capture group numbering problem a bit.)
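A rough sketch of both 5.10 features, with made-up patterns and strings. The first half shows an absolute \1 going wrong once a Regexp object is interpolated after another capture group, while \g{-1} keeps working; the second half shows branch reset collapsing the alternatives' groups into one number.

```perl
use 5.010;
use strict;
use warnings;

my $text = 'id 7: moon';

# Both are meant to match a doubled character.
my $doubled_abs = qr{ (\w) \1      }xms;   # absolute back-reference
my $doubled_rel = qr{ (\w) \g{-1}  }xms;   # relative back-reference

# Once interpolated after (\d+), \1 means the digits, not the (\w) group.
my $abs_matches = $text =~ m{ (\d+) .*? $doubled_abs }xms ? 1 : 0;   # 0

# \g{-1} still refers to the nearest preceding group, i.e. (\w).
# Note the doubled character lands in $2 here: the variables stay absolute.
my ($id, $char) = $text =~ m{ (\d+) .*? $doubled_rel }xms;

# Plain alternation numbers every group; branch reset reuses number 1.
my @plain = 'key=abc' =~ m{ = (?: (\d+) | ([a-z]+) ) }xms;   # (undef, 'abc')
my @reset = 'key=abc' =~ m{ = (?| (\d+) | ([a-z]+) ) }xms;   # ('abc')

printf "absolute: %d, relative: id=%s char=%s\n", $abs_matches, $id, $char;
printf "plain groups: %d, reset groups: %d\n", scalar @plain, scalar @reset;
```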
Give a man a fish: <%-{-{-{-<
In reply to Re: Recognizing 3 and 4 digit number
by AnomalousMonk
in thread Recognizing 3 and 4 digit number
by htmanning