in reply to Re^4: Recognizing 3 and 4 digit number
in thread Recognizing 3 and 4 digit number

Sorry to be so long getting back to you. Events intervene...

I think we both have strong personal styles and we're each unlikely to persuade the other to change any time soon, so this will likely be my last word in this thread.

However, I want to take one more opportunity to state my position clearly. The following rationale is, of course, taken largely (if not entirely!) from TheDamian's regex PBPs.

qr{(?x: ^ ( $re_nonl ) $re_all ( $re_nonl ) $re_all ( $re_nonl ) $ )};

My understanding of your practice is that you might or might not include an m or an s modifier in the opening  (?x: modifier group depending on whether or not  ^ $ or  . were used in the expression, and on what behavior you wanted these operators to exhibit.

When I look at the quoted expression, the first thing I ask is "Ok, what do  ^ and  $ do? How do they behave?" Now I have to go modifier hunting. In this case, there is no  /m modifier in sight, so  ^ $ have their default behaviors.

If I want to change this expression so as to add a  ^ or  $ operator somewhere, I have to repeat the hunt and decide if the operator behavior selected by the existing (or not) m modifier is compatible with the behavior I want. If I'm tempted to add or delete an m modifier, I must look around for other  ^ $ operators already present so that I can be sure the new behavior selected is correct and compatible with pre-existing usages. Room here for bugs to creep in.
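To make the two behaviors concrete, here is a minimal sketch. The two-line sample string is my own illustration, not something taken from the thread.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Default: ^ matches only at the very start of the string.
my @default = $text =~ /^(\w+)/g;     # ('first')

# With /m: ^ also matches at the start of each embedded line.
my @multi   = $text =~ /^(\w+)/mg;    # ('first', 'second')

print "default: @default\n";
print "with /m: @multi\n";
```

The only difference between the two matches is the m modifier, which is exactly the thing a reader has to go hunting for.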

But why should I have to ask these questions? Why should these operators have multiple behaviors? If  \A \Z exactly duplicate the default  ^ $ behaviors (with  \z thrown in to extend this functionality a bit), why not just nail down  ^ $ to their enhanced  /m behaviors? No further thought needed.
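A small sketch of that division of labor, again on a made-up sample string: with /m in force, ^ and $ act as per-line anchors, while \A, \Z and \z keep their whole-string meanings.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\n";

# Under /m, $ matches before every newline (and at the end of the string).
my @line_ends = $text =~ /(\w+)$/mg;           # ('alpha', 'beta')

# \A, \Z and \z are unaffected by /m.
my $starts  = $text =~ /\Aalpha/m ? 1 : 0;      # 1: start of string
my $z_upper = $text =~ /beta\Z/m  ? 1 : 0;      # 1: \Z tolerates a final newline
my $z_lower = $text =~ /beta\z/m  ? 1 : 0;      # 0: \z is the absolute end only

print "line ends: @line_ends\n";
print "\\A: $starts  \\Z: $z_upper  \\z: $z_lower\n";
```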

But what if I don't use any  ^ $ in a given regex? Why should I bother with a useless  /m modifier? If m is always present in a standard /xms tail, no harm is done if no  ^ $ is used in the regex, and if one of these operators is ever added to a regex in which it was not present before, the further step of worrying about whether (and where) to add or not to add the corresponding modifier is totally eliminated.
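As a quick sanity check of the "no harm is done" claim, the sketch below compares a pattern containing no ^, $ or dot with and without the ms part of the tail. The pattern and the input strings are hypothetical, chosen only to echo the thread's subject.

```perl
use strict;
use warnings;

my $plain = qr{ (\d{3,4}) }x;      # no m or s
my $tail  = qr{ (\d{3,4}) }xms;    # standard /xms tail

my @inputs = ("123", "12", "ab 4567 cd", "x\n999\ny");
my $same = 1;

for my $in (@inputs) {
    my ($p) = $in =~ $plain;
    my ($t) = $in =~ $tail;
    my $p_str = defined $p ? $p : 'no match';
    my $t_str = defined $t ? $t : 'no match';
    $same = 0 if $p_str ne $t_str;
    print "plain: $p_str  /xms: $t_str\n";
}

print $same ? "identical results\n" : "results differ\n";
```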

A similar argument applies to the dot operator: If  [^\n] exactly duplicates the default match behavior of dot, why not set the /s-modified "dot matches all" behavior in cement (especially since the latter behavior is the one most commonly needed in regexes)? Again, the need for thought and the opportunity for confusion are reduced: If no dot appears in a regex, no harm is done; if one must be added later, it's a one-step process.
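The same idea for dot, as a minimal sketch on an illustrative string of my own:

```perl
use strict;
use warnings;

my $text = "one\ntwo";

my ($dot_default) = $text =~ /(.+)/;        # 'one': dot stops at the newline
my ($dot_s)       = $text =~ /(.+)/s;       # "one\ntwo": dot matches everything
my ($nonl)        = $text =~ /([^\n]+)/s;   # 'one': explicit "anything but newline"

print "default dot: $dot_default\n";
print "[^\\n] under /s: $nonl\n";
print "dot under /s spans the newline\n" if $dot_s eq $text;
```

With /s always on, the choice between "any character" and "any character except newline" is spelled out in the pattern itself rather than in a modifier somewhere else.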

The end result is my (near) universal use of the  /xms tail in any  qr// m// s/// that I write. As I've said, there are exceptions due to the exigencies of the moment (usually my own want of ingenuity) or to the intricacies of the application, but they're few and far between.

(The  '/flags' mode added to re in Perl version 5.14 seems very convenient for enforcing universal use of an  /xms tail with all regex operators, but I've never used it except for a bit of experimentation. I avoid it because it adds yet another versional boundary to worry about transgressing. Especially with postings to PerlMonks, I move back and forth between pre- and post-5.10 Perl versions so often that I have enough of a headache just with these extensions — but they're too enticing to ignore.)
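For anyone who wants to experiment with it, a minimal sketch of the pragma might look like this. It assumes Perl 5.14 or later, and the sample string is invented for the example.

```perl
use strict;
use warnings;
use 5.014;   # use re '/flags' needs Perl 5.14 or later

my $text = "line one\nline two";
my ($with_flags, $without_flags);

{
    use re '/xms';   # regexes compiled in this block get an /xms tail
    ($with_flags) = $text =~ m{ \A (.+) \z };
}

# Outside the block the defaults are back: dot stops at the newline.
($without_flags) = $text =~ m{\A(.+)\z};

print "in scope:     ", (defined $with_flags    ? "matched" : "no match"), "\n";
print "out of scope: ", (defined $without_flags ? "matched" : "no match"), "\n";
```

The pragma is lexically scoped, so the bare block keeps its effect from leaking into the rest of the file.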


Give a man a fish:  <%-{-{-{-<

Re^6: Recognizing 3 and 4 digit number
by kcott (Archbishop) on Jan 09, 2017 at 01:32 UTC

    All good. That's been a long and interesting discussion. Thank you.

    — Ken

      Oh, and one more thing (so much for my final word in the thread)...

      I'm sure you must be, but it occurred to me to ask if you were aware that  qr// effectively wraps its object in a  (?mo-ds:...) ((?^mods:...) from 5.14 onward) non-capturing group? So your preferred usage ends up looking like:

      c:\@Work\Perl\monks>perl -wMstrict -le "my $rx = qr{(?x: pattern)}; print qq{perl version $]: $rx}; "
      perl version 5.008009: (?-xism:(?x: pattern))
      perl version 5.014004: (?^:(?x: pattern))
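One practical consequence of that wrapping, sketched with made-up patterns: a compiled regex carries its own modifiers with it when interpolated into a larger one, so the inner and outer parts can behave differently.

```perl
use strict;
use warnings;

my $inner = qr{perl}i;                    # case-insensitive on its own
my $outer = qr{$inner \s+ monks}x;        # outer part has /x but not /i

my $mixed = "PERL monks" =~ $outer ? 1 : 0;   # 1: only the inner part ignores case
my $upper = "PERL MONKS" =~ $outer ? 1 : 0;   # 0: 'monks' is still case-sensitive

print "stringified: $outer\n";
print "PERL monks: $mixed  PERL MONKS: $upper\n";
```

The exact stringified form printed depends on the Perl version, as the output above shows for 5.8 and 5.14.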


      Give a man a fish:  <%-{-{-{-<

        On 5.24:

        $ perl -E 'my $re = qr{(?x: PAT )}; say $re'
        (?^u:(?x: PAT ))

        As ^ is short for d-imnsx, and u overrides d, I suppose that's effectively equivalent to something like:

        (?d-imnsx:(?u-d:(?x: PAT )))

        Also on 5.24:

        $ perl -E 'my $re = qr{ PAT }msx; say $re'
        (?^umsx: PAT )

        Which, given the same logic, would be:

        (?d-imnsx:(?umsx-d: PAT ))

        Of course, d can't actually be turned off like that (i.e. -d is illegal):

        $ perl -E 'my $re = qr{(?d-imnsx:(?u-d:(?x: PAT )))}'
        Regexp modifier "d" may not appear after the "-" in regex; ...
        $ perl -E 'my $re = qr{(?d-imnsx:(?umsx-d: PAT ))}'
        Regexp modifier "d" may not appear after the "-" in regex; ...

        I'm just making the point that both qr{(?x: PAT )} and qr{ PAT }msx end up with nested (?:...) constructs and both have modifiers turned on and off at various points.

        "... it occurred to me to ask if you were aware that qr// effectively wraps its object ..."

        Yes, I was aware of it; it's something I learned a long time ago (2008, I think). I came across a situation where compiled regexes were being stored, retrieved and recompiled. This was under 5.8 (I think), and the modifiers were somewhat different, but the basic scenario was:

        $ perl -E 'my $re = qr{ PAT }msx; say $re; $re = qr{ $re }msx; say $re; $re = qr{ $re }msx; say $re; say "..."'
        (?^umsx: PAT )
        (?^umsx: (?^umsx: PAT ) )
        (?^umsx: (?^umsx: (?^umsx: PAT ) ) )
        ...
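A sketch of the same growth in script form, plus one possible way around it. The workaround is my own assumption about how such a store-and-recompile cycle could be handled, not what was done in the situation described: it uses regexp_pattern from the re module (available from Perl 5.10) to recover the bare pattern and its modifiers, so each rebuild starts from the same source text.

```perl
use strict;
use warnings;
use re qw(regexp_pattern);   # available from Perl 5.10

my $re = qr{ PAT }msx;

# Re-wrapping a compiled regex adds one more (?...) group each round.
my @depths;
for (1 .. 3) {
    my $groups = () = "$re" =~ /\(\?/g;   # count the "(?" group openers
    push @depths, $groups;
    $re = qr{ $re }msx;
}
print "groups per round: @depths\n";

# Assumed alternative: keep the bare pattern and modifiers, and always
# rebuild from those instead of from the stringified regex.
my $orig = qr{ PAT }msx;
my ($pat, $mods) = regexp_pattern($orig);
my $rebuilt = qr/(?${mods}:${pat})/;

print "pattern: '$pat'  modifiers: '$mods'\n";
print "rebuilt regex still matches\n" if "xxPATxx" =~ $rebuilt;
```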

        I imagine you're familiar with the doco but, for others who might be interested, see these sections of perlre: "Modifiers"; "(?adlupimnsx-imnsx)"; "(?:pattern)".

        — Ken