
"We definitely seem to be at odds with /m and /s. ... I rarely need those: sometimes I need one of them; I need both far less often."

My motive for always using the  /ms modifier cluster (in addition to /x, of course) is to foster clarity, and clarity is always a necessity :) Clarity is improved because the  . ^ $ operators have unvarying behaviors. Sometimes one is forced to be devious and must sacrifice clarity of expression, but that's what comments are for!
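A minimal sketch of what "unvarying behaviors" means under a standing /xms tail. The sample string and variable names are only illustrative. With /m, ^ and $ always mean start and end of a line. With /s, dot always matches any character, including newline. \A and \z are left as the string anchors.

```perl
use strict;
use warnings;

# Illustrative input: three lines, no trailing newline.
my $text = "alpha\nbeta\ngamma";

# With /m, ^ matches at the start of every line, so counting matches counts lines.
my $line_count = () = $text =~ m{ ^ }xmsg;

# With /m, $ matches before every newline: capture the word ending each line.
my @line_ends = $text =~ m{ (\w+) $ }xmsg;

# With /s, dot also matches "\n", so .+ can span all three lines.
my ($everything) = $text =~ m{ \A (.+) \z }xms;

print "lines: $line_count\n";    # lines: 3
print "ends:  @line_ends\n";     # ends:  alpha beta gamma
print "all:   ", ($everything eq $text ? 'whole string' : 'partial'), "\n";
```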

"... the qr{(?mods:...)} form over the qr{...}mods form ... The latter form makes the modifiers global: you can't get finer control such as qr{(?mo:...)(?ds:...)} or qr{(?mo:...(?ds:...)...)}."

The docs say this finer control is possible:  (?mo-ds) and  (?mo-ds:pattern) are rigorously scoped:

c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = qq{aa \n bb \n cc};
;;
print qq{A: match, \$1 '$1' @ $-[1] \$2 '$2' @ $-[2] \$3 '$3' @ $-[3]}
  if $s =~ m{ \A ((?-s: .+)) .+ ((?-s: .+)) .+ ((?-s: .+)) \z }xms;
;;
print qq{B: match, \$1 '$1' @ $-[1]}
  if $s =~ m{ \A ((?-s: .+)) \z }xms;
;;
print qq{C: match, \$1 '$1' @ $-[1]}
  if $s =~ m{ \A ((?-s: (?s: .+))) \z }xms;
"
A: match, $1 'aa ' @ 0 $2 ' ' @ 9 $3 'c' @ 11
C: match, $1 'aa
 bb
 cc' @ 0
(Tricky to put together a meaningful example for this!)

That said, I would never write regex A as above, but rather as:

c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = qq{aa \n bb \n cc};
;;
print qq{A: match, \$1 '$1' @ $-[1] \$2 '$2' @ $-[2] \$3 '$3' @ $-[3]}
  if $s =~ m{ \A ([^\n]+) .+ ([^\n]+) .+ ([^\n]+) \z }xms;
"
A: match, $1 'aa ' @ 0 $2 ' ' @ 9 $3 'c' @ 11
Don't mess with dot (or  ^ $ either): much less potential for brain-hurt.

Update: Another version of regex A:

c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = qq{aa \n bb \n cc};
;;
print qq{A: match, \$1 '$1' @ $-[1] \$2 '$2' @ $-[2] \$3 '$3' @ $-[3]}
  if $s =~ m{ \A (?-s) (.+) (?s) .+ (?-s) (.+) (?s) .+ (?-s) (.+) \z }xms;
"
A: match, $1 'aa ' @ 0 $2 ' ' @ 9 $3 'c' @ 11
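In the context of global dot-matches-newline behavior, successive (?-s) and (?s) switch dot's newline matching off and back on, respectively. Each switch holds until the next one or until the end of the enclosing group, which here is the end of the pattern. Again, I wouldn't actually write a regex this way unless my feet were being held to the fire.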
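A quick consistency check on the three spellings of regex A used in this post: scoped (?-s:...) groups, a negated [^\n] class, and bare (?-s)/(?s) toggles. The hash keys are arbitrary labels. The script records each capture with its start offset (taken from @- and @+) and prints the results, so you can confirm that all three forms agree.

```perl
use strict;
use warnings;

my $s = "aa \n bb \n cc";

# The three spellings of regex A; the keys are just labels.
my %forms = (
    scoped_groups => qr{ \A ((?-s: .+)) .+ ((?-s: .+)) .+ ((?-s: .+)) \z }xms,
    negated_class => qr{ \A ([^\n]+) .+ ([^\n]+) .+ ([^\n]+) \z }xms,
    toggles       => qr{ \A (?-s) (.+) (?s) .+ (?-s) (.+) (?s) .+ (?-s) (.+) \z }xms,
);

my %result;
for my $name (sort keys %forms) {
    if ($s =~ $forms{$name}) {
        # Rebuild capture N from @- and @+ and pair it with its start offset.
        $result{$name} = join ' ', map {
            sprintf q{'%s'@%d}, substr($s, $-[$_], $+[$_] - $-[$_]), $-[$_]
        } 1 .. 3;
    }
    else {
        $result{$name} = 'no match';
    }
    print "$name: $result{$name}\n";
}
```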


Give a man a fish:  <%-{-{-{-<

Re^4: Recognizing 3 and 4 digit number
by kcott (Archbishop) on Jan 04, 2017 at 00:36 UTC
    "... clarity is always a necessity ..."

    You'll get no argument from me on that one.

    "My motive for always using the  /ms modifier cluster (in addition to /x, of course) is to foster clarity, ..."

    I think we both agree about /x: no need to discuss that any further. However, I still disagree about /ms. Compare these three lines from your code (above):

        \A ((?-s: .+)) .+ ((?-s: .+)) .+ ((?-s: .+)) \z
        \A ((?-s: .+)) \z
        \A ((?-s: (?s: .+))) \z

    with the equivalent lines from my code (below):

        ^ ( $re_nonl ) $re_all ( $re_nonl ) $re_all ( $re_nonl ) $
        ^ ( $re_nonl ) $
        ^ ( $re_all ) $

    Your regexes all use /ms and then need (?s and (?-s in various places. My regexes don't use /ms or (?ms at all: $re_nonl and $re_all are tiny regexes; $re_nonl uses neither modifier, and $re_all only uses (?s.
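A minimal sketch of why these small building blocks compose safely. A qr// object carries its own modifiers when it is interpolated into a larger pattern, so the outer pattern's flags (or their absence) don't change what $re_all and $re_nonl match. $plain_dot and the result variables are hypothetical names added only for this demonstration.

```perl
use strict;
use warnings;

my $s = "aa \n bb \n cc";

# Building blocks as defined in this reply; each qr carries its own flags.
my $re_all  = qr{(?sx: .+ )};     # (?s): dot also matches "\n"
my $re_nonl = qr{(?x: [^\n]+ )};  # [^\n] never matches "\n"; no flag needed

# Hypothetical extra piece for this demo: a plain qr compiled without /s.
my $plain_dot = qr{.+};

# These outer patterns have no /m or /s of their own.
my $whole    = $s =~ m{\A$re_all\z}  ? 'match' : 'no match';
my $one_line = $s =~ m{\A$re_nonl\z} ? 'match' : 'no match';

# An outer /s does not leak into an interpolated qr compiled without it.
my $leak_test = $s =~ m{\A$plain_dot\z}s ? 'match' : 'no match';

print "re_all:    $whole\n";      # re_all:    match
print "re_nonl:   $one_line\n";   # re_nonl:   no match
print "plain_dot: $leak_test\n";  # plain_dot: no match
```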

    "The docs say this finer control is possible: ..."

    I wondered if you thought I was suggesting that level of control was not possible. If so, my apologies: that wasn't my intent. Perhaps I should have compared

        qr{(?mo:...)(?ds:...)}
        qr{(?mo:...(?ds:...)...)}

    with

        qr{(?mo-ds:...)(?ds-mo:...)}mods
        qr{(?mo-ds:...(?ds-mo:...)...)}mods
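A small sketch of the second form, using the concrete flags m and s in place of the mo and ds sets above. A (?on-off:...) group overrides the trailing modifiers only inside that group. The patterns and variable names are illustrative.

```perl
use strict;
use warnings;

my $s = "aa \n bb \n cc";

# Outer tail is /xms, but the group keeps /m and switches /s off,
# so inside it ^ and $ are line anchors and dot stops at newlines.
my $re    = qr{ (?m-s: ^ (.+) $ ) }xms;
my @lines = $s =~ m{$re}g;

# A group that switches /m off instead: ^ now matches only at string start.
my $no_m  = qr{ (?-m: ^ ([^\n]+) ) }xms;
my @first = $s =~ m{$no_m}g;

print join('|', @lines), "\n";    # aa | bb | cc
print scalar(@first), " match\n"; # 1 match
```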

    In the code below, I've used the regexes described above ($reA, $reB & $reC). Those were written on the assumption that all were needed in the same script. I've also added $reAiso, $reBiso & $reCiso to show how I might have written these in isolation: none use the 'm' modifier; two use the 's' modifier. Finally, I added $reD as an example of when I might use both the 'm' and 's' modifiers. Throughout, I've used the input and output formats that you used in your code.

        #!/usr/bin/env perl -l

        use strict;
        use warnings;

        use Test::More tests => 7;

        # Expected outputs, in the same format as the one-liners above.
        my $expA = "A: match, \$1 'aa ' @ 0 \$2 ' ' @ 9 \$3 'c' @ 11";
        my $expB = '';
        my $expC = "C: match, \$1 'aa \n bb \n cc' @ 0";
        my $expD = "D: match, \$1 'aa ' @ 0 \$2 ' cc' @ 9";

        my $fmtA = "A: match, \$1 '%s' @ %d \$2 '%s' @ %d \$3 '%s' @ %d";
        my $fmtB = "B: match, \$1 '%s' @ %d";
        my $fmtC = "C: match, \$1 '%s' @ %d";
        my $fmtD = "D: match, \$1 '%s' @ %d \$2 '%s' @ %d";

        my $s = "aa \n bb \n cc";

        # Tiny reusable pieces: one spans newlines, one never does.
        my $re_all  = qr{(?sx: .+ )};
        my $re_nonl = qr{(?x: [^\n]+ )};

        # Regexes built from the pieces above.
        my $reA = qr{(?x: ^ ( $re_nonl ) $re_all ( $re_nonl ) $re_all ( $re_nonl ) $ )};
        my $reB = qr{(?x: ^ ( $re_nonl ) $ )};
        my $reC = qr{(?x: ^ ( $re_all ) $ )};

        # The same regexes written in isolation.
        my $reAiso = qr{(?sx: ^ ([^\n]+) .+ ([^\n]+) .+ ([^\n]+) $ )};
        my $reBiso = qr{(?x: ^ ( .+ ) $ )};
        my $reCiso = qr{(?sx: ^ ( .+ ) $ )};

        # A case where both 'm' and 's' are wanted.
        my $reD = qr{(?msx: \A ( .+? ) $ .+ ^ ( .+ ) \z )};

        my ($gotA, $gotB, $gotC, $gotAiso, $gotBiso, $gotCiso, $gotD) = ('') x 7;

        $gotA = sprintf $fmtA, $1, $-[1], $2, $-[2], $3, $-[3] if $s =~ $reA;
        $gotB = sprintf $fmtB, $1, $-[1] if $s =~ $reB;
        $gotC = sprintf $fmtC, $1, $-[1] if $s =~ $reC;
        $gotAiso = sprintf $fmtA, $1, $-[1], $2, $-[2], $3, $-[3] if $s =~ $reAiso;
        $gotBiso = sprintf $fmtB, $1, $-[1] if $s =~ $reBiso;
        $gotCiso = sprintf $fmtC, $1, $-[1] if $s =~ $reCiso;
        $gotD = sprintf $fmtD, $1, $-[1], $2, $-[2] if $s =~ $reD;

        is($gotA, $expA, 'testA');
        is($gotB, $expB, 'testB');
        is($gotC, $expC, 'testC');
        is($gotAiso, $expA, 'testAiso');
        is($gotBiso, $expB, 'testBiso');
        is($gotCiso, $expC, 'testCiso');
        is($gotD, $expD, 'testD');

    All passed:

        1..7
        ok 1 - testA
        ok 2 - testB
        ok 3 - testC
        ok 4 - testAiso
        ok 5 - testBiso
        ok 6 - testCiso
        ok 7 - testD

    — Ken

      Sorry to be so long getting back to you. Events intervene...

      I think we both have strong personal styles and we're each unlikely to persuade the other to change any time soon, so this will likely be my last word in this thread.

      However, I want to take one more opportunity to state my position clearly. The following rationale is, of course, taken largely (if not entirely!) from the regex chapter of TheDamian's Perl Best Practices (PBP).

      qr{(?x: ^ ( $re_nonl ) $re_all ( $re_nonl ) $re_all ( $re_nonl ) $ )};

      My understanding of your practice is that you might or might not include an m or an s modifier in the opening  (?x: modifier group depending on whether or not  ^ $ or  . were used in the expression, and on what behavior you wanted these operators to exhibit.

      When I look at the quoted expression, the first thing I ask is "Ok, what do  ^ and  $ do? How do they behave?" Now I have to go modifier hunting. In this case, there is no  /m modifier in sight, so  ^ $ have their default behaviors.

      If I want to change this expression so as to add a  ^ or  $ operator somewhere, I have to repeat the hunt and decide if the operator behavior selected by the existing (or not) m modifier is compatible with the behavior I want. If I'm tempted to add or delete an m modifier, I must look around for other  ^ $ operators already present so that I can be sure the new behavior selected is correct and compatible with pre-existing usages. Room here for bugs to creep in.

      But why should I have to ask these questions? Why should these operators have multiple behaviors? If  \A \Z exactly duplicate the default  ^ $ behaviors (with  \z thrown in to extend this functionality a bit), why not just nail down  ^ $ to their enhanced  /m behaviors? No further thought needed.
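A minimal sketch of the equivalences this argument relies on. Without /m, ^ behaves like \A and $ behaves like \Z, which also matches before a final newline. \z matches only at the very end, and /m turns ^ into a line-start anchor. The sample strings and hash labels are illustrative.

```perl
use strict;
use warnings;

# A trailing newline is where default $, \Z and \z differ.
my $s = "last line\n";

# scalar() keeps each failed match from collapsing to an empty list.
my %hit = (
    'default $' => scalar($s =~ m{ line $ }x),     # before final "\n"
    '\Z'        => scalar($s =~ m{ line \Z }x),    # same as default $
    '\z'        => scalar($s =~ m{ line \z }x),    # very end only
    'default ^' => scalar("a\nb" =~ m{ ^ b }x),    # string start only
    '\A'        => scalar("a\nb" =~ m{ \A b }x),   # same as default ^
    '/m ^'      => scalar("a\nb" =~ m{ ^ b }xm),   # any line start
);

printf "%-10s %s\n", $_, ($hit{$_} ? 'match' : 'no match') for sort keys %hit;
```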

      But what if I don't use any  ^ $ in a given regex? Why should I bother with a useless  /m modifier? If m is always present in a standard /xms tail, no harm is done if no  ^ $ is used in the regex, and if one of these operators is ever added to a regex in which it was not present before, the further step of worrying about whether (and where) to add or not to add the corresponding modifier is totally eliminated.

      A similar argument applies to the dot operator: If  [^\n] exactly duplicates the default match behavior of dot, why not set the /s-modified "dot matches all" behavior in cement (especially since the latter behavior is the one most commonly needed in regexes)? Again, the need for thought and the opportunity for confusion are reduced: If no dot appears in a regex, no harm is done; if one must be added later, it's a one-step process.
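The dot argument can be checked the same way. Default dot and [^\n] both stop at a newline, and /s has no effect on the character class. Dot under /s runs straight through. This is a minimal sketch with an illustrative string.

```perl
use strict;
use warnings;

my $s = "aa \n bb";

# Default dot and [^\n] both stop at the newline...
my ($dot)   = $s =~ m{ \A (.+) }x;
my ($class) = $s =~ m{ \A ([^\n]+) }xs;   # /s has no effect on [^\n]

# ...while dot under /s runs straight through it.
my ($dot_s) = $s =~ m{ \A (.+) }xs;

print "dot:    '$dot'\n";     # dot:    'aa '
print "[^\\n]: '$class'\n";   # [^\n]: 'aa '
print "dot/s:  ", ($dot_s eq $s ? 'whole string' : 'partial'), "\n";
```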

      The end result is my (near) universal use of the  /xms tail in any  qr// m// s/// that I write. As I've said, there are exceptions due to the exigencies of the moment (usually my own want of ingenuity) or to the intricacies of the application, but they're few and far between.

      (The  '/flags' mode added to re in Perl version 5.14 seems very convenient for enforcing universal use of an  /xms tail with all regex operators, but I've never used it except for a bit of experimentation. I avoid it because it adds yet another versional boundary to worry about transgressing. Especially with postings to PerlMonks, I move back and forth between pre- and post-5.10 Perl versions so often that I have enough of a headache just with these extensions — but they're too enticing to ignore.)
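For anyone curious, here is a minimal sketch of the use re '/flags' approach, assuming Perl 5.14 or later. The pragma is lexically scoped, so the default /xms applies only inside the enclosing block. The variable names are illustrative.

```perl
use strict;
use warnings;
use 5.014;   # use re '/flags' needs Perl 5.14 or later

my $s = "aa \n bb \n cc";
my @lines;
{
    # Every regex compiled in this block gets /x, /m and /s by default.
    use re '/xms';
    @lines = $s =~ m{ ^ ([^\n]+) $ }g;   # behaves as if written m{...}xmsg
}

# Outside the block the defaults are back: without /m, ^...$ would have to
# span the whole string, which [^\n]+ cannot do here.
my $outside = () = $s =~ m{^([^\n]+)$}g;

print scalar(@lines), " lines inside, $outside outside\n";   # 3 lines inside, 0 outside
```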



        All good. That's been a long and interesting discussion. Thank you.

        — Ken