in reply to Re: Capturing string matched by regex
in thread Capturing string matched by regex

my $pattern = qr{ (?i) pa+rt }xms;

Leaving aside the merits or demerits of deploying 'x' and 'm' here, I'm just wondering why you have put one regex modifier inside the qr{ ... } and the other three outside. It would seem more consistent to do either

my $pattern = qr{ pa+rt }xims;

or

my $pattern = qr{(?xims) pa+rt };

Just a little puzzled :-s

Cheers,

JohnGG

Re^3: Capturing string matched by regex
by AnomalousMonk (Archbishop) on Feb 17, 2012 at 20:09 UTC
    ... more consistent ...

    I haven't gone back to review in detail the rationale presented in Perl Best Practices (PBP), but off the top of my head...

    Of course, the reason for (Update: the PBP recommendation of) the unvarying use of the /xms regex modifier 'tail' (if that's the proper term) is to give the ^ $ . regex operators unvarying behaviors, and the programmer a few fewer things to worry about; because they're always there, their proper place is in the tail.
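A minimal sketch of what the /m and /s modifiers change for ^ and . (the sample string and variable names are made up for illustration; only core Perl is assumed):

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without /m, ^ anchors only at the very start of the string.
my @starts_plain = $text =~ m{ ^ (\w+) }xg;
# With /m, ^ also matches just after each embedded newline.
my @starts_multi = $text =~ m{ ^ (\w+) }xmsg;

# Without /s, . refuses to match a newline; with /s it matches any character.
my $dot_plain  = $text =~ m{ first .+ second }x   ? 1 : 0;
my $dot_single = $text =~ m{ first .+ second }xms ? 1 : 0;

print "line starts without /m: @starts_plain\n";   # first
print "line starts with /m:    @starts_multi\n";   # first second
print "dot crosses newline without /s: $dot_plain\n";    # 0
print "dot crosses newline with /s:    $dot_single\n";   # 1
```

With the /xms tail always present, the second behaviour in each pair is the only one a reader has to keep in mind.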

    One thing that cannot be made invariant from regex to regex is case insensitivity. Where, then, to put the  /i modifier? If in the modifier tail, it's in danger of being 'lost', and moreover has global effect upon the regex. If in the body of the regex, it's in your face, and has the added advantage of being more flexible: the effects of the  (?i) and  (?-i) extended pattern modifiers are dependent upon the 'scoping' of the regex capturing and non-capturing groups in which they may appear   (Update: see docs linked below for details).

    I.e., the mixture of  qr{pat}xms with  qr{pat}xmsi (or m// or s///) regex definitions is actually less consistent! Moreover, the  (?i) extended pattern allows one to precisely define and control the desired matching behavior.
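As a rough illustration of that control, the sketch below contrasts a (?i) that covers the whole body with one confined to a group. The patterns and test strings are invented for the example and are not taken from PBP:

```perl
use strict;
use warnings;

# (?i) at the start of the body applies to the whole pattern.
my $whole = qr{ (?i) pa+rt }xms;

# Inside a group, (?i) lasts only until that group closes,
# so 'two' is still matched case-sensitively.
my $scoped = qr{ \A ( (?i) pa+rt ) \s+ two \z }xms;

my @results = (
    'PAAART'   =~ $whole  ? 1 : 0,   # 1: whole body is case-insensitive
    'Part two' =~ $scoped ? 1 : 0,   # 1: 'Part' falls inside the (?i) group
    'part TWO' =~ $scoped ? 1 : 0,   # 0: 'TWO' falls outside the (?i) group
);

print "@results\n";   # 1 1 0
```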

    Of course, the PBP recommendations are not without controversy. I will only repeat the words of a great Marxist philosopher (Groucho): "These are my principles. If you don't like them, I've got others."

    See Extended Patterns in perlre for detailed info on the behavior of  "(?pimsx-imsx)" and  "(?imsx-imsx:pattern)" patterns, especially on the 'scope' of their effect.

    Updates:

    1. Added link to docs.
    2. Qualified 2nd paragraph text per JohnGG.

      I agree with what you say regarding the flexibility of the (?i) construct over the global m{...}i one but I think I would take issue with your second paragraph.

      ... the unvarying use of the /xms regex modifier 'tail' (if that's the proper term) is to give the ^ $ . regex operators unvarying behaviors ...

      While they are very rarely used that way, m, s and even x are no more invariant than i and can be sprinkled throughout your regular expression. To give a nonsense example:

      knoppix@Microknoppix:~$ perl -E '
      > $_ = qq{aabb\nwxy935TXB\n123};
      > say $1 if m{(?x) ( a [^a] (?s) .* 9 (?-s) .* ) };'
      abb
      wxy935TXB
      knoppix@Microknoppix:~$

      I'm also under the impression that (?i) need not be confined solely to the scope of capturing and non-capturing groups but can also be used as a "switch" to change the matching behaviour from the point at which it appears onwards or in its own "modifier" group ((?i:pattern)) for want of a better word. The following patterns are examples of how I understand the modifiers can be used:

      m{(?i)Whole pattern case-insensitive}
      m{Case-Sensitive(?i)case-insensitive(?-i)Case-Sensitive Again}
      m{Case-Sensitive((?i)except in this capture)Case-Sensitive Again}
      m{Case-Sensitive(?i:but not here)Case-Sensitive Again}
      m{all case-insensitive(?-i:Except Here) and insensitive again}i
      m{(?x) Use white-space\sfor readability(?-x)but literal spaces now};
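One way to check usages like these is a small table of cut-down patterns, each paired with a string that should match and one that should not. The short patterns and strings below are stand-ins invented for this sketch, not the originals above:

```perl
use strict;
use warnings;

# Each case: label, pattern, string expected to match, string expected to fail.
my @cases = (
    [ 'switch at start',    qr{(?i)abc},            'ABC',    'abd'    ],
    [ 'switch on then off', qr{ab(?i)cd(?-i)ef},    'abCDef', 'abCDEF' ],
    [ 'inside a capture',   qr{ab((?i)cd)ef},       'abCDef', 'ABcdef' ],
    [ 'modifier group',     qr{ab(?i:cd)ef},        'abCDef', 'abcdEF' ],
    [ 'negated inside /i',  qr{ab(?-i:cd)ef}i,      'ABcdEF', 'abCDef' ],
    [ '(?x) then (?-x)',    qr{(?x) a \s b(?-x) c}, 'a b c',  'a bc'   ],
);

my @failures;
for my $case (@cases) {
    my ($label, $re, $good, $bad) = @$case;
    push @failures, "$label: expected '$good' to match"    unless $good =~ $re;
    push @failures, "$label: expected '$bad' not to match" if     $bad  =~ $re;
}

print @failures
    ? join("\n", @failures) . "\n"
    : "all cases behave as described\n";
```

In the last case the space before `c` is literal because (?-x) has switched whitespace skipping off again, which is why 'a bc' fails.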

      PBP is a fascinating book with very well argued recommendations that make you wonder whether you are doing things the right way. I believe that there are two equally valid reactions to each recommendation in the book: follow the recommendation if, after consideration, it seems better than what you were doing before; alternatively, if you can come up with equally cogent arguments for continuing the way you were, then do that. The main thing is that the book has made you think.

      Cheers,

      JohnGG

        ... I think I would take issue with your second paragraph.
        ... the unvarying use of the /xms regex modifier 'tail' ...
        While they are very rarely used that way, m, s and even x are no more invariant than i ...

        What I meant to convey by my reference to "/xms-tail invariance" was that this is the PBP recommendation (original post amended) and that the reason for this is to nail down the behavior of the  ^ $ . critters. For this reason, I regard with horror the idea of sprinkling  (?-x) (?m) (?-s) et al through the regex due to the extreme danger of brain meltdown and subsequent containment breach. For those cases in which one might be tempted to the Dark Side, e.g., the use of  (?-s:.) in case an "anything-but-a-newline" match is needed (always assuming an /xms tail), PBP discusses alternatives; in the foregoing example,  [^\n] (or in 5.12+, the "experimental"  \N sequence).
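A small sketch of the "anything-but-a-newline" alternatives mentioned above, always assuming an /xms tail. The sample string and variable names are made up, and the \N line assumes Perl 5.12 or later:

```perl
use 5.012;
use strict;
use warnings;

my $text = "alpha\nbeta";

# Under /s, a bare . happily runs across the newline.
my ($dot)    = $text =~ m{ \A (.+)       }xms;
# Three ways to stop at the newline while keeping the /xms tail:
my ($inline) = $text =~ m{ \A ((?-s:.)+) }xms;   # switch /s off locally
my ($class)  = $text =~ m{ \A ([^\n]+)   }xms;   # negated character class
my ($bign)   = $text =~ m{ \A (\N+)      }xms;   # \N, Perl 5.12+

printf "%-8s => %d chars\n", @$_
    for [ 'dot',     length $dot    ],   # 10
        [ '(?-s:.)', length $inline ],   # 5
        [ '[^\n]',   length $class  ],   # 5
        [ '\N',      length $bign   ];   # 5
```

All three alternatives capture the same text here; the character class version is the one that needs no knowledge of modifier scoping to read.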

        I'm also under the impression that (?i) need not be confined solely to the scope of capturing and non-capturing groups ...

        My discussion of the behavior of  (?pimsx-imsx) patterns was brief, vague and lacking. I've tried to remedy this with a link to the docs.

        The following patterns are examples of how I understand the modifiers can be used ...

        I haven't tested these, but they look syntactically correct. However, I would quibble with most, especially the latter ones, on stylistic grounds. I haven't time now, but may return to this point with a detailed discussion of my own preferences.

        PBP is a fascinating book with very well argued recommendations ...

        I agree with every statement in this paragraph.