in reply to Capturing string matched by regex

... and you don't even need capturing parentheses:

>perl -wMstrict -le
"my $string_to_search = 'This part of all of this paart of the string. Paaaaart!';
 ;;
 my $pattern = qr{ (?i) pa+rt }xms;
 my @results = $string_to_search =~ /$pattern/g;
 printf qq{'$_' } for @results;
"
'part' 'paart' 'Paaaaart'
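As a rough illustration of why the parentheses are optional here: in list context, m//g returns the whole text of each match when the pattern has no capture groups, and the captured pieces when it does. The variable names and the second pattern below are made up for the sketch; only the search string comes from the one-liner above.

```perl
use strict;
use warnings;

my $string_to_search = 'This part of all of this paart of the string. Paaaaart!';

# No capture groups: each list element is the full text of one match.
my @whole = $string_to_search =~ / pa+rt /xmsig;

# One capture group (hypothetical variant): each element is what the
# group captured, not the whole match.
my @captured = $string_to_search =~ / p (a+) rt /xmsig;

print "whole:    @whole\n";
print "captured: @captured\n";
```

So the parentheses only matter when you want a piece of the match rather than all of it.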

Re^2: Capturing string matched by regex
by johngg (Canon) on Feb 17, 2012 at 16:13 UTC
    my $pattern = qr{ (?i) pa+rt }xms;

    Leaving aside the merits or demerits of deploying 'x' and 'm' here, I'm just wondering why you have put one regex modifier inside the qr{ ... } and the other three outside. It would seem more consistent to do either

    my $pattern = qr{ pa+rt }xims;

    or

    my $pattern = qr{(?xims) pa+rt };
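A quick sketch suggesting that the two spellings behave the same on the sample string from the parent post. The names $tail and $inline are invented for the illustration. One caveat with the inline form: any whitespace placed before (?xims) would still be literal, since /x only takes effect from that point on.

```perl
use strict;
use warnings;

my $text = 'This part of all of this paart of the string. Paaaaart!';

# All four modifiers in the tail.
my $tail = qr{ pa+rt }xims;

# All four modifiers inline; note there is no space before (?xims).
my $inline = qr{(?xims) pa+rt };

my @from_tail   = $text =~ /$tail/g;
my @from_inline = $text =~ /$inline/g;

print "tail:   @from_tail\n";
print "inline: @from_inline\n";
```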

    Just a little puzzled :-s

    Cheers,

    JohnGG

      ... more consistent ...

      I haven't gone back to review in detail the rationale presented in Perl Best Practices (PBP), but off the top of my head...

      Of course, the reason for (Update: the PBP recommendation of) the unvarying use of the /xms regex modifier 'tail' (if that's the proper term) is to give the ^ $ . regex operators unvarying behaviors, and the programmer a few fewer things to worry about; because they're always there, their proper place is in the tail.

      One thing that cannot be made invariant from regex to regex is case insensitivity. Where, then, to put the /i modifier? If in the modifier tail, it's in danger of being 'lost', and moreover has global effect upon the regex. If in the body of the regex, it's in your face, and has the added advantage of being more flexible: the effects of the (?i) and (?-i) extended pattern modifiers are dependent upon the 'scoping' of the regex capturing and non-capturing groups in which they may appear. (Update: see docs linked below for details.)

      I.e., the mixture of qr{pat}xms with qr{pat}xmsi (or m// or s///) regex definitions is actually less consistent! Moreover, the (?i) extended pattern allows one to precisely define and control the desired matching behavior.

      Of course, the PBP recommendations are not without controversy. I will only repeat the words of a great Marxist philosopher (Groucho): "These are my principles. If you don't like them, I've got others."

      See Extended Patterns in perlre for detailed info on the behavior of "(?pimsx-imsx)" and "(?imsx-imsx:pattern)" patterns, especially on the 'scope' of their effect.
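To make the 'scope' point concrete, here is a minimal sketch (the patterns and test words are invented for illustration, not taken from PBP or perlre). A (?i) placed inside a group stops applying at that group's closing parenthesis, so the trailing y stays case-sensitive.

```perl
use strict;
use warnings;

# (?i) inside a capturing group: case-insensitivity ends at the
# group's closing paren, so the final 'y' must be lowercase.
my $scoped = qr{ \A ( (?i) pa+rt ) y \z }xms;

# (?i: ... ) is the non-capturing spelling of the same idea.
my $cluster = qr{ \A (?i: pa+rt ) y \z }xms;

for my $word (qw(PAARTy paarty PAARTY)) {
    printf "%-7s scoped=%-8s cluster=%s\n",
        $word,
        ($word =~ $scoped  ? 'match' : 'no match'),
        ($word =~ $cluster ? 'match' : 'no match');
}
```

With /i in the tail instead, all three words would match, which is the 'global effect' mentioned above.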

      Updates:

      1. Added link to docs.
      2. Qualified 2nd paragraph text per JohnGG.

        I agree with what you say regarding the flexibility of the (?i) construct over the global m{...}i one but I think I would take issue with your second paragraph.

        ... the unvarying use of the /xms regex modifier 'tail' (if that's the proper term) is to give the ^ $ . regex operators unvarying behaviors ...

        While they are very rarely used that way, m, s and even x are no more invariant than i and can be sprinkled throughout your regular expression. To give a nonsense example:

        knoppix@Microknoppix:~$ perl -E '
        > $_ = qq{aabb\nwxy935TXB\n123};
        > say $1 if m{(?x) ( a [^a] (?s) .* 9 (?-s) .* ) };'
        abb
        wxy935TXB
        knoppix@Microknoppix:~$

        I'm also under the impression that (?i) need not be confined solely to the scope of capturing and non-capturing groups but can also be used as a "switch" to change the matching behaviour from the point at which it appears onwards or in its own "modifier" group ((?i:pattern)) for want of a better word. The following patterns are examples of how I understand the modifiers can be used:

        m{(?i)Whole pattern case-insensitive}
        m{Case-Sensitive(?i)case-insensitive(?-i)Case-Sensitive Again}
        m{Case-Sensitive((?i)except in this capture)Case-Sensitive Again}
        m{Case-Sensitive(?i:but not here)Case-Sensitive Again}
        m{all case-insensitive(?-i:Except Here) and insensitive again}i
        m{(?x) Use white-space\sfor readability(?-x)but literal spaces now};
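A small sketch that exercises three of these usages against strings chosen to show where each switch takes effect. The pattern text and test strings are invented for the example; they follow the shapes listed above rather than reproducing them.

```perl
use strict;
use warnings;

# (?i) ... (?-i) used as on/off switches with no enclosing group.
my $switch = qr{\ACase(?i)less(?-i)Case\z};

# /i in the tail, switched off locally by (?-i: ... ).
my $except = qr{\Aloose(?-i:Strict)loose\z}i;

# (?x) ignores pattern whitespace until (?-x) makes it literal again.
my $spaces = qr{\A(?x) free form (?-x)two words\z};

my @checks = (
    [ $switch, 'CaseLESSCase'      ],   # middle part may be any case
    [ $switch, 'caselessCase'      ],   # leading 'C' is case-sensitive
    [ $except, 'LOOSEStrictLoose'  ],   # only 'Strict' is case-sensitive
    [ $except, 'loosestrictloose'  ],   # 'strict' has the wrong case
    [ $spaces, 'freeformtwo words' ],   # first spaces ignored, last is literal
    [ $spaces, 'freeformtwowords'  ],   # literal space is missing
);

for my $check (@checks) {
    my ($re, $str) = @$check;
    printf "%-20s %s\n", "'$str'", ($str =~ $re ? 'match' : 'no match');
}
```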

        PBP is a fascinating book with very well argued recommendations that make you wonder whether you are doing things the right way. I believe that there are two equally valid reactions to each recommendation in the book: follow the recommendation if, after consideration, it seems better than what you were doing before; alternatively, if you can come up with equally cogent arguments for continuing the way you were, then do that. The main thing is that the book has made you think.

        Cheers,

        JohnGG

Re^2: Capturing string matched by regex
by tchrist (Pilgrim) on Feb 18, 2012 at 04:43 UTC
    use v5.14;
    use re "/sixm";

    $_ = "This part is past all of this paaRTy of the string. Paaaaarty on!";

    my @matches = ();

    sub show_matches {
        printf "Got %d matches:\n", scalar @matches;
        my $i = 1;
        for (@matches) {
            printf " %2d\t%s\n", $i++, $_;
        }
    }

    # Greedy: one long match.
    @matches = / pa .* rt /g;
    show_matches();

    # Non-greedy: shortest non-overlapping matches.
    @matches = / pa .*? rt /g;
    show_matches();

    # Lookahead: the longest match from each starting position.
    @matches = / (?= (pa .* rt) ) /g;
    show_matches();

    # Forced backtracking: every possible match.
    @matches = ();
    () = / (pa .* rt) (?{push @matches, $^N}) (*FAIL) /g;
    show_matches();
    Says:
    Got 1 matches:
      1  part is past all of this paaRTy of the string. Paaaaart
    Got 3 matches:
      1  part
      2  past all of this paaRT
      3  Paaaaart
    Got 4 matches:
      1  part is past all of this paaRTy of the string. Paaaaart
      2  past all of this paaRTy of the string. Paaaaart
      3  paaRTy of the string. Paaaaart
      4  Paaaaart
    Got 8 matches:
      1  part is past all of this paaRTy of the string. Paaaaart
      2  part is past all of this paaRT
      3  part
      4  past all of this paaRTy of the string. Paaaaart
      5  past all of this paaRT
      6  paaRTy of the string. Paaaaart
      7  paaRT
      8  Paaaaart