Re: Capturing string matched by regex

Re^2: Capturing string matched by regex by johngg (Canon) on Feb 17, 2012 at 16:13 UTC
`my $pattern = qr{ (?i) pa+rt }xms;`

Leaving aside the merits or demerits of deploying 'x' and 'm' here, I'm just wondering why you have put one regex modifier inside the `qr{ ... }` and the other three outside. It would seem more consistent to do either `my $pattern = qr{ pa+rt }xims;` or `my $pattern = qr{(?xims) pa+rt };`

Just a little puzzled :-s

Cheers, JohnGG
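For anyone wanting to check that the three spellings really are interchangeable as far as matching goes, here is a minimal sketch. The sample inputs and the hash names are made up for illustration; only the three patterns come from the post above.

```perl
use strict;
use warnings;

# Three spellings of the same pattern; only the placement of the modifiers differs.
my %patterns = (
    mixed  => qr{ (?i) pa+rt }xms,    # form quoted at the top of the post
    tail   => qr{ pa+rt }xims,        # every modifier in the tail
    inline => qr{(?xims) pa+rt },     # every modifier inline
);

# Hypothetical sample inputs, not taken from the thread.
my @inputs = ('part', 'PAAART', 'apart from', 'prt', 'pa rt');

# For each spelling, record which inputs match.
my %matched;
for my $name (sort keys %patterns) {
    $matched{$name} = join ',', grep { $_ =~ $patterns{$name} } @inputs;
    print "$name: $matched{$name}\n";
}
```

All three spellings should accept and reject the same inputs, so the choice between them is about readability rather than behaviour.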
Re^3: Capturing string matched by regex by AnomalousMonk (Archbishop) on Feb 17, 2012 at 20:09 UTC
... more consistent ...

Update: I haven't gone back to review in detail the rationale presented in Perl Best Practices (PBP), but off the top of my head...

Of course, the reason for the PBP recommendation of the unvarying use of the `/xms` regex modifier 'tail' (if that's the proper term) is to give the `^ $ .` regex operators unvarying behaviors, and the programmer a few fewer things to worry about; because they're always there, their proper place is in the tail.

One thing that cannot be made invariant from regex to regex is case insensitivity. Where, then, to put the `/i` modifier? If in the modifier tail, it's in danger of being 'lost', and moreover has global effect upon the regex. If in the body of the regex, it's in your face, and has the added advantage of being more flexible: the effects of the `(?i)` and `(?-i)` extended pattern modifiers are dependent upon the 'scoping' of the regex capturing and non-capturing groups in which they may appear (Update: see docs linked below for details). I.e., the mixture of `qr{pat}xms` with `qr{pat}xmsi` (or `m//` or `s///`) regex definitions is actually less consistent! Moreover, the `(?i)` extended pattern allows one to precisely define and control the desired matching behavior.

Of course, the PBP recommendations are not without controversy. I will only repeat the words of a great Marxist philosopher (Groucho): "These are my principles. If you don't like them, I've got others."

See Extended Patterns in perlre for detailed info on the behavior of `"(?pimsx-imsx)"` and `"(?imsx-imsx:pattern)"` patterns, especially on the 'scope' of their effect.

Updates: Added link to docs. Qualified 2nd paragraph text per JohnGG.
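To make the 'scoping' point concrete, here is a small sketch of how `(?i)` behaves inside a group versus at the top level of the pattern. The `foo`/`bar` patterns and the variable names are invented for illustration and do not appear in the thread.

```perl
use strict;
use warnings;

# (?i) placed inside a group applies only until that group closes.
my $scoped = qr{ \A ( (?i) foo ) bar \z }xms;

# (?i) at the top level applies to everything that follows it.
my $global = qr{ \A (?i) foo bar \z }xms;

# For each input, record [ matched $scoped?, matched $global? ].
my %result;
for my $input ('FOObar', 'fooBAR', 'FOOBAR') {
    $result{$input} = [
        ($input =~ $scoped ? 1 : 0),
        ($input =~ $global ? 1 : 0),
    ];
    printf "%-7s scoped=%d global=%d\n", $input, @{ $result{$input} };
}
```

With the scoped form only the `foo` part is case-insensitive, so `fooBAR` and `FOOBAR` are rejected, while the top-level form accepts all three inputs.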
Re^4: Capturing string matched by regex by johngg (Canon) on Feb 18, 2012 at 00:08 UTC
I agree with what you say regarding the flexibility of the `(?i)` construct over the global `m{...}i` one, but I think I would take issue with your second paragraph.

... the unvarying use of the `/xms` regex modifier 'tail' (if that's the proper term) is to give the `^ $ .` regex operators unvarying behaviors ...

While they are very rarely used that way, `m`, `s` and even `x` are no more invariant than `i` and can be sprinkled throughout your regular expression. To give a nonsense example:

`knoppix@Microknoppix:~$ perl -E '`
`> $_ = qq{aabb\nwxy935TXB\n123};`
`> say $1 if m{(?x) ( a [^a] (?s) .* 9 (?-s) .* ) };'`
`abb`
`wxy935TXB`
`knoppix@Microknoppix:~$`

I'm also under the impression that `(?i)` need not be confined solely to the scope of capturing and non-capturing groups but can also be used as a "switch" to change the matching behaviour from the point at which it appears onwards, or in its own "modifier" group (`(?i:pattern)`), for want of a better word. The following patterns are examples of how I understand the modifiers can be used:

`m{(?i)Whole pattern case-insensitive}`
`m{Case-Sensitive(?i)case-insensitive(?-i)Case-Sensitive Again}`
`m{Case-Sensitive((?i)except in this capture)Case-Sensitive Again}`
`m{Case-Sensitive(?i:but not here)Case-Sensitive Again}`
`m{all case-insensitive(?-i:Except Here) and insensitive again}i`
`m{(?x) Use white-space\sfor readability(?-x)but literal spaces now};`

PBP is a fascinating book with very well argued recommendations that make you wonder whether you are doing things the right way. I believe that there are two equally valid reactions to each recommendation in the book: follow the recommendation if, after consideration, it seems better than what you were doing before; alternatively, if you can come up with equally cogent arguments for continuing the way you were, then do that. The main thing is that the book has made you think.

Cheers, JohnGG
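The modifier placements listed above can be exercised with a short script. This is only a sketch: the patterns are shortened stand-ins for the ones in the post, and the sample strings, labels and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Each case: [ label, pattern, a string that should match, one that should not ].
my @cases = (
    [ 'switch on, then off', qr{Case(?i)less(?-i)Again}, 'CaseLESSAgain', 'CaseLESSAGAIN' ],
    [ 'inside a capture',    qr{Case((?i)less)Again},    'CaseLeSsAgain', 'caselessAgain' ],
    [ 'modifier group',      qr{Case(?i:less)Again},     'CaselESSAgain', 'CaselessAGAIN' ],
    [ 'negated under /i',    qr{all(?-i:Except)rest}i,   'ALLExceptREST', 'ALLEXCEPTREST' ],
    [ 'x on, then off',      qr{(?x) a b (?-x)c d},      'abc d',         'abcd'          ],
);

# Count the cases whose behaviour differs from the expectation in the table.
my $failures = 0;
for my $case (@cases) {
    my ($label, $re, $yes, $no) = @$case;
    my $ok = ($yes =~ $re) && ($no !~ $re);
    $failures++ unless $ok;
    printf "%-20s %s\n", $label, $ok ? 'as expected' : 'UNEXPECTED';
}
print "failures: $failures\n";
```

The last case shows the `x` toggle: whitespace before `(?-x)` is ignored, whereas the space in `c d` after it has to be present in the input.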
Re^5: Capturing string matched by regex by AnomalousMonk (Archbishop) on Feb 18, 2012 at 02:41 UTC
Re^2: Capturing string matched by regex by tchrist (Pilgrim) on Feb 18, 2012 at 04:43 UTC
`use v5.14;`
`use re "/sixm";`
`$_ = "This part is past all of this paaRTy of the string. Paaaaarty on!";`
`my @matches = ();`
`sub show_matches {`
`    printf "Got %d matches:\n", scalar @matches;`
`    my $i = 1;`
`    for (@matches) {`
`        printf " %2d\t%s\n", $i++, $_;`
`    }`
`}`
`@matches = / pa .* rt /g;`
`show_matches();`
`@matches = / pa .*? rt /g;`
`show_matches();`
`@matches = / (?= (pa .* rt) ) /g;`
`show_matches();`
`@matches = ();`
`() = / (pa .* rt) (?{push @matches, $^N}) (*FAIL) /g;`
`show_matches();`

Says:

Got 1 matches:
  1 part is past all of this paaRTy of the string. Paaaaart
Got 3 matches:
  1 part
  2 past all of this paaRT
  3 Paaaaart
Got 4 matches:
  1 part is past all of this paaRTy of the string. Paaaaart
  2 past all of this paaRTy of the string. Paaaaart
  3 paaRTy of the string. Paaaaart
  4 Paaaaart
Got 8 matches:
  1 part is past all of this paaRTy of the string. Paaaaart
  2 part is past all of this paaRT
  3 part
  4 past all of this paaRTy of the string. Paaaaart
  5 past all of this paaRT
  6 paaRTy of the string. Paaaaart
  7 paaRT
  8 Paaaaart