Re^3: Understanding a portion of perlretut

Re^4: Understanding a portion of perlretut by Corion (Patriarch) on Dec 09, 2015 at 15:55 UTC
Step 2 is to try the leftmost part, `(\w\w\w)*?`, one time (no TGA found). Step 3 is to try the leftmost part two times (no TGA found). Step 4 (and this is where the naive approach goes bad) is to advance the starting point of the match by one character, since the match is unanchored.
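The steps above can be sketched by hand, without the regex engine. This is only an illustration under stated assumptions: `naive_match` is a hypothetical helper, the pattern is taken to be `(\w\w\w)*?TGA`, and `ATCGTTGAAT` is an assumed sample string built from the fragments quoted in this thread. It ignores all of the real engine's optimizations.

```perl
use strict;
use warnings;

# Hypothetical helper: hand-simulates an unanchored /(?:\w\w\w)*?TGA/ match.
# Returns (offset where the match starts, offset of TGA), or an empty list.
sub naive_match {
    my ($s) = @_;
    # Outer loop: the "advance the starting point by one" step.
    for my $start (0 .. length($s) - 3) {
        # Inner loop: the lazy quantifier tries 0 triplets, then 1, then 2, ...
        for (my $at = $start; $at + 3 <= length $s; $at += 3) {
            return ($start, $at) if substr($s, $at, 3) eq 'TGA';
        }
    }
    return;
}

my $dna = 'ATCGTTGAAT';    # assumed sample string, not taken from the thread verbatim
my ($start, $tga) = naive_match($dna);
print "match starts at offset $start, TGA found at offset $tga\n";
```

Starting at offsets 0 and 1 never lines up with a `TGA`, so the first success comes from starting point 2: one triplet (`CGT`) followed by `TGA` at offset 5.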
Re^5: Understanding a portion of perlretut by BlueStarry (Novice) on Dec 09, 2015 at 16:15 UTC
Can you elaborate, please? I don't follow. Going by your steps, the short string wouldn't match. Trying the leftmost part two times means `(\w\w\w)*?` consumes `ATCGTT`, OK? Two times. But I cannot understand how it is possible that it matches on `CGTTGA`.
Re^6: Understanding a portion of perlretut by AnomalousMonk (Archbishop) on Dec 09, 2015 at 22:04 UTC
Here's another way to look at things: instrument the regex with `(?{ code })` print points (see Extended Patterns) to learn by experimentation. I'm also taking the liberty of introducing some other new constructs: the `(?:pattern)` non-capturing grouping (also see Extended Patterns); the `/x` regex modifier (all the preceding links found in perlre); and the `@-` (aka `@LAST_MATCH_START`) array regex special variable (see perlvar).

First look at `TGA` matching against a simplified string without a `\G` anchor. Note that in contrast to some other code examples in this thread, the beginning offset of a match is reported.

`c:\@Work\Perl>perl -wMstrict -le "my $s = 'XXXxxxTGAxxTGAxxxxxxx'; while ($s =~ m{ (?{ print qq{trying a match at offset }, pos $s }) (?: \w\w\w)*? (TGA) }xmsg) { print qq{matched TGA beginning at offset $-[1]}; } "`

Output:

`trying a match at offset 0`
`matched TGA beginning at offset 6`
`trying a match at offset 9`
`trying a match at offset 10`
`trying a match at offset 11`
`matched TGA beginning at offset 11`

After the successful `TGA` match at offsets 6 thru 8, the regex engine starts trying to match again at offset 9. The RE tries matches at offsets 9, 10 and 11 and finds a spurious (because it's not on a base-triplet boundary) match at offsets 11 thru 13. (I'm not sure why the RE doesn't try matching from offset 14 onward.)

Now consider the effect of adding a `\G` anchor assertion.

`c:\@Work\Perl>perl -wMstrict -le "my $s = 'XXXxxxTGAxxTGAxxxxxxx'; while ($s =~ m{ \G (?{ print qq{trying a match at offset }, pos $s }) (?: \w\w\w)*? (TGA) }xmsg) { print qq{matched TGA beginning at offset $-[1]}; } "`

Output:

`trying a match at offset 0`
`matched TGA beginning at offset 6`
`trying a match at offset 9`

Now the RE can only begin another successful match at the offset immediately beyond the point at which the previous successful match ended, offset 9; it cannot try offsets 10 or 11 or any other because they do not satisfy the `\G` assertion.
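The same comparison can be run as a short script that collects offsets instead of printing from inside the regex. This is a sketch under assumptions: `tga_offsets` is a hypothetical helper, and the quantifier is taken to be the lazy `*?` from the perlretut pattern discussed earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: runs a //g loop and returns the offsets at which
# capture group 1 (the TGA) started in each successful match.
sub tga_offsets {
    my ($s, $re) = @_;
    my @offsets;
    while ($s =~ /$re/g) {
        push @offsets, $-[1];    # $-[1] is the start offset of group 1
    }
    return @offsets;
}

my $s = 'XXXxxxTGAxxTGAxxxxxxx';

# Assumed pattern: lazy triplets followed by a captured TGA.
my @plain    = tga_offsets($s, qr/(?:\w\w\w)*?(TGA)/);
my @anchored = tga_offsets($s, qr/\G(?:\w\w\w)*?(TGA)/);

print "without \\G: @plain\n";
print "with \\G: @anchored\n";
```

Without `\G` the loop reports offsets 6 and 11; with `\G` it reports only offset 6, because the second attempt may start only at offset 9 and no `TGA` lies on a triplet boundary from there.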
Supplemental: We just got finished saying that in

`my $s = 'XXXxxxTGAxxTGAxxxxxxx'; while ($s =~ m{ (?{ print qq{trying a match at offset }, pos $s }) (?: \w\w\w)*? (TGA) }xmsg) { print qq{matched TGA beginning at offset $-[1]}; }`

the RE will match the `TGA` at offset 11 because it's not constrained by a `\G` assertion. So in

`c:\@Work\Perl>perl -wMstrict -le "my $s = 'XXXxxxTGAxxTGAxxxxTGAxx'; while ($s =~ m{ (?{ print qq{trying a match at offset }, pos $s }) (?: \w\w\w)*? (TGA) }xmsg) { print qq{matched TGA beginning at offset $-[1]}; } "`

Output:

`trying a match at offset 0`
`matched TGA beginning at offset 6`
`trying a match at offset 9`
`matched TGA beginning at offset 18`

(still no `\G`), why does the RE miss the `TGA` at offset 11 when there is another `TGA` present at offset 18 (which it does match)?

Give a man a fish: `<%-{-{-{-<`
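One way to investigate the question is to record where each whole match begins, not just where the `TGA` is found. The sketch below is an assumption-laden illustration: `match_spans` is a hypothetical helper, and the pattern again uses the lazy `*?` quantifier from the perlretut example.

```perl
use strict;
use warnings;

# Hypothetical helper: for each //g match, record where the whole match
# starts ($-[0]) and where the captured TGA starts ($-[1]).
sub match_spans {
    my ($s) = @_;
    my @spans;
    while ($s =~ /(?:\w\w\w)*?(TGA)/g) {    # assumed pattern, no \G
        push @spans, [ $-[0], $-[1] ];
    }
    return @spans;
}

for my $span (match_spans('XXXxxxTGAxxTGAxxxxTGAxx')) {
    my ($start, $tga) = @$span;
    print "match began at offset $start, TGA at offset $tga\n";
}
```

The second match begins at offset 9 and succeeds there: the lazy group consumes three triplets (offsets 9 thru 17, which swallow the `TGA` at offset 11) and then finds `TGA` at offset 18. Because the attempt from offset 9 succeeds, the engine never moves on to starting points 10 or 11.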
Re^7: Understanding a portion of perlretut by Athanasius (Cardinal) on Dec 10, 2015 at 09:54 UTC
Re^8: Understanding a portion of perlretut by choroba (Cardinal) on Dec 10, 2015 at 10:21 UTC
Re^8: Understanding a portion of perlretut by AnomalousMonk (Archbishop) on Dec 10, 2015 at 22:22 UTC
Re^7: Understanding a portion of perlretut by Discipulus (Canon) on Dec 10, 2015 at 11:12 UTC
Re^7: Understanding a portion of perlretut by Anonymous Monk on Dec 09, 2015 at 22:39 UTC