PerlMonks  

Re^8: Understanding a portion of perlretut

by choroba (Cardinal)
on Dec 10, 2015 at 10:21 UTC ( [id://1149880]=note )


in reply to Re^7: Understanding a portion of perlretut
in thread Understanding a portion on the Perlretut

how does $dna =~ / (\w\w\w)*? TGA /gx differ logically from $s =~ / (f)*? C /gx
The important difference here is the length of $1.

After the first match (A is where the matching started, B denotes the position of the capture group)

XXXxxxTGAxxTGAxxxxTGAxx
^       ^
|       |
A       B

the matching starts at B + 1. Zero times \w\w\w doesn't match here, we have xxTGAx, so the engine tries longer and longer strings, until it finds the TGA:

XXXxxxTGAxxTGAxxxxTGAxx
         ^          ^
         |          |
         A          B

The next search will start at B + 1 again, and fail on xx.

But, with the capture group of length 1, you always match the nearest group, because the (f)*? tries longer and longer strings. Maybe what's confusing here is that expanding the group by one character is similar to the engine advancing the starting position after a match failure?
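The difference can be checked with a short script. This is only a sketch: it reuses the string from the diagrams above, and it swaps in (\w)*? as a one-character stand-in for (f)*?, since the original $s is not shown in this post. The array names @triple and @single are made up for the illustration.

```perl
use strict;
use warnings;

# String from the diagrams above (an illustration, not real DNA).
my $dna = 'XXXxxxTGAxxTGAxxxxTGAxx';

# Three-character group: each match starts where the previous one ended,
# and the group can only grow in steps of three.
my @triple;
while ($dna =~ /(\w\w\w)*?TGA/g) {
    push @triple, $-[0] . '-' . pos($dna);    # start offset - offset after match
}

# One-character group (assumed stand-in for (f)*?): grows in steps of one,
# so it can always stretch exactly up to the next TGA.
my @single;
while ($dna =~ /(\w)*?TGA/g) {
    push @single, $-[0] . '-' . pos($dna);
}

print "three-wide: @triple\n";
print "one-wide:   @single\n";
```

The three-wide loop should report 0-9 and 9-21, stepping over the TGA at offset 11, while the one-wide loop reports 0-9, 9-14 and 14-21, i.e. every TGA in turn.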

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

Re^9: Understanding a portion of perlretut
by Athanasius (Archbishop) on Dec 10, 2015 at 12:46 UTC

    Hello choroba,

    Thanks for the explanation, and I’m sorry to be obtuse but — I still don’t understand. :-( Consider:

    #! perl -l
    use strict;
    use warnings;

    my $s = 'uvXYZdabcXYZfg';

    while ($s =~ /(\w\w\w)*?(XYZ)/g)
    {
        print 'Found match ', $1, $2, ' at pos: ', pos $s;
    }

    print '-----';

    while ($s =~ /(abc)*?(XYZ)/g)
    {
        print 'Found match ', $1, $2, ' at pos: ', pos $s;
    }

    Output:

    22:35 >perl 1476_SoPW.pl
    Found match abcXYZ at pos: 12
    -----
    Use of uninitialized value $1 in print at 1476_SoPW.pl line 28.
    Found match XYZ at pos: 5
    Found match abcXYZ at pos: 12
    22:35 >

    The first capture in each regex is 3 characters wide, but the first regex matches only the second occurrence of XYZ whereas the second regex also matches the first occurrence, with (abc)*? matching zero times. Why the difference in behaviour? In particular, why does (\w\w\w)*? not also match zero times?

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      In the first case, the engine starts from the left and finds a match, repeating (\w\w\w) 3 times:
      uvXYZdabcXYZfg
      ^          ^
      |          |
      A          B

      The engine then starts to match at B + 1, and finds no further match.

      In the second case, the engine starts from the left as well, but finds no match:

      uvXYZdabcXYZfg
      ^
      |
      A

      So, it moves to A + 1 (still no match), and then A + 2, where it can match with (abc)* repeating zero times:

      uvXYZdabcXYZfg
        ^
        |
        A=B

      After matching, it continues (because of /g) to B + 3 (no match), and at B + 4 it finally succeeds with

      uvXYZdabcXYZfg
            ^    ^
            |    |
            A    B

      Better now?
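The start and end of each match can also be printed directly, which may be easier to follow than the diagrams. A minimal sketch, assuming the same $s as in the script above; @trace is just a name chosen here, and @- and @+ are Perl's built-in arrays of match start and end offsets.

```perl
use strict;
use warnings;

my $s = 'uvXYZdabcXYZfg';
my @trace;

for my $re (qr/(\w\w\w)*?(XYZ)/, qr/(abc)*?(XYZ)/) {
    # A failed m//g resets pos($s), so each pattern starts from offset 0.
    while ($s =~ /$re/g) {
        # $1 is undef when the group matched zero times.
        my $cap = defined $1 ? $1 : '(undef)';
        # $-[0] is where the whole match began, $+[0] is where it ended.
        push @trace, sprintf 'start=%d end=%d $1=%s', $-[0], $+[0], $cap;
    }
    push @trace, '-----';
}

print "$_\n" for @trace;
```

The first pattern should give a single match starting at 0 and ending at 12. The second should give one match from 2 to 5 with $1 undefined, then one from 6 to 12 with $1 set to abc.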

      ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

        Yes, thanks, I’m starting to see some light. :-) But:

        In the first case, the engine starts from the left and finds a match, repeating (\w\w\w) 3 times:

        Makes sense, but then, if the $1 match effectively ends up as (\w\w\w){3}, shouldn’t $1 contain uvXYZdabc? Or, conversely, if the ? quantifier following the * causes it to find the minimum string in uvXYZdabc matching (\w\w\w)*, shouldn’t that be "", the null string? Why does $1 match abc here?

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,
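The question above can be probed with a small experiment. This is a hedged sketch rather than an answer from the thread: it relies on two documented behaviours, namely that a quantified capture group keeps only the text of its last repetition, and that *? takes the fewest repetitions that still let the rest of the pattern match from the current starting position. The variable names $last and $all are invented for the example.

```perl
use strict;
use warnings;

my $s = 'uvXYZdabcXYZfg';

# Quantified capture group: each repetition overwrites $1,
# so only the last of the three repetitions survives.
my ($last) = $s =~ /(\w\w\w)*?XYZ/;

# Capture around the whole (non-capturing) repetition instead:
# this keeps everything the repetitions consumed.
my ($all) = $s =~ /((?:\w\w\w)*?)XYZ/;

print "last repetition: $last\n";
print "whole run:       $all\n";
```

Here $last should be abc and $all should be uvXYZdabc. The empty string is not chosen because, starting at offset 0, zero repetitions would require XYZ at offset 0, and a match that starts further left wins over a shorter one that starts later.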
