in reply to Re: Re: Re: Re: Re: nested pattern matching
in thread nested pattern matching

This is what I came up with. It loops, but should not be too slow. I hope that it's going to work for you.
#!perl
use strict;
use warnings;

my $string = qq[FGTXYZGTFABCGHABCFGTXYZADXYZGTYABC];
my ($pos, $nummatch) = (0, 0);
while ($string =~ /^(.{$pos,}?)[XYZ]{3}([A-Za-z]{0,21}?)[ABC]{3}/) {
    my $tpos = $pos;
    $pos = length($1) + 1;
    $nummatch = length($2);
    while ($string =~ /^(.{$tpos,}?)[XYZ]{3}([A-Za-z]{$nummatch,21}?)[ABC]{3}/) {
        print length($1) . ', ' . length($2) . $/;
        $nummatch = length($2) + 1;
    }
}

__OUTPUT__
3, 3
3, 8
20, 8
25, 3
Explanation: $pos holds the minimum starting offset, and $nummatch the minimum number of characters to match between the [XYZ]{3} and [ABC]{3} parts. The outer loop iterates through the starting positions, the inner one through the different matching lengths. As the matching is non-greedy, you will get the matches ordered by starting position first, and by length second.

Re: Re(6): nested pattern matching
by vinforget (Beadle) on Aug 18, 2003 at 21:26 UTC
    It works somewhat, but I keep getting a repetitive loop when I try it on a larger string. After looking through "Mastering Regular Expressions" (O'Reilly) I came up with this:
    use strict;
    use warnings;

    my $string = qq[FGTXYZGTFABCGHABCFGTXYZADXYZGTYABC];
    # or try this instead:
    # my $string = qq[FGTXYZABCABC];

    $string =~ m/[XYZ]{3}([A-Za-z]{0,21}?)[ABC]{3}?(?{print "matched at [<$&>] $-[0]\n" })(?!)/x;
    This seems to work, but I am sort of confused as to how it works !!
    Vince
      Ah, the neat new features. I had forgotten about them; it's been a while since I read the book. Here is the explanation:
      First, the (?{print "matched at [<$&>] $-[0]\n" }) part is executed each time the RegEx engine reaches it.
      Secondly, the RegEx engine works through the pattern from left to right, so it matches your search pattern first. Once that part has matched, the print is executed, which gives you the data.
      Thirdly, there is an empty negative look-ahead assertion, (?!), afterwards. An empty RegEx matches everywhere, and since the assertion is negative, this one matches nowhere. So, after the print, the match fails at that point.
      The RegEx engine does not just give up here. It backtracks and tries all the other possibilities, printing each one for which the first part matches, but it is always thrown back by the match-nothing part afterwards. I personally think that Friedl explains this really well; now that you have read this, you might remember some of it.
      So, in the end, the RegEx did not match at all while the first part - the important part for you - looped through all possibilities.
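
      To make that concrete, here is a minimal sketch that counts how often the code block runs and checks the overall result. The variable names $count and $matched are my own, and I use a package variable for the counter as a cautious choice, since code blocks and closed-over lexicals did not always get along on older Perls.

```perl
use strict;
use warnings;

our $count = 0;                  # package variable, incremented from inside the regex
my $string = 'FGTXYZABCABC';     # the shorter test string from the post above

# The trailing (?!) can never succeed, so the match as a whole fails,
# yet the code block runs once for every candidate found before it.
my $matched = ($string =~ m/[XYZ]{3}([A-Za-z]{0,21}?)[ABC]{3}(?{ $count++ })(?!)/) ? 1 : 0;

print "overall match: $matched, candidates seen: $count\n";
```

      On this string it should report an overall match of 0 and 4 candidates: the gap after XYZ can be 0, 1, 2 or 3 letters long, because [ABC]{3} is a character class and also accepts BCA and CAB.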
      In your case, as you probably know, you would just have to substitute the (?{print "matched at [<$&>] $-[0]\n" }) with (?{print "$-[0], " . length($1). "\n" }) and everything works just fine. The solution is really elegant (I should remember to look into that book again).
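
      If you would rather collect the results than print them, the same substitution can push into an array. This is only a sketch: the helper name all_matches and the array @hits are made up for illustration, and @hits is a package variable for the same cautious reason (lexicals inside code blocks were unreliable on older Perls).

```perl
use strict;
use warnings;

our @hits;   # filled from inside the regex code block

# all_matches is a hypothetical helper: it returns one [offset, gap length]
# pair for every way the pattern can match inside $string.
sub all_matches {
    my ($string) = @_;
    local @hits = ();
    # $-[0] is the offset where the current attempt started, $1 the letters
    # between the two triples; (?!) then forces the engine to keep searching.
    $string =~ m/[XYZ]{3}([A-Za-z]{0,21}?)[ABC]{3}(?{ push @hits, [ $-[0], length($1) ] })(?!)/;
    return @hits;
}

print join(', ', @$_), "\n" for all_matches('FGTXYZGTFABCGHABCFGTXYZADXYZGTYABC');
```

      For the sample string this should print the same four lines as the looping version: 3, 3 then 3, 8 then 20, 8 then 25, 3.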
      Some concluding notes: the /x modifier allows whitespace and comments inside the pattern, which is pretty useless here, since we don't have any. The question mark in {3}? is useless as well, because a quantifier with a fixed count has no greedy/non-greedy behaviour.
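
      Both notes can be checked with a small sketch. The names $re_x and $re_plain are my own; the first shows what /x is actually good for (a spread-out, commented pattern), the second keeps the redundant {3}? from the original.

```perl
use strict;
use warnings;

my $string = 'FGTXYZGTFABCGHABCFGTXYZADXYZGTYABC';

# With /x, whitespace and comments inside the pattern are ignored.
my $re_x = qr/
    [XYZ]{3}               # three characters from the set X, Y, Z
    ( [A-Za-z]{0,21}? )    # up to 21 letters, as few as possible
    [ABC]{3}               # three characters from the set A, B, C
/x;

# Same pattern on one line, with the redundant lazy marker on {3}.
my $re_plain = qr/[XYZ]{3}([A-Za-z]{0,21}?)[ABC]{3}?/;

my ($gap_x)     = $string =~ $re_x;
my ($gap_plain) = $string =~ $re_plain;

print "$gap_x $gap_plain\n";
```

      Both captures should come out as GTF, the letters between the first XYZ and the first ABC.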
      I hope that the above explained your RegEx a bit.
        Great explanation ! Thanks a bunch.
        Vince