in reply to Re: Re: nested pattern matching
in thread nested pattern matching

This should do the trick:
#!perl
use strict;
use warnings;

my $string = qq[FGTXYZGTFABCGHABCFGTXYZADXYZGTYABC];
while ($string =~ /[XYZ]{3}([A-Za-z]{0,21}?)[ABC]{3}/g) {
    print $-[0] . ', ' . length($1) . $/;
}
__OUTPUT__
3, 3
20, 8

Re: Re: Re: Re: nested pattern matching
by vinforget (Beadle) on Aug 18, 2003 at 20:01 UTC
    it doesn't seem to work for
    FGTXYZABCABC
    it gives 3, 0
    when it should give
    3,0
    3,3
    thanks
      What exactly do you want? The above code prints the starting position and length of each match in a sequence of non-overlapping matches of the above pattern. It looks like you would rather have the minimal and maximal match length for each possible starting position of a match. Is that correct? If not, please clarify the rule the program should follow and I'll do my best to help.
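
To make the difference between non-overlapping matches and one match per starting position concrete, here is a minimal sketch. It assumes the same pattern as above, wrapped in a zero-width lookahead so that the regex engine consumes nothing and can report starts that fall inside an earlier match. The sub name shortest_match_per_start is hypothetical and not from the thread.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return [start, gap_length] for
# the shortest match at every starting position. The lookahead consumes no
# characters, so reported matches may overlap.
sub shortest_match_per_start {
    my ($string) = @_;
    my @found;
    while ($string =~ /(?=[XYZ]{3}([A-Za-z]{0,21}?)[ABC]{3})/g) {
        # $-[0] is the offset of the (zero-length) match; $1 is the gap.
        push @found, [ $-[0], length($1) ];
    }
    return @found;
}

print join(', ', @$_), "\n"
    for shortest_match_per_start('FGTXYZGTFABCGHABCFGTXYZADXYZGTYABC');
```

For the test string this reports the starts 3, 20 and 25, where 25 lies inside the match that begins at 20. It still yields only the shortest match per start, so FGTXYZABCABC gives just 3, 0.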
Re: Re: Re: Re: nested pattern matching
by CombatSquirrel (Hermit) on Aug 18, 2003 at 20:06 UTC
    Sorry, posted in the wrong position in the tree, see node 284703.
      Sorry if my example was not clear. Yes, I would like all possible overlapping matches of a regular expression in a string, reporting back the starting position of each. I understand how to get the length of the match from the expression. Thanks.
      Vince
        This is what I came up with. It loops, but should not be too slow. I hope that it's going to work for you.
        #!perl
        use strict;
        use warnings;

        my $string = qq[FGTXYZGTFABCGHABCFGTXYZADXYZGTYABC];
        my ($pos, $nummatch) = (0, 0);
        while ($string =~ /^(.{$pos,}?)[XYZ]{3}([A-Za-z]{0,21}?)[ABC]{3}/) {
            my $tpos = $pos;
            $pos = length($1) + 1;
            $nummatch = length($2);
            while ($string =~ /^(.{$tpos,}?)[XYZ]{3}([A-Za-z]{$nummatch,21}?)[ABC]{3}/) {
                print length($1) . ', ' . length($2) . $/;
                $nummatch = length($2) + 1;
            }
        }
        __OUTPUT__
        3, 3
        3, 8
        20, 8
        25, 3
        Explanation: $pos contains the minimal offset, $nummatch the minimum number of characters to match. The outer loop iterates through the starting positions, the inner one through the different matching lengths. As the matching is non-greedy, you will get the matches ordered by starting position first, second by length.
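
For comparison, the same enumeration (every starting position, then every gap length at that position) can be sketched with plain loops and substr instead of rebuilding the regex on each pass. This is an alternative illustration under the same pattern, not the code posted above; the sub name all_matches is hypothetical.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): list every [start, gap_length]
# pair for [XYZ]{3}, a gap of 0 to 21 letters, then [ABC]{3}, overlaps included.
sub all_matches {
    my ($string) = @_;
    my $len = length($string);
    my @found;
    for my $start (0 .. $len - 6) {
        # A match must begin with three characters from [XYZ].
        next unless substr($string, $start, 3) =~ /\A[XYZ]{3}\z/;
        for my $gap (0 .. 21) {
            my $tail = $start + 3 + $gap;
            last if $tail + 3 > $len;    # no room left for the closing triple
            # The gap may contain letters only; once it fails, longer gaps fail too.
            last unless substr($string, $start + 3, $gap) =~ /\A[A-Za-z]*\z/;
            push @found, [ $start, $gap ]
                if substr($string, $tail, 3) =~ /\A[ABC]{3}\z/;
        }
    }
    return @found;
}

print join(', ', @$_), "\n"
    for all_matches('FGTXYZGTFABCGHABCFGTXYZADXYZGTYABC');
```

On the test string this prints the same four pairs as the output above. Note that on FGTXYZABCABC it reports gaps 0, 1, 2 and 3 at start 3, because BCA and CAB also satisfy the character class [ABC]{3}.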
Re: Re: Re: Re: nested pattern matching
by davido (Cardinal) on Aug 19, 2003 at 06:49 UTC
    When I read the OP's example, and the followup solution, I did worry about one thing. Though the OP did show the character classes [XYZ] and [ABC] in his example regular expression, his example test string used exactly those same characters, in the same order. My concern is that it is possible that he intends to match exactly XYZ and ABC (and of course whatever falls between up to 21 characters).

    What I'm getting at is that:

    /[XYZ]{3}([A-Za-z]{0,21}?)[ABC]{3}/g

    will match both of the following strings:

    "ASDFXYZOtherCharsABCASDF"
    "ASDFXXXOtherCharsAAAASDF"

    If he intended to specifically match XYZ, and not XXX, then he needed the following regexp instead:

    /XYZ([A-Za-z]{0,21}?)ABC/g

    Again, the OP's original regexp confirms that he wanted to match three characters from a class, zero to 21 characters from a class, and then three characters from another class.

    But his choice of example strings showed him looking to match three specific characters, followed by zero to 21 characters from a class, followed by three more specific characters.

    I just wanted to point out that there is a difference, just in case it turns out to be significant in the implementation planned by the original poster.
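
A short sketch of that difference, using the two example strings from this post. The variable names $class_re and $literal_re are just illustrative labels.

```perl
#!perl
use strict;
use warnings;

# Character-class version: any three of X, Y, Z, then any three of A, B, C.
my $class_re   = qr/[XYZ]{3}([A-Za-z]{0,21}?)[ABC]{3}/;
# Literal version: exactly "XYZ" and exactly "ABC".
my $literal_re = qr/XYZ([A-Za-z]{0,21}?)ABC/;

for my $s ('ASDFXYZOtherCharsABCASDF', 'ASDFXXXOtherCharsAAAASDF') {
    printf "%s  class: %s  literal: %s\n", $s,
        ($s =~ $class_re   ? 'match' : 'no match'),
        ($s =~ $literal_re ? 'match' : 'no match');
}
```

The first string matches both patterns; the second matches only the character-class version.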

    Dave

    "If I had my life to do over again, I'd be a plumber." -- Albert Einstein