in reply to nested pattern matching

perl -e "'FGTXYZGTFABCGHABC' =~ /[XYZ]{3}([A-Za-z]{0,21})[ABC]{3}/; print $-[0] . ', ' . length($1);"
__OUTPUT__
3, 8
Note that the output for your first example is 3, 8 instead of 3, 3, because the quantifier {0,21} is greedy and captures as much as it can. The appropriate regex for that would have been /[XYZ]{3}([A-Za-z]{0,21}?)[ABC]{3}/ (non-greedy quantifier).
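To see the two quantifiers side by side, here is a minimal sketch. The helper name match_info is made up for illustration; it just returns the match offset from @- and the length of the first capture group.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (match offset, length of capture group 1),
# or an empty list when the pattern does not match.
sub match_info {
    my ($string, $regex) = @_;
    return unless $string =~ $regex;
    return ($-[0], length $1);
}

my $string = 'FGTXYZGTFABCGHABC';

# Greedy: the capture group grabs as many characters as it can.
my @greedy = match_info($string, qr/[XYZ]{3}([A-Za-z]{0,21})[ABC]{3}/);

# Non-greedy: the capture group stops at the first place the rest can match.
my @lazy = match_info($string, qr/[XYZ]{3}([A-Za-z]{0,21}?)[ABC]{3}/);

print 'greedy:     ', join(', ', @greedy), "\n";   # 3, 8
print 'non-greedy: ', join(', ', @lazy),   "\n";   # 3, 3
```

Both patterns start matching at offset 3; only the length of the captured middle part differs.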
Hope this helped.

Re: Re: nested pattern matching
by vinforget (Beadle) on Aug 18, 2003 at 19:36 UTC
    works great. Thanks. How would I put this into a while loop so I can traverse the whole string? Vince
      This should do the trick:
      #!perl
      use strict;
      use warnings;

      my $string = qq[FGTXYZGTFABCGHABCFGTXYZADXYZGTYABC];

      while ($string =~ /[XYZ]{3}([A-Za-z]{0,21}?)[ABC]{3}/g) {
          print $-[0] . ', ' . length($1) . $/;
      }
      __OUTPUT__
      3, 3
      20, 8
        it doesn't seem to work for
        FGTXYZABCABC
        it gives 3, 0
        when it should give
        3,0
        3,3
        thanks
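A //g loop resumes after the end of the previous match, so it never reports two matches that share a start offset. One possible way to list them all is to try every start offset and every gap length explicitly. This is only a sketch: the helper name all_matches is made up, and it assumes the delimiters are the literal strings XYZ and ABC, which is what the expected output above implies.

```perl
use strict;
use warnings;

# Hypothetical helper: tries every start offset and every gap length 0..21,
# so matches that share a start offset are all reported.
# Assumption: the delimiters are the literal strings XYZ and ABC.
sub all_matches {
    my ($string) = @_;
    my @found;
    for my $start (0 .. length($string) - 6) {
        for my $len (0 .. 21) {
            pos($string) = $start;    # make \G anchor at this offset
            push @found, [$start, $len]
                if $string =~ /\GXYZ[A-Za-z]{$len}ABC/g;
        }
    }
    return @found;
}

print "$_->[0], $_->[1]\n" for all_matches('FGTXYZABCABC');
# prints:
# 3, 0
# 3, 3
```

With the character classes [XYZ]{3} and [ABC]{3} in place of the literals, gap lengths 1 and 2 would also match on this string, because BCA and CAB are both made up of characters from [ABC].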
        Sorry, posted on wrong position in tree, see node 284703.
        When I read the OP's example, and the followup solution, I did worry about one thing. Though the OP did show the character classes [XYZ] and [ABC] in his example regular expression, his example test string used exactly those same characters, in the same order. My concern is that it is possible that he intends to match exactly XYZ and ABC (and of course whatever falls between up to 21 characters).

        What I'm getting at is that:

        /[XYZ]{3}([A-Za-z]{0,21}?)[ABC]{3}/g

        will match both of the following strings:

        "ASDFXYZOtherCharsABCASDF" "ASDFXXXOtherCharsAAAASDF"

        If he intended to specifically match XYZ, and not XXX, then he needed the following regexp instead:

        /XYZ([A-Za-z]{0,21}?)ABC/g

        Again, the OP's original regexp confirms that he wanted to match three characters from a class, zero to 21 characters from a class, and then three characters from another class.

        But his choice of example strings showed him looking to match three specific characters, followed by zero to 21 characters from a class, followed by three more specific characters.

        I just wanted to point out that there is a difference, just in case it turns out to be significant in the implementation planned by the original poster.
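The difference is easy to check with the two test strings from above. This is just a quick sketch; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my @strings = ('ASDFXYZOtherCharsABCASDF', 'ASDFXXXOtherCharsAAAASDF');

# Character classes: any three of X, Y, Z, then any three of A, B, C.
my $class_re = qr/[XYZ]{3}([A-Za-z]{0,21}?)[ABC]{3}/;

# Literals: exactly XYZ, then exactly ABC.
my $literal_re = qr/XYZ([A-Za-z]{0,21}?)ABC/;

for my $s (@strings) {
    my $class   = $s =~ $class_re   ? 'match' : 'no match';
    my $literal = $s =~ $literal_re ? 'match' : 'no match';
    print "$s  class: $class  literal: $literal\n";
}
```

The class version matches both strings, while the literal version matches only the first one.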

        Dave

        "If I had my life to do over again, I'd be a plumber." -- Albert Einstein

      got it. Just added the //g at the end. Vince