JykkeDaMan has asked for the wisdom of the Perl Monks concerning the following question:

I need to find the last occurrence of 'start: some text here stop:' in a string of unknown length.

The string is like this:

'something start: some text stop: other text here start: again something stop: may be some text here start: and finally stop: and maybe something here'

I would like to find the last start:<...>stop: pair and capture the text between them ('and finally' in this example). The strings 'start:' and 'stop:' are unambiguous: they always occur as a pair, and the 'some text' between them never contains either string.

Any hints?

Replies are listed 'Best First'.
Re: Regex latest match?
by davido (Cardinal) on Dec 30, 2005 at 09:23 UTC

    For simple cases, you could use this regexp:

    while( $string =~ m/\bstart:(.+?)\bstop:/g ) { print "$1\n"; }

    ...and to get the last match using that regexp, apply it like this:

    my $lastmatch = ( $string =~ m/\bstart:(.+?)\bstop:/g )[ -1 ];

    <update>
    ...or for a pure regexp approach:

    if( $string =~ m/\bstart:(?!.+?(?:\bstart:))(.+?)\bstop:/ ) { print "$1\n"; }

    ...which only further punctuates the point I make in the next paragraph.
    </update>

    But that's not going to be very robust. What if your needs mature to the point that you can no longer guarantee that 'stop:' doesn't occur embedded within the portion of the string you're capturing? For example, if 'stop:' is wrapped in quotes, should it be treated as a delimiter, or as text? For a solution that will stand up to these sorts of complex strings, forget about hand crafting a masterful regular expression. The hard work has already been done, refined, debugged, tested, and proven. To take advantage of the work that's already been done, have a look at Text::Balanced.
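    To make the concern concrete, here is a minimal sketch of how the simple regexp misbehaves once a quoted 'stop:' appears inside the text. The input string is invented for illustration and is not from the original question.

```perl
use strict;
use warnings;

# Hypothetical input: the last section contains a quoted 'stop:' that is
# meant as text, not as a delimiter.
my $string = q{start: first stop: noise start: say "stop:" and go on stop: tail};

# The list-context idiom from above returns every capture; [-1] is the last.
my $lastmatch = ( $string =~ m/\bstart:(.+?)\bstop:/g )[-1];

# The capture ends at the quoted 'stop:', so the text after it is lost.
print "[$lastmatch]\n";
```

    The capture stops at the quoted marker, which is the kind of case where a parsing module is likely a better fit than a single regexp.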


    Dave

Re: Regex latest match?
by TedPride (Priest) on Dec 30, 2005 at 09:20 UTC
    This seems to work:
    use strict;
    use warnings;

    my $start = 'start:';
    my $end   = 'stop:';

    $_ = join '', <DATA>;
    @_ = m/\Q$start\E(.*?)\Q$end\E/gs;
    print $_[-1];

    __DATA__
    something start: some text stop: other text here start: again something stop: may be some text here start: and finally stop: and maybe something here
    This might not be terribly efficient, however, if there happen to be a lot of matches. It might be better to start with rindex, so that the match begins at the last occurrence of the start marker:

    pos() = rindex($_, $start);
    print $1 if m/\Q$start\E(.*?)\Q$end\E/gs;
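    The same idea can be taken one step further and done without a regexp at all. The following is only a sketch: the helper name last_between is made up for this example, and it assumes that plain substring search is enough, with no word boundaries or quoting rules.

```perl
use strict;
use warnings;

# Return the text between the last $start and the first $end that follows it.
# Returns undef (in scalar context) when either delimiter is missing.
sub last_between {
    my ($text, $start, $end) = @_;

    my $from = rindex($text, $start);      # position of the last start marker
    return if $from < 0;
    $from += length $start;                # skip past the marker itself

    my $to = index($text, $end, $from);    # first end marker after it
    return if $to < 0;

    return substr($text, $from, $to - $from);
}

my $text = 'something start: some text stop: other text here '
         . 'start: again something stop: may be some text here '
         . 'start: and finally stop: and maybe something here';

print last_between($text, 'start:', 'stop:'), "\n";   # prints " and finally "
```

    Note that the surrounding spaces are part of the result, so trim them if they are not wanted.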
Re: Regex latest match?
by ysth (Canon) on Dec 30, 2005 at 10:08 UTC
    my $string = 'something start: some text stop: other text here start: again something stop: may be some text here start: and finally stop: and maybe something here';
    my ($lastgroup) = $string =~ /.*start: (.*?) stop:/s;
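    For anyone wondering why the leading .* matters, here is a small comparison on a shortened, made-up string. The greedy .* first consumes the whole string and then backtracks only as far as needed for 'start: ' to match, which puts the match on the last pair.

```perl
use strict;
use warnings;

# Shortened sample string, invented for this comparison.
my $string = 'a start: one stop: b start: two stop: c start: three stop: d';

# Without a prefix, the match begins at the first 'start:' it finds.
my ($first) = $string =~ /start: (.*?) stop:/s;

# A greedy .* swallows the whole string, then backtracks just far enough
# for 'start: ' to match, which lands on the last one.
my ($last) = $string =~ /.*start: (.*?) stop:/s;

print "$first / $last\n";   # prints "one / three"
```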
Re: Regex latest match?
by vennirajan (Friar) on Dec 30, 2005 at 09:29 UTC
    Hi JykkeDaMan,

         You can use the following piece of code.

    #!/usr/bin/perl -w
    use strict;

    my $filename = 'regex_data.txt';
    open INPUTFILE , $filename or die "Cannot Open : $filename : $!";

    while ( <INPUTFILE> ) {
        chomp $_;
        print "$2\n" if ( $_ =~ /^.*(start\:)(.*)(stop\:)/gi )
    }
    close INPUTFILE;


          Assume that the file "regex_data.txt" contains your data. This code will print only the data between the last "start:" - "stop:" pair.

    Hope this will help you !



    Regards,
    S.Venni Rajan.
    "A Flair For Excellence."
                    -- BK Systems.