Allasso has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

A question I have had for months, and for which I have never come across a solution except the kludgy work-around I always use:

    some text START text I don't want START only text I want END

/START.*?END/ matches:

    START text I don't want START only text I want END

I want to match:

    START only text I want END

Is there a simple way to do this without injecting newlines before START etc.?

Replies are listed 'Best First'.
Re: really non greedy match
by ikegami (Patriarch) on Apr 24, 2010 at 21:18 UTC
    /START(?:(?!START|END).)*END/
    /START(?:(?!START).)*?END/
    I wonder if the following is faster
    / START (?> [^SE]* (?:S+(?!TART))? (?:E+(?!ND))? )* END /x
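On the question of why the `.` is needed: a lookahead such as `(?!START|END)` is zero-width, so it only checks what comes next and consumes nothing. The `.` after it is what actually advances one character per repetition. The sketch below runs the first two patterns against the sample text from the question. The dot-less pattern is only a hypothetical guess at the kind of attempt that fails, since the original attempt was never posted.

```perl
use strict;
use warnings;

my $text = "some text START text I don't want START only text I want END";

# At each position: assert that neither START nor END begins here,
# then consume exactly one character with the dot.
my ($tempered) = $text =~ /(START(?:(?!START|END).)*END)/;

# Lazy variant: only START is excluded; *? stops at the first END it reaches.
my ($lazy) = $text =~ /(START(?:(?!START).)*?END)/;

# Hypothetical dot-less attempt: the group can only match the empty string,
# so END would have to follow START immediately, and the match fails.
my $no_dot;
{
    no warnings 'regexp';   # silence the warning about quantifying a zero-length group
    $no_dot = $text =~ /START(?:(?!START|END))*END/ ? 1 : 0;
}

print "tempered: $tempered\n";
print "lazy:     $lazy\n";
print "no dot matched: $no_dot\n";
```

Both working patterns fail at the first START (the lookahead blocks the group from crossing the second START) and succeed when the engine retries from the second one.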
      Ah, thanks. I was on the right track, but I was missing the "any character" right after the innermost group and wasn't getting a match at all. I am wondering why that is necessary?

        [ Please don't place your entire post in code tags. Just put a <p> at the start of every paragraph. ]

        You're asking me to explain why your pattern didn't work without showing me your pattern.

Re: really non greedy match
by CountZero (Bishop) on Apr 25, 2010 at 07:17 UTC
    Please, only put <code> ... </code> tags around code and put your plain text in between <p> ... </p> tags.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      Around code or other computer text. It's also fine for data, program output, etc
Re: really non greedy match
by jwkrahn (Abbot) on Apr 25, 2010 at 02:35 UTC
    $ perl -le'
    my $text = "START text I don\047t want START only text I want END";
    print scalar reverse reverse( $text ) =~ /(DNE.*?TRATS)/;
    '
    START only text I want END
      That changes the problem case from START...START...END to START...END...END
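To illustrate that point, here is a minimal sketch that wraps the reverse trick in a helper (the name `match_via_reverse` is made up for this example) and feeds it both shapes of input. Reversing fixes the repeated START, but a repeated END now shows the mirror-image problem.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the string, do a lazy match on the
# reversed delimiters, then reverse the hit back.
sub match_via_reverse {
    my ($text) = @_;
    my $rev = reverse $text;                  # scalar context: reverses the string
    my ($hit) = $rev =~ /(DNE.*?TRATS)/;
    return defined $hit ? scalar reverse($hit) : undef;
}

my $two_starts = match_via_reverse("START a START b END");
my $two_ends   = match_via_reverse("START a END b END");

print "two STARTs: $two_starts\n";   # the innermost pair
print "two ENDs:   $two_ends\n";     # runs past the first END
```

With two ENDs the reversed lazy match begins at the last END and extends back to START, so it returns the whole of `START a END b END` rather than `START a END`.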
Re: really non greedy match
by Ratazong (Monsignor) on Apr 24, 2010 at 21:46 UTC

    Have you tried the following?

    /.*START.*?END/

    HTH, Rata
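A small sketch of how this pattern behaves, assuming the sample text from the question. As written the overall match also includes everything the leading `.*` swallowed, so a capture group is needed to isolate the wanted part. The second input is an invented example showing that only the last START...END pair is found.

```perl
use strict;
use warnings;

my $text = "some text START text I don't want START only text I want END";

# Capturing the whole pattern shows that the greedy prefix is part of the match.
my ($whole) = $text =~ /(.*START.*?END)/;

# Capturing only the tail isolates the last START ... END.
my ($last) = $text =~ /.*(START.*?END)/;

# Invented input with two complete pairs: the greedy .* skips the first pair.
my $two_pairs = "START a END START b END";
my ($only_last) = $two_pairs =~ /.*(START.*?END)/;

print "whole:     $whole\n";
print "last:      $last\n";
print "only_last: $only_last\n";
```

Note that `.` does not match a newline unless the `/s` modifier is added, so multi-line input would need `/s` for the leading `.*` to reach the last START.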

      thanks, but that wouldn't work for me. I want to use the expression as a delimiter in split.
        Bad use of split. The first arg of split is a separator. You want //g.
        my @parts = $text =~ /START((?:(?!START|END).)*)END/sg;
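The difference between the two approaches can be seen in a short sketch. The input string is invented for the example. With `//g` in list context the captures are the text inside each innermost START...END pair, while `split` with the same pattern returns what lies between those pairs.

```perl
use strict;
use warnings;

# Invented sample with one stray START and two complete pairs.
my $text = "x START a START b END y START c END z";

# //g in list context returns the captured text of every match.
my @parts = $text =~ /START((?:(?!START|END).)*)END/sg;

# split treats the pattern as a separator and returns the text around it.
my @between = split /START(?:(?!START|END).)*END/s, $text;

print "parts:   [", join("][", @parts),   "]\n";
print "between: [", join("][", @between), "]\n";
```

Here `@parts` holds `" b "` and `" c "`, while `@between` holds `"x START a "`, `" y "` and `" z"`.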
Re: really non greedy match
by Marshall (Canon) on Apr 27, 2010 at 22:46 UTC
    A "greedy" match will indeed get greedy, but it will always allow the last part of the pattern to match if that is possible. Adding a .* at the beginning of the pattern allows that .* to "gobble up" the first START while allowing for the last START to match up with some characters followed by END.
    #!/usr/bin/perl -w
    use strict;

    my $text ="some text START text I don't want START only text I want END";
    my $wanted = ($text =~ /.*START (.*) END$/)[0];
    my $wanted2 = ($text =~ /.*(START .* END)$/)[0];
    print "wanted=\"$wanted\"\n";
    print "wanted2=\"$wanted2\"\n";
    __END__
    prints:
    wanted="only text I want"
    wanted2="START only text I want END"
    Update: this: my $wanted  = ($text =~ /.*START (.*) END$/)[0]; may look a bit strange, but this is how to assign $1 to $wanted without having to use $1 as an intermediate variable. The text match is in a list context and I just slice to get the contents of the first matching paren. $2 can be done in the same way...
    my ($x,$y) = ($text =~ /.*(START (.*) END)$/)[0,1];
    print "x=$x y=$y\n";
    #prints: x=START only text I want END y=only text I want
    I like this syntax as it "gets to the point" without $1,$2,$3, etc.
      my $wanted = ($text =~ /.*START (.*) END$/)[0];
      my $wanted2 = ($text =~ /.*(START .* END)$/)[0];
      my ($x,$y) = ($text =~ /.*(START (.*) END)$/)[0,1];
      can be written as
      my ($wanted) = $text =~ /.*START (.*) END$/;
      my ($wanted2) = $text =~ /.*(START .* END)$/;
      my ($x,$y) = $text =~ /.*(START (.*) END)$/;
      thank you Marshall. I didn't see your post until just now.