dash2 has asked for the wisdom of the Perl Monks concerning the following question:

A theoretical question which piqued my interest.

my $str = "tar a rat at ararat";

A match:

$str =~ /at(.*)at/; # captures " at arar"

The same, but non-greedy:

$str =~ /at(.*?)at/; # captures just" "

You might call this "right-non-greedy". Where it matches, it matches, and grabs as little as possible to the right of that.

So what would be the best (shortest, most comprehensible and complete) way to write a "left-non-greedy" match? I.e.

$str =~ /at($magic_match)at/; # matches " arar"

where $magic_match matches at the last place possible, so that it has to grab as little as possible?

Head spinning now... how about a regex that is both right-and-left-non-greedy. I.e. it tries to grab as little as possible, both by matching minimally, and by trying to match as late as possible - and compares all the possibilities to find the shortest?

(I'm looking for generic solutions here - a regex, or at least a way to write regexes, that would work for left- or both- non-greedy matching on any string.)

andramoiennepemousapolutropon

Replies are listed 'Best First'.
•Re: left-non-greedy regex?
by merlyn (Sage) on Mar 25, 2003 at 17:39 UTC
    You're really not going to find that in a single regex, unless you also include code-blocks as part of your solution, in which case you're back into the realm of full Turing-completeness where it can always be done.

    And I don't call it "right non greedy". I think that's where you start thinking that "left non greedy" can exist. I call it "minimal repetitions", which means it doesn't need to really maintain state -- just prefer "match again" over "stop here", or vice versa.

    Your hypothetical regex would need a lot of state as it kept searching. It'd also have to search the entire string every time, in a myriad of ways. Can you say slooooooow?

    Here's one way you can do it with some extra code:

    my $string = "...."; my $regex = qr/...(...).../; # must have $1 be the thing we're optimiz +ing to reduce my @solutions; while ($string =~ g/(?=$regex)/g) { push @solutions, [[@-], [@+]]; # push starts and ends } # now walk through the solutions, and pick the one with the minimal sp +an on $1 # [ code unfinished ]

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      How about doing the same thing as the existing minimal match - just starting from the other side. Of course you can do that by reversing the search string and than doing the existing minimal match.

      Update: This was the conclusion of the MarkM post later here - so I was not the first with this idea.

        That'll just find the first match, with a minimal span within that match. That's still not going to find the shortest of all possible matches.

        In other words, it won't find the middle "b" in "abbababba", if you're looking for the "shortest run of b's" as in the other node.

        -- Randal L. Schwartz, Perl hacker
        Be sure to read my standard disclaimer if this is a reply.

Re: left-non-greedy regex?
by MarkM (Curate) on Mar 25, 2003 at 23:00 UTC

    You seem to misunderstand how regular expressions work. There is no such thing as right-non-greedy or left-non-greedy. There is only "does this expression match?" In the case that it does not, the regular expression engine advances one character and tries again (recursive, and quite a bit more complicated than that).

    What you describe as "right-non-greedy" is really just "first match, left-to-right, non-greedy." The exact request you are making seems to be "first match, right-to-left, non-greedy", which is really:

    my $str = "tar a rat at ararat"; reverse($str) =~ /ta(.*?)ta/; print scalar(reverse($1)), "\n"; # prints: " arar\n"

    Note that not only must the string be matched backwards, but the regexp itself must be backwards. What you are really asking for is a regular expression engine that begins matching from the last position in the string, and matches the regular expression in reverse order. This sort of thing has been talked about for Perl 6, and I think there is at least one CPAN module to automate reversing a regular expression.

Re: left-non-greedy regex?
by diotalevi (Canon) on Mar 25, 2003 at 17:42 UTC

    [Added See also sexeger]

    Use a greedy match to seek as far right as possible then backtracking will force the engine to go left. This is both suboptimal and if there were multiple match points would select only the right most one. Or is this good enough for the moment?

    /.*at(.*?)at/
Re: left-non-greedy regex?
by dash2 (Hermit) on Mar 25, 2003 at 17:33 UTC
    Some test strings...

    $a[0] = "abbabbbababba"; $a[1] = "ababbabbba"; $a[2] = "abbbabbaba";

    The goal is to match the single b with the same regex in each case.

    andramoiennepemousapolutropon

      This is fairly easy using negative look-ahead and negative look-behind:
      use English; @a = qw/abbabbbababba ababbabbba abbbabbaba/; foreach (@a) { print "$PREMATCH $MATCH $POSTMATCH\n" if /(?<!b)b(?!b)/ } abbabbba b abba a b abbabbba abbbabba b a

      i.e: Only match a b if there is no b before or after it. Although this doesn't really have to do with what you were talking about in the root node, it is one of the easiest ways of doing it.