in reply to Regular Expn Problem

The problem is that (?:Other\s+(\d+))? is entirely optional and is preceded by .+?, so the regex doesn't need to match the optional last element, as the preceding element happily matches to the end of the string.

One way to ensure that the last element is matched if it exists is to force the preceding element .+? to stop early when it does.

$text =~ m[
    Total\scount\s+(\d+) .+?
    C\s+(\d+) .+?
    G\s+(\d+) .+?(?=Other|$)
    (?:Other\s+(\d+))?
]sx;

The alternation in the lookahead ensures that, if the "Other" line exists, the .+? stops just before it, so the final element of the regex is forced to match it.

You'll still need to check the last capture ($4) for undef to decide whether the "Other" line was present or not.
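As a rough sketch of how that check might look in practice, the snippet below runs the regex above against two made-up inputs, one with an "Other" line and one without. The sample data and the parse_counts helper are assumptions for illustration only; the data from the original question is not reproduced here.

```perl
use strict;
use warnings;

# Hypothetical sample input (not the data from the original question).
my $with_other    = "Total count 10\nA 1\nC 2\nT 3\nG 4\nN 0\nOther 5\n";
my $without_other = "Total count 10\nA 1\nC 2\nT 3\nG 4\nN 0\n";

# Hypothetical helper: returns the four captures, or an empty list if
# the text does not match at all.
sub parse_counts {
    my ($text) = @_;
    return unless $text =~ m[
        Total\scount\s+(\d+) .+?
        C\s+(\d+) .+?
        G\s+(\d+) .+?(?=Other|$)
        (?:Other\s+(\d+))?
    ]sx;
    return ($1, $2, $3, $4);    # $4 is undef when there is no "Other" line
}

my @with    = parse_counts($with_other);
my @without = parse_counts($without_other);

for my $result (\@with, \@without) {
    my ($total, $c, $g, $other) = @$result;
    print "total=$total C=$c G=$g Other=",
        (defined $other ? $other : '(not present)'), "\n";
}
```

The defined test on the last capture is what distinguishes "no Other line" from "Other line with a count of 0".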


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail

Re: Re: Regular Expn Problem
by TilRMan (Friar) on May 10, 2004 at 08:25 UTC
    ... the preceding element [.+?] happily matches to the end of the string.

    No, it matches one character. Then (?:Other\s+(\d+))? matches the empty string, and then we reach the end of the pattern.
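    One way to check this is to capture the lazy part and look at how much it consumed. The sketch below uses a made-up tail of the input and a cut-down pattern without the lookahead; both are assumptions for illustration, not the original poster's data or regex.

```perl
use strict;
use warnings;

# Hypothetical input; only the tail of the data matters here.
my $text = "G 4\nN 0\nOther 5\n";

# The lazy part is captured as the second group so its length can be inspected.
my ($g, $lazy, $other) =
    $text =~ m[ G\s+(\d+) (.+?) (?:Other\s+(\d+))? ]sx;

my $lazy_len    = length $lazy;            # characters consumed by .+?
my $other_found = defined $other ? 1 : 0;  # did the optional group capture?

print "characters matched by .+?: $lazy_len\n";
print "Other captured: ", ($other_found ? "yes" : "no"), "\n";
```

    Here the lazy part takes a single character (the newline after "G 4"), the optional group fails to find "Other" at that position and settles for matching nothing, and the match ends there.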

Re: Re: Regular Expn Problem
by venkatr_n (Sexton) on May 10, 2004 at 06:05 UTC
    I was under the impression that the ? at the end of .+? makes it be "not greedy" and allows the conditional at the end to be matched, but now I'm confused. I can use the exact expression without the conditional at the end, and with .+ instead of .+? and the regexp works correctly. So what is the ? in .+? doing?

      You're right, the ? does make .+? non-greedy, but the (?:...)? says that you don't mind if the contained expression is missing. So, as the .+? can match to the end (of any string), no attempt is made to match the optional expression that follows it.

      Hmm. Maybe this makes more sense? The earlier expression does match to the end of string, and the later (rightmost) expression is optional, so no attempt is made to match the latter.
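      To see what the ? in .+? changes on its own, it may help to compare the two forms on small strings. This is only a sketch with made-up test strings; the variable names are illustrative.

```perl
use strict;
use warnings;

# Lazy vs. greedy when something mandatory follows.
my $s = "a1X2X3";
my ($lazy)   = $s =~ /a(.+?)X/;   # takes as little as possible before an X
my ($greedy) = $s =~ /a(.+)X/;    # takes as much as possible before an X

print "lazy:   $lazy\n";
print "greedy: $greedy\n";

# Greedy .+ followed only by an optional group (hypothetical input tail).
my $text = "G 4\nN 0\nOther 5\n";
my ($g, $rest, $other) =
    $text =~ m[ G\s+(\d+) (.+) (?:Other\s+(\d+))? ]sx;

print "greedy part swallowed 'Other': ",
    (index($rest, "Other") >= 0 ? "yes" : "no"), "\n";
print "Other captured: ", (defined $other ? "yes" : "no"), "\n";
```

      With a mandatory element after it, the lazy form stops at the first X and the greedy form at the last. With only an optional group after it, the greedy .+ consumes the "Other" line itself, so the group has nothing left to capture.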

