Solo has asked for the wisdom of the Perl Monks concerning the following question:

I'm parsing multi-line records that look like this:

recordID: blah
fieldname: value
foo :
bar : baz
...

with a regular expression like this:

while ( /\G\s*([^:]+?)\s*:\s*(.+?)?\s*$/mg ) {...}

My problem is that sometimes value is empty, and then the regex matches the whole next line instead of skipping the optional (.+?)?. I haven't used /s, so I thought .+ wouldn't match a newline, but it seems to. What am I missing?

TIA!

--Solo

--
You said you wanted to be around when I made a mistake; well, this could be it, sweetheart.

Re: Regex problem with .+ matching newlines without /s
by ccn (Vicar) on Aug 31, 2004 at 17:40 UTC

    Because \s matches newlines
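
A minimal sketch of the effect described above, using a made-up three-line record (the sample data and the variable names are assumptions, not the original poster's input). The \s* after the colon consumes the newline that follows an empty value, so (.+?) begins on the next line; the dot itself never matches a newline.

```perl
use strict;
use warnings;

# Hypothetical sample record; the "foo" field has an empty value.
my $record = "recordID: blah\nfoo :\nbar : baz\n";

my @pairs;
# The regex from the question, unchanged.
while ( $record =~ /\G\s*([^:]+?)\s*:\s*(.+?)?\s*$/mg ) {
    push @pairs, [ $1, $2 ];
}

# The \s* after "foo :" eats the newline, so the value captured for
# "foo" is the whole following line, "bar : baz".
printf "%s => '%s'\n", @$_ for @pairs;
```

Only two pairs come back here, because the line "bar : baz" is consumed as the value of "foo".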

Re: Regex problem with .+ matching newlines without /s
by ysth (Canon) on Aug 31, 2004 at 18:15 UTC
    What ccn said. If the things to the left and right of the : need to be on the same line, use [^\S\n] instead of \s. Also, I think (.+?)? is a little on the obfuscated side. I would write everything from the : onward as :[^\S\n]*(.*?)\s*$ (though I think that leaves $2 blank instead of undef). The final \s can be left as is unless you want to prevent the regex from swallowing blank lines.
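
A sketch of that suggestion applied to the regex from the question, run against a made-up record (the sample data and variable names are assumptions). The whitespace around the colon is restricted to [^\S\n], and the value is captured with (.*?) so an empty value is allowed without the extra optional group.

```perl
use strict;
use warnings;

# Hypothetical sample record; the "foo" field has an empty value.
my $record = "recordID: blah\nfoo :\nbar : baz\n";

my @pairs;
# [^\S\n] means "whitespace, but not a newline", so the match can no
# longer slide past the end of the line on either side of the colon.
while ( $record =~ /\G\s*([^:]+?)[^\S\n]*:[^\S\n]*(.*?)\s*$/mg ) {
    push @pairs, [ $1, $2 ];
}

# "foo" now gets an empty (but defined) value, and "bar" is a key again.
printf "%s => '%s'\n", @$_ for @pairs;
```

With this sample, three pairs are returned, and the value for "foo" is the empty string rather than undef.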
      (.+?)? resulted from my (possibly obfuscated) thought process.

      [^\S\n] did the trick, thanks much. I rarely deal with multi-line matches and didn't know that about \s. Seems to me there ought to be an easier way to express non-newline whitespace than [^\S\n].

      --Solo

      --
      You said you wanted to be around when I made a mistake; well, this could be it, sweetheart.
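
On the wish for a shorter spelling: Perl 5.10 and later provide \h for horizontal whitespace, which can stand in for [^\S\n] in a case like this. The sketch below assumes such a Perl and a made-up record. The two classes are not identical (for example, [^\S\n] also matches a carriage return, while \h does not), so this is an alternative to consider, not a drop-in equivalent.

```perl
use strict;
use warnings;
use 5.010;    # \h is assumed available (Perl 5.10 or later)

# Hypothetical sample record; the "foo" field has an empty value.
my $record = "recordID: blah\nfoo :\nbar : baz\n";

my %value_for;
# \h* only skips spaces and tabs (and other horizontal whitespace),
# never a newline.
while ( $record =~ /\G\s*([^:]+?)\h*:\h*(.*?)\s*$/mg ) {
    $value_for{$1} = $2;
}

printf "%s => '%s'\n", $_, $value_for{$_} for sort keys %value_for;
```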
Re: Regex problem with .+ matching newlines without /s
by sleepingsquirrel (Chaplain) on Aug 31, 2004 at 19:45 UTC
    Just for fun...
    $_ = <<END;
    recordID: blah
    fieldname: value
    foo :
    bar : baz
    END
    for (split /\n\s*/) {
        ($left, $right) = split /\s*:\s*/;
        # do stuff...
        printf "%10s:\t\'%s\'\n", $left, $right;
    }


    -- All code is 100% tested and functional unless otherwise noted.

      Don't forget to add a limit to split in case some line contains more than one colon.

      Makeshifts last the longest.
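
A small sketch of the limit in action, on a made-up line whose value contains colons (the sample line and variable names are assumptions). Passing 2 as the third argument tells split to stop after the first separator, so the rest of the line stays in one piece.

```perl
use strict;
use warnings;

# Hypothetical line whose value itself contains colons.
my $line = "timestamp : 17:40:00";

# With a limit of 2, only the first colon acts as a separator.
my ( $left, $right ) = split /\s*:\s*/, $line, 2;

# Without a limit, the value is cut at its own first colon.
my ( $key, $truncated ) = split /\s*:\s*/, $line;

print "with limit:    $left => '$right'\n";        # value is '17:40:00'
print "without limit: $key => '$truncated'\n";     # value is '17'
```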