
See my note above regarding Roy Johnson's suggestion. The key point was that when a "label" has a value that doesn't fit on one line, the value simply continues on the subsequent line(s), each starting with an indented colon:
    label one: somevalue
    label two: a very long value
             : that does not fit on one line so it continues
             : on another line
    label thr: someother value
Splitting the scalar on /\n\b/ does the trick. It "slurps" the subsequent lines that don't start with a word character into the previous label's value. From then on it is just a matter of removing the newlines and the redundant colons.
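For anyone following along, here is a minimal sketch of that idea. The sample data and the %value_for hash are made up for illustration, and the code assumes Unix-style "\n" line endings; treat it as a sketch, not the exact code I am running.

```perl
use strict;
use warnings;

# Hypothetical sample input in the layout described above.
my $text = join "\n",
    'label one: somevalue',
    'label two: a very long value',
    '         : that does not fit on one line so it continues',
    '         : on another line',
    'label thr: someother value';

my %value_for;
for my $record ( split /\n\b/, $text ) {
    # Fold each continuation line (newline, indent, colon) into one space.
    $record =~ s/\n\s*:\s*/ /g;

    # Drop any trailing whitespace left on the record.
    $record =~ s/\s+\z//;

    # Separate the label from its value at the first colon only.
    my ( $label, $value ) = split /\s*:\s*/, $record, 2;
    $value_for{$label} = $value;
}

print "$_ => $value_for{$_}\n" for sort keys %value_for;
```

The split only happens where a newline is immediately followed by a word character, so the indented colon lines stay attached to the record above them.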

That said, I have much to learn from your regexp:

    $line =~ /^\s*(\w+)?\s*:\s*(.*)$/;
    next unless $1 || $2;
Very neat use of 'next unless'.
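To check my understanding, here is a rough sketch of how that regexp might sit inside a line-by-line loop. The loop, the sample lines, and the names %value_for and $current are my own assumptions and not taken from your post. Note that (\w+) only captures a single-word label, so the sample uses one-word labels. I also test the match itself before looking at $1 and $2, because after a failed match they still hold the captures from the last successful one.

```perl
use strict;
use warnings;

# Hypothetical input with single-word labels and one blank line.
my @lines = (
    'name: somevalue',
    'desc: a very long value',
    '    : that does not fit on one line so it continues',
    '    : on another line',
    '',
    'type: someother value',
);

my ( %value_for, $current );
for my $line (@lines) {
    # Added guard: skip lines that do not match at all.
    next unless $line =~ /^\s*(\w+)?\s*:\s*(.*)$/;

    # The quoted test: skip when neither a label nor a value was captured.
    next unless $1 || $2;

    my ( $label, $value ) = ( $1, $2 );
    if ( defined $label ) {
        # A new label starts a new entry.
        $current = $label;
        $value_for{$current} = $value;
    }
    elsif ( defined $current ) {
        # A bare colon line continues the previous label's value.
        $value_for{$current} .= " $value";
    }
}

print "$_ => $value_for{$_}\n" for sort keys %value_for;
```

One thing to keep in mind with this form: a value of "0" on a continuation line is false in Perl, so the '$1 || $2' test would skip it.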

Thanks for the help.

Re^3: using lookaround assertions to grab info
by dragonchild (Archbishop) on Jun 04, 2004 at 12:48 UTC
    Some thoughts about the /\n\b/ idea. It is very inspired, and I ++'ed it. But, it will fail in the following circumstances:
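    1. If you are running on Unix and your email was received on a Mac (or Windows) and copied over using Samba or something similar. (Classic Mac files end lines with a bare \r, so the \n never matches; Windows files use \r\n, so the \n matches but a stray \r is left on the end of each line.)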
    2. If the email has a space at the beginning of a line with a key. (Mine handles this correctly, as does BrowserUk's.)
    3. Be absolutely sure you know what \b matches. It is a zero-width assertion that matches the boundary between a \w and a \W character, in either order. \w is (basically) [a-zA-Z0-9_]. So, if one of your labels starts with a quote, it won't match.
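A quick way to see the pitfalls above is to run the plain split against input that has all three problems, and compare it with a more tolerant variant. This is only a sketch: the sample data and the replacement pattern are my own assumptions, not code from anyone's post in this thread.

```perl
use strict;
use warnings;

# Hypothetical input: CRLF line endings, a label with a leading space,
# and a label that starts with a quote.
my $text = join "\r\n",
    'first: one',
    ' second: two',
    '       : continued',
    '"third": three';

# The plain split finds no boundary before ' second' or '"third"',
# so everything stays in one record.
my @naive = split /\n\b/, $text;

# Normalise CRLF and bare CR to \n first.
( my $clean = $text ) =~ s/\r\n?/\n/g;

# Then split before any line whose first non-blank character
# is something other than a colon.
my @records = split /\n(?=[ \t]*[^\s:])/, $clean;

printf "naive: %d record(s), tolerant: %d record(s)\n",
    scalar @naive, scalar @records;
# prints "naive: 1 record(s), tolerant: 3 record(s)"
```

The lookahead keeps the split zero-width on the label side, just as \b does, but it keys on "not a continuation line" instead of "starts with a word character".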

    Now, if your situation avoids the above pitfalls, go right ahead. Comment it, though. If we, with a large number of combined years of experience, consider it inspired, your maintenance programmer will consider it demonic and worthy of tracking you down with a bloody axe.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

    I shouldn't have to say this, but any code, unless otherwise stated, is untested