in reply to Matching positions with lookarounds

If you just want to split on "whitespace as close to 76 characters as possible" then you can use:

$finaloutput =~ s/(.{0,76}\s)/$1\r\n/g;

This simply asks for "between zero and 76 characters followed by whitespace". Since quantifiers are greedy by default, it will always grab as many characters as it can. If the 76 is meant to include the trailing whitespace, change the upper limit to 75; I don't know whether that's the case.
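To see the greedy match and the off-by-one question in action, here is a minimal sketch. The helper name `wrap_simple` and its `$width` parameter are made up for illustration, and it inserts a plain "\n" so the output is easy to read; the pattern itself is the one above.

```perl
use strict;
use warnings;

# Hypothetical helper: the same pattern, with the limit passed in as $width
# and a plain "\n" inserted as the break (an assumption for readability).
sub wrap_simple {
    my ($text, $width) = @_;
    $text =~ s/(.{0,$width}\s)/$1\n/g;
    return $text;
}

# Limit of 10: the greedy .{0,10} grabs "aaaaa bbbb" (10 characters),
# then \s takes the following space, so the chunk is 11 characters long.
my $wrapped = wrap_simple("aaaaa bbbb cccc", 10);
print "$wrapped\n";
```

The first output line keeps its trailing space, which is why a limit of 75 may be what you want if the whitespace has to count towards the 76.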

Re: Easy Solution
by bart (Canon) on Mar 22, 2004 at 23:45 UTC
    n.b. Please drop the "\r". You should never hardcode a CR into plain text in Perl. When you print to a filehandle that doesn't have binmode applied, on a platform that wants the CRs, "\n" is converted to CRLF automatically, so let that conversion take care of it. "\n" is the logical end-of-line character on any platform.

    But, that aside, even though you're well on the way, your program has a bug. It will try to add a linebreak in the last line, even if it's narrow enough to fit onto one line. Why would it do that? Because

    $_ = "Hello, world!"; /.{0,76}\s/;
    matches "Hello, ", up to and including the space between "Hello," and "world!".
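A small sketch of that failure, using the same sample string. The variable names are only for illustration, and the break is a plain "\n".

```perl
use strict;
use warnings;

my $line = "Hello, world!";    # 13 characters, far below the limit

# The pattern can only succeed if whitespace follows, so it backtracks
# from the end of the string to the space after the comma.
my ($chunk) = $line =~ /(.{0,76}\s)/;
print "matched: [$chunk]\n";

# Used in the substitution, that same match splits the short line in two.
(my $wrapped = $line) =~ s/(.{0,76}\s)/$1\n/g;
print "$wrapped\n";
```

The second print shows "Hello, " and "world!" on separate lines, although the whole string would have fitted on one.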

    I'd change the regexp to the following:

    s/[^\n\S]*(.{1,76})(?:\s|$)/$1\n/g;
    with the following rationale:
    • It'll match as many characters as possible, up to 76, either to the end of the string (!) or to the last whitespace character in that substring, whichever is longer
    • /./ doesn't match newlines, thus it'll leave embedded short lines (ending with a "\n") unchanged, and try to match again, directly after the following newline.
    • You likely don't want leading whitespace after a wrapped line, hence the leading /[^\n\S]*/, which eats any whitespace other than newlines. Newlines are excluded because you probably will want to keep embedded empty lines.
    • You're not interested in autogenerated empty lines, hence the requirement for at least one character. /.{0,76}(\s|$)/g tends to match twice at the end of the string: first with a non-empty string, till the end, and then again with an empty string. BTW IMHO this is a bug — I don't think anybody actually wants this behaviour.
    But, I admit: mine doesn't quite look as easy as yours, any more. :)
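The points in that rationale can be checked with a short sketch. The helper name `wrap_text` and the `$width` parameter are hypothetical, added only so the limit can be made small enough to show wrapping on short strings; the pattern is otherwise the one from the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: the revised pattern with the limit passed in as $width.
sub wrap_text {
    my ($text, $width) = @_;
    $text =~ s/[^\n\S]*(.{1,$width})(?:\s|$)/$1\n/g;
    return $text;
}

print wrap_text("Hello, world!", 76);      # short last line is left in one piece
print wrap_text("aaaa bbbb  cccc", 9);     # the extra space before "cccc" is dropped
print wrap_text("a\n\nb", 76);             # the embedded empty line survives

# Why {1,...} and not {0,...}: with a zero lower bound the pattern matches
# once more, with an empty string, at the very end of the input.
my @zero = "Hello" =~ /.{0,76}(?:\s|$)/g;
my @one  = "Hello" =~ /.{1,76}(?:\s|$)/g;
printf "%d matches with {0,76}, %d with {1,76}\n", scalar @zero, scalar @one;
```

Note that this version always ends its output with a "\n", even when the input had none, because the end-of-string alternative is replaced by a newline too.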
Re: Easy Solution
by Sprenger000 (Initiate) on Mar 22, 2004 at 16:48 UTC
    Er, that should be =~, not =, of course. Typo!