In the chatterbox I proposed something like:

s#(((\e[^a-zA-Z]*[a-zA-Z])*.){1,79})\s#$1\n#g;
which doesn't wrap in the middle of "words" even if they are more than 80 characters long and assumes that all escape sequences start with "\e" (ESC), end with a letter, and contain no other letters.
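
A small sketch of that substitution in use, with the width made a parameter so the effect shows up on a short string. The `wrap_simple` name and the `$width` argument exist only for this illustration; they are not part of what was posted in the chatterbox.

```perl
use strict;
use warnings;

# Hypothetical helper: the substitution above, with 79 replaced by $width.
sub wrap_simple {
    my ($text, $width) = @_;
    # Up to $width "characters", each optionally preceded by escape
    # sequences, followed by whitespace that becomes the line break.
    $text =~ s#(((\e[^a-zA-Z]*[a-zA-Z])*.){1,$width})\s#$1\n#g;
    return $text;
}

# "red" is wrapped in color codes that should not count toward the width.
my $line = "\e[31mred\e[0m green blue\n";
print wrap_simple($line, 10);
```

Only the visible characters are counted here, so "red green" (nine characters) fits within the limit of 10 and the break lands on the space before "blue".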

I recall writing similar word-wrapping regexes before that also knew how to wrap in the middle of really long words, but the trick isn't coming to mind, so I'll try harder later if no one else posts how to do that.

Update: I can't recall the trick and suspect it won't work in this more complex case anyway so I'd probably just go with:

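my $esc= qr[\e[^a-zA-Z]*[a-zA-Z]];
my $char= qr[(?:$esc)*.];
s[((?:$char){1,79})\s][$1\n]g;
# Lookahead instead of a second capture: the character after a forced break
# is not consumed, so the next chunk is counted from it (otherwise every
# line after the first would come out 80 characters long).
s[((?:$char){79})(?=$char)][$1\n]g;
to wrap very long words. Note that I wrap at 79 characters, not 80, since some terminal emulators will show blank lines if you wrap at exactly 80.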
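
A rough sketch of the two passes at a small width, so the forced break inside a long word is easy to see. The `wrap_two_pass` name and `$width` parameter are made up for the example, and the second pass uses a lookahead (as the final version further down does) so that each forced chunk is counted from the character right after the break.

```perl
use strict;
use warnings;

# Hypothetical helper; $width stands in for the hard-coded 79.
sub wrap_two_pass {
    my ($text, $width) = @_;
    my $esc  = qr[\e[^a-zA-Z]*[a-zA-Z]];
    my $char = qr[(?:$esc)*.];
    # Pass 1: turn whitespace that follows at most $width characters
    # into a newline.
    $text =~ s[((?:$char){1,$width})\s][$1\n]g;
    # Pass 2: force a break inside any run that is still too long.
    $text =~ s[((?:$char){$width})(?=$char)][$1\n]g;
    return $text;
}

# One 12-letter "word" (colored green) followed by a short one.
print wrap_two_pass("\e[32mabcdefghijkl mn\n", 5);
```

The first pass only turns the space before "mn" into a newline; the second pass then cuts the long word into chunks of five visible characters.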

Be sure to have a trailing "\n" on the end of each line and strip trailing spaces (the chatterbox does this so you may not have to) or else it will wrap when it doesn't need to.

Update2: You said having trailing newlines is a problem so:

my $esc= qr[\e[^a-zA-Z]*[a-zA-Z]];
my $char= qr[(?:$esc)*.];
s[((?:$char){1,79})(\s|$)][$1.($2?$/:"")]ge;
# Lookahead, so the character after a forced break is not consumed.
s[((?:$char){79})(?=$char)][$1\n]g;
or you could even use the previous solution with:
my $esc= qr[\e[^a-zA-Z]*[a-zA-Z]];
my $char= qr[(?:$esc)*.];
$_ .= $/;
s[((?:$char){1,79})\s][$1\n]g;
# Lookahead, so the character after a forced break is not consumed.
s[((?:$char){79})(?=$char)][$1\n]g;
chomp;
(:
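
A minimal sketch of the "append, wrap, chomp" variant, assuming the default `$/` of "\n". The sub name and the `$width` parameter are hypothetical, and the second pass is written with a lookahead so that forced breaks stay within the limit.

```perl
use strict;
use warnings;

# Hypothetical helper for text that has no trailing newline.
sub wrap_unterminated {
    my ($text, $width) = @_;
    my $esc  = qr[\e[^a-zA-Z]*[a-zA-Z]];
    my $char = qr[(?:$esc)*.];
    $text .= $/;    # temporary terminator so the last line can end at \s
    $text =~ s[((?:$char){1,$width})\s][$1\n]g;
    $text =~ s[((?:$char){$width})(?=$char)][$1\n]g;
    chomp $text;    # take the terminator off again
    return $text;
}

print wrap_unterminated("one two", 7), "\n";        # fits, stays on one line
print wrap_unterminated("one two three", 7), "\n";  # breaks before "three"
```

Without the temporary terminator, "one two" at a width of 7 would be split at its only space, because the pattern needs whitespace after the last character it counts.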

Oh, and I'd do s/\s/ /g (before stripping [[:cntrl:]] characters) in case someone manages to put tabs (or worse) into their chatter (unlikely).

Update3: Three problems with all of the above stuff.

First, the regex can backtrack such that the . matches a "\e" in order to find a place that is long enough to wrap (regexes are *so* greedy).

Second, the regex can decide to start matching right after a "\e" in order to find a slightly longer string in order to wrap it.

Third, escape sequences right at where we should wrap can cause problems. For example, 79 non-escape characters followed by an escape sequence and then a space should match, with that space replaced by a newline. The 79 non-escape characters do match our regex for 79 cases of "one non-escape character preceded by zero or more escape sequences", but the escape sequence that follows doesn't match "space", so the match fails at the place where it should succeed.

Changing . to [^\e\n] fixes the first problem. Anchoring the start of the regex fixes the second problem. But how to anchor is a bit complex. Allowing for trailing escape sequences fixes the third.
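
To make the third problem concrete, here is a rough comparison of the first substitution before and after those changes, at a width of 5 instead of 79. The sub names and the `$width` parameter are made up for the demonstration.

```perl
use strict;
use warnings;

# Earlier pattern: "." may match anything, and nothing is allowed between
# the last counted character and the whitespace.
sub old_pass1 {
    my ($text, $width) = @_;
    my $esc  = qr[\e[^a-zA-Z]*[a-zA-Z]];
    my $char = qr[(?:$esc)*.];
    $text =~ s[((?:$char){1,$width})\s][$1\n]g;
    return $text;
}

# Revised pattern: no ESC via the character class, anchored at the start of
# the string or after whitespace, trailing escape sequences allowed.
sub new_pass1 {
    my ($text, $width) = @_;
    my $esc  = '\e';
    my $eseq = qr[$esc[^a-zA-Z]*[a-zA-Z]];
    my $char = qr[(?:$eseq)*[^$esc\n]];
    $text =~ s[(?:^|(?<=\s))((?:$char){1,$width}(?:$eseq)*)\s][$1\n]g;
    return $text;
}

# Five visible characters, a reset sequence, then a space.
my $text = "abcde\e[0m fgh\n";
print old_pass1($text, 5);   # still one line
print new_pass1($text, 5);   # breaks after the escape sequence
```

With the earlier pattern, the only match starts in the middle of "abcde" and runs to the final newline, so the space never becomes a newline. The revised pattern breaks right after the escape sequence.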

Which results in this tested code:

my $len= 79;
my $esc= '\e';
my $eseq= qr[$esc[^a-zA-Z]*[a-zA-Z]];
my $char= qr[(?:$eseq)*[^$esc\n]];
my $nonsp= qr[(?:$eseq)*[^$esc\s]];
s[(?:^|(?<=\s))((?:$char){1,$len}(?:$eseq)*)\s][$1\n]g;
s[(?:^|(?<=\s))((?:$nonsp){$len}(?:$eseq)*)(?=[^$esc\s])][$1\n]g;
So let me know if your testing still finds other problems.
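
For experimenting, the two substitutions can be put in a sub along these lines. This is only a sketch: `wrap_ansi` and its `$len` argument are hypothetical names. The `do ... until` loop is an extra that the code above does not have. As far as I can tell, a single run breaks an over-long word only once, because `(?<=\s)` looks at the string as it was before the newline went in, so the sketch repeats both passes until nothing changes.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two substitutions above.
sub wrap_ansi {
    my ($text, $len) = @_;
    $len = 79 unless defined $len;
    my $esc   = '\e';
    my $eseq  = qr[$esc[^a-zA-Z]*[a-zA-Z]];
    my $char  = qr[(?:$eseq)*[^$esc\n]];
    my $nonsp = qr[(?:$eseq)*[^$esc\s]];
    my $prev;
    do {
        $prev = $text;
        # Break at whitespace within $len visible characters.
        $text =~ s[(?:^|(?<=\s))((?:$char){1,$len}(?:$eseq)*)\s][$1\n]g;
        # Force a break inside a run of $len non-space characters.
        $text =~ s[(?:^|(?<=\s))((?:$nonsp){$len}(?:$eseq)*)(?=[^$esc\s])][$1\n]g;
    } until $text eq $prev;   # assumption: repeat until the text is stable
    return $text;
}

print wrap_ansi("abcde\e[0m fgh\n", 5);   # escape sequence right at the break
print wrap_ansi("abcdefghijkl\n", 5);     # a word longer than twice the width
```

Each line of the input is still expected to end in "\n", as with the earlier versions.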

Thanks. That was educational. :)

                - tye

In reply to Re: Wrap while ignoring certain sequences (from CB) by tye
in thread Wrap while ignoring certain sequences by Coruscate
