in reply to Wrap while ignoring certain sequences
In the chatterbox I proposed something like:
    s#(((\e[^a-zA-Z]*[a-zA-Z])*.){1,79})\s#$1\n#g;

which doesn't wrap in the middle of "words" even if they are more than 80 characters long and assumes that all escape sequences start with "\e" (ESC), end with a letter, and contain no other letters.
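As a rough illustration of that assumption about escape sequences, the sketch below pulls the escape-sequence part of the regex out on its own and applies it to a made-up string. The sample text and the variable names are hypothetical, not from the original chatter.

```perl
use strict;
use warnings;

# The escape-sequence pattern from the regex above, pulled out on its own:
# ESC, any run of non-letters, then exactly one terminating letter.
my $esc = qr/\e[^a-zA-Z]*[a-zA-Z]/;

# Sample text is made up for illustration: a red word and an underlined word.
my $text = "\e[1;31mred\e[0m and \e[4munderlined\e[0m";

# Collect every sequence the pattern recognizes.
my @seqs = $text =~ /($esc)/g;

# Strip them to see what a terminal would actually display.
(my $plain = $text) =~ s/$esc//g;

printf "%d sequences, %d visible characters: %s\n",
    scalar(@seqs), length($plain), $plain;
```

A sequence that carries letters before its final byte, such as an xterm title-setting sequence, falls outside the assumption and would only be matched up to its first letter.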
I recall writing similar word-wrapping regexes before that also knew how to wrap in the middle of really long words, but the trick isn't jumping quickly to mind, so I'll try harder later if no one else posts how to do that.
Update: I can't recall the trick and suspect it won't work in this more complex case anyway so I'd probably just go with:
    my $esc= qr[\e[^a-zA-Z]*[a-zA-Z]];
    my $char= qr[(?:$esc)*.];
    s[((?:$char){1,79})\s][$1\n]g;
    s[((?:$char){79})($char)][$1\n$2]g;

to wrap very long words. Note that I wrap at 79 characters not 80 since some terminal emulations will give blank lines if you wrap exactly at 80.
Be sure to have a trailing "\n" on the end of each line and strip trailing spaces (the chatterbox does this so you may not have to) or else it will wrap when it doesn't need to.
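A small sketch of why the trailing "\n" matters, assuming the two substitutions are wrapped in a helper. The name `wrap_simple` and the sample line are hypothetical. Without the newline, the first substitution can only succeed by backing up to an earlier space, so a line that already fits gets split.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two substitutions from the update above.
sub wrap_simple {
    my ($line) = @_;
    my $esc  = qr/\e[^a-zA-Z]*[a-zA-Z]/;
    my $char = qr/(?:$esc)*./;
    $line =~ s/((?:$char){1,79})\s/$1\n/g;          # wrap at whitespace
    $line =~ s/((?:$char){79})($char)/$1\n$2/g;     # break very long words
    return $line;
}

# With the trailing newline, the whole line matches and "\n" replaces "\n".
my $with_newline = wrap_simple("short line\n");

# Without it, the only whitespace available is the space after "short".
my $without_newline = wrap_simple("short line");

print "with newline:    ", ($with_newline    eq "short line\n" ? "unchanged" : "changed"), "\n";
print "without newline: ", ($without_newline eq "short\nline"  ? "wrapped early" : "other"), "\n";
```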
Update2: You said having trailing newlines is a problem so:
    my $esc= qr[\e[^a-zA-Z]*[a-zA-Z]];
    my $char= qr[(?:$esc)*.];
    s[((?:$char){1,79})(\s|$)][$1.($2?$/:"")]ge;
    s[((?:$char){79})($char)][$1\n$2]g;

or you could even use the previous solution with:

    my $esc= qr[\e[^a-zA-Z]*[a-zA-Z]];
    my $char= qr[(?:$esc)*.];
    $_ .= $/;
    s[((?:$char){1,79})\s][$1\n]g;
    s[((?:$char){79})($char)][$1\n$2]g;
    chomp;

(:
Oh, and I'd do s/\s/ /g (before stripping [[:cntrl:]] characters) in case someone manages to put tabs (or worse) into their chatter (unlikely).
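To show why the order matters, here is a minimal sketch. It assumes the control-character stripping is done with `s/[[:cntrl:]]//g`, which is a guess at how that step is written, and the sample chatter is made up.

```perl
use strict;
use warnings;

# Made-up chatter containing a tab and a bell character.
my $raw = "tab\there\x07";

# Order suggested above: turn whitespace into spaces, then strip controls.
(my $good = $raw) =~ s/\s/ /g;
$good =~ s/[[:cntrl:]]//g;

# Reverse order: the tab is itself a control character, so it is deleted
# before it can become a space and the two words run together.
(my $bad = $raw) =~ s/[[:cntrl:]]//g;
$bad =~ s/\s/ /g;

print "whitespace first: '$good'\n";
print "controls first:   '$bad'\n";
```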
Update3: Three problems with all of the above stuff.
First, the regex can backtrack such that the . matches a "\e" in order to find a place that is long enough to wrap (regexes are *so* greedy).
Second, the regex can decide to start matching right after a "\e" in order to find a slightly longer string in order to wrap it.
Third, escape sequences right at where we should wrap can cause problems. For example, 79 non-escape characters followed by an escape sequence and then a space should match, with that space replaced by a newline. The 79 non-escape characters do match our regex for 79 cases of "one non-escape character preceded by zero or more escape sequences", but the escape sequence that follows doesn't match "space", so the match fails at that point.
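The first problem can be reproduced with the long-word substitution from the earlier update. This is only a sketch with made-up input: 78 ordinary characters, a color change, and one more character, which is 79 visible characters and so should not need wrapping at all.

```perl
use strict;
use warnings;

# Patterns from the earlier update, where "." can match anything but "\n".
my $esc  = qr/\e[^a-zA-Z]*[a-zA-Z]/;
my $char = qr/(?:$esc)*./;

# Made-up input: 78 visible characters, an escape sequence, one more character.
my $line = ("a" x 78) . "\e[31m" . "b";

# Apply only the long-word substitution.
(my $wrapped = $line) =~ s/((?:$char){79})($char)/$1\n$2/g;

# There is no 80th character, so the regex backtracks: "." consumes the bare
# ESC as the 79th "character" and "[" becomes the 80th. The newline then
# lands inside the escape sequence.
my $split  = $wrapped eq (("a" x 78) . "\e\n[31mb");
my $report = $split ? "escape sequence was split" : "line left intact";
print "$report\n";
```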
Changing . to [^\e\n] fixes the first problem. Anchoring the start of the regex fixes the second problem. But how to anchor is a bit complex. Allowing for trailing escape sequences fixes the third.
Which results in this tested code:
    my $len= 79;
    my $esc= '\e';
    my $eseq= qr[$esc[^a-zA-Z]*[a-zA-Z]];
    my $char= qr[(?:$eseq)*[^$esc\n]];
    my $nonsp= qr[(?:$eseq)*[^$esc\s]];
    s[(?:^|(?<=\s))((?:$char){1,$len}(?:$eseq)*)\s][$1\n]g;
    s[(?:^|(?<=\s))((?:$nonsp){$len}(?:$eseq)*)(?=[^$esc\s])][$1\n]g;

So let me know if your testing still finds other problems.
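For anyone who wants to try the final version on their own input, here is one way it might be packaged. The sub name `wrap_ansi` and the sample input are hypothetical, and "\e" is written inline instead of being interpolated from `$esc`. The input is assumed to end in a newline, as recommended earlier.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two substitutions above.
sub wrap_ansi {
    my ($text, $len) = @_;
    $len = 79 unless defined $len;
    my $eseq  = qr/\e[^a-zA-Z]*[a-zA-Z]/;       # one escape sequence
    my $char  = qr/(?:$eseq)*[^\e\n]/;          # one visible character
    my $nonsp = qr/(?:$eseq)*[^\e\s]/;          # one visible non-space
    # Wrap at whitespace, allowing escape sequences just before the break.
    $text =~ s/(?:^|(?<=\s))((?:$char){1,$len}(?:$eseq)*)\s/$1\n/g;
    # Break a word that runs past $len visible characters.
    $text =~ s/(?:^|(?<=\s))((?:$nonsp){$len}(?:$eseq)*)(?=[^\e\s])/$1\n/g;
    return $text;
}

# The case from the third problem: 79 characters, an escape sequence, a space.
my $input = "\e[31m" . ("a" x 79) . "\e[0m tail\n";
my $out   = wrap_ansi($input);

# Report the visible width of each resulting line.
for my $line (split /\n/, $out) {
    (my $plain = $line) =~ s/\e[^a-zA-Z]*[a-zA-Z]//g;
    printf "%2d visible characters\n", length($plain);
}
```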
Thanks. That was educational. :)
- tye
Re^2: Wrap while ignoring certain sequences (from CB)
by Coruscate (Sexton) on Mar 13, 2003 at 05:27 UTC