Coruscate has asked for the wisdom of the Perl Monks concerning the following question:

I was asking about this yesterday in the chatterbox, but that got almost nowhere, so I shall post here in hopes that I might get some advice on how to accomplish this task.

I am sending output to a telnet client and wish to wrap the text sent on the 80th column, wrapping, of course, only on word boundaries. In my case, wrapping on spaces is good enough, since tab characters will not count. And newlines should be sent as is, not removed. The following code does just this, but there is one complication I have invented to challenge this snippet:

    use Text::Wrap;
    $Text::Wrap::columns = 80;

    # Modified print()
    sub output {
        my $text = shift;
        my $wrapped = wrap('', '', $text);
        $wrapped =~ s/\n/\r\n/g;
        print $wrapped;
    }

Okay, so that works fine. Text is boundary-wrapped at a max of 80 columns. Perfect. What's the problem? I am now adding special tty escape sequences that clear the screen, change ANSI colors, etc etc. These sequences look like \e[2K, \e[37;40m, \e[0m, etc etc. The ones I use all match the regex m#\e\[[\d;]+[mK]#, if that helps whoever may find a way to aid me in my quest :)

So what I need to do is still wrap at 80 columns, while ignoring these special characters in the call to Text::Wrap::wrap(). The problem is that if I pass a string such as \e[33mCoruscate\e[0m;\e[37;40m is having trouble with a script he is working on and therefore makes a trip to \e[32mwww.perlmonks.org\e[0m;\e[37;40m to seek help. to the output() function, it will wrap in the terminal much sooner than I want. It will wrap at a visual 44 columns instead of the wanted 80 (these escape sequences are not physically shown in the terminal. The telnet client takes these out and does special magic behind the scenes). That's because those escape sequences take up 36 columns in my example. Text::Wrap::wrap() doesn't know that it is supposed to exclude these sequences, so it counts them as being there (which you would expect).
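To make the mismatch concrete, here is a minimal sketch that compares the raw length of a string with the number of columns it occupies once the escape sequences are removed. It uses the regex from the question; the helper name visible_length and the shortened sample string are assumptions made only for illustration.

```perl
use strict;
use warnings;

# Escape-sequence pattern taken from the question (colour and erase codes).
my $ansi = qr/\e\[[\d;]+[mK]/;

# Hypothetical helper: number of columns the text occupies on screen,
# i.e. its length after all matching escape sequences are stripped.
sub visible_length {
    my ($text) = @_;
    (my $plain = $text) =~ s/$ansi//g;
    return length $plain;
}

my $sample = "\e[33mCoruscate\e[0m needs help";
printf "raw: %d, visible: %d\n", length($sample), visible_length($sample);
```

A wrapper that counts length($sample) sees the larger number, which is why the line breaks come too early on screen.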

My question: Is there a Text::Wrapish type module that allows you to pass along certain strings or patterns which are to be ignored when calculating where to wrap the text? Or perhaps someone more knowledgeable than me in this sort of thing can give me direction or code samples on how to overcome this terrible monster!


If the above content is missing any vital points or you feel that any of the information is misleading, incorrect or irrelevant, please feel free to downvote the post. At the same time, please reply to this node or /msg me to inform me as to what is wrong with the post, so that I may update the node to the best of my ability.

Replies are listed 'Best First'.
Re: Wrap while ignoring certain sequences (from CB)
by tye (Sage) on Mar 12, 2003 at 21:47 UTC

    In the chatterbox I proposed something like:

    s#(((\e[^a-zA-Z]*[a-zA-Z])*.){1,79})\s#$1\n#g;
    which doesn't wrap in the middle of "words" even if they are more than 80 characters long and assumes that all escape sequences start with "\e" (ESC), end with a letter, and contain no other letters.

    I recall doing previous similar word-wrapping regexes that also know how to wrap in the middle of really long words, but the trick isn't jumping quickly to mind so I'll try harder later if no one else posts how to do that.

    Update: I can't recall the trick and suspect it won't work in this more complex case anyway so I'd probably just go with:

    my $esc= qr[\e[^a-zA-Z]*[a-zA-Z]];
    my $char= qr[(?:$esc)*.];
    s[((?:$char){1,79})\s][$1\n]g;
    s[((?:$char){79})($char)][$1\n$2]g;
    to wrap very long words. Note that I wrap at 79 characters not 80 since some terminal emulations will give blank lines if you wrap exactly at 80.

    Be sure to have a trailing "\n" on the end of each line and strip trailing spaces (the chatterbox does this so you may not have to) or else it will wrap when it doesn't need to.

    Update2: You said having trailing newlines is a problem so:

    my $esc= qr[\e[^a-zA-Z]*[a-zA-Z]];
    my $char= qr[(?:$esc)*.];
    s[((?:$char){1,79})(\s|$)][$1.($2?$/:"")]ge;
    s[((?:$char){79})($char)][$1\n$2]g;
    or you could even use the previous solution with:
    my $esc= qr[\e[^a-zA-Z]*[a-zA-Z]];
    my $char= qr[(?:$esc)*.];
    $_ .= $/;
    s[((?:$char){1,79})\s][$1\n]g;
    s[((?:$char){79})($char)][$1\n$2]g;
    chomp;
    (:

    Oh, and I'd do s/\s/ /g (before stripping [[:cntrl:]] characters) in case someone manages to put tabs (or worse) into their chatter (unlikely).

    Update3: Three problems with all of the above stuff.

    First, the regex can backtrack such that the . matches a "\e" in order to find a place that is long enough to wrap (regexes are *so* greedy).

    Second, the regex can decide to start matching right after a "\e" in order to find a slightly longer string in order to wrap it.

    Third, escape sequences right where we should wrap can cause problems. For example, 79 non-escape characters followed by an escape sequence and then a space should match, and that space should be replaced with a newline. The 79 non-escape characters do match our regex for 79 cases of "one non-escape character preceded by zero or more escape sequences", but the escape sequence that follows doesn't match "space", so the match fails at that length.

    Changing . to [^\e\n] fixes the first problem. Anchoring the start of the regex fixes the second problem. But how to anchor is a bit complex. Allowing for trailing escape sequences fixes the third.
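    The first problem can be seen in isolation with a small sketch. The variable names ($loose, $strict) and the test string are assumptions for illustration; the two patterns are the "character" definitions from the earlier and the corrected code. Asking for three characters in a string that holds only two visible ones succeeds with "." because the regex backtracks and lets "." consume the ESC itself.

```perl
use strict;
use warnings;

my $eseq   = qr/\e[^a-zA-Z]*[a-zA-Z]/;   # one escape sequence
my $loose  = qr/(?:$eseq)*./;            # earlier definition: "." may match ESC
my $strict = qr/(?:$eseq)*[^\e\n]/;      # fixed definition: ESC is never a "character"

# Two visible characters preceded by a colour code.
my $text = "\e[33mab";

# Ask for three "characters"; only two are really there.
my $loose_matches  = $text =~ /^(?:$loose){3}/  ? 1 : 0;
my $strict_matches = $text =~ /^(?:$strict){3}/ ? 1 : 0;

print "loose: $loose_matches, strict: $strict_matches\n";
```

    With the loose definition the count is satisfied by ESC, "[" and "3", so escape bytes get counted as columns again.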

    Which results in this tested code:

    my $len= 79;
    my $esc= '\e';
    my $eseq= qr[$esc[^a-zA-Z]*[a-zA-Z]];
    my $char= qr[(?:$eseq)*[^$esc\n]];
    my $nonsp= qr[(?:$eseq)*[^$esc\s]];
    s[(?:^|(?<=\s))((?:$char){1,$len}(?:$eseq)*)\s][$1\n]g;
    s[(?:^|(?<=\s))((?:$nonsp){$len}(?:$eseq)*)(?=[^$esc\s])][$1\n]g;
    So let me know if your testing still finds other problems.
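    For reuse, the two substitutions can be wrapped in a sub that takes the text and the width. This is only a sketch: the name wrap_ansi and the optional $len argument are assumptions, and the patterns are written with / delimiters and a literal \e rather than the $esc variable. The last lines convert newlines to CRLF the way the output() sub in the question does.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two substitutions posted above.
# $len defaults to 79, as in the original code.
sub wrap_ansi {
    my ($text, $len) = @_;
    $len = 79 unless defined $len;

    my $eseq  = qr/\e[^a-zA-Z]*[a-zA-Z]/;   # one escape sequence
    my $char  = qr/(?:$eseq)*[^\e\n]/;      # one visible char plus any escapes before it
    my $nonsp = qr/(?:$eseq)*[^\e\s]/;      # one visible non-space char

    # Break at the last whitespace that fits within $len visible characters.
    $text =~ s/(?:^|(?<=\s))((?:$char){1,$len}(?:$eseq)*)\s/$1\n/g;
    # Break a word that runs past $len visible characters.
    $text =~ s/(?:^|(?<=\s))((?:$nonsp){$len}(?:$eseq)*)(?=[^\e\s])/$1\n/g;
    return $text;
}

my $wrapped = wrap_ansi("\e[33maaa\e[0m bbb ccc ddd eee", 10);
(my $for_telnet = $wrapped) =~ s/\n/\r\n/g;   # CRLF, as in the question's output()
print $for_telnet, "\r\n";
```

    A narrow width of 10 is used here only so the effect is visible on a short string; the escape sequences stay attached to the words they colour and do not count toward the width.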

    Thanks. That was educational. :)

                    - tye

      tye += time() ** time() ** $$;

      That last update works 100% efficiently for all the tests I ran. Thanks a lot!



Re: Wrap while ignoring certain sequences
by Enlil (Parson) on Mar 12, 2003 at 22:08 UTC
    How 'bout something like the following (granted, it can probably be improved upon):
    use strict;
    use warnings;
    use Text::Wrap;
    $Text::Wrap::columns = 80;

    output (q|\e[33mCoruscate\e[0m;\e[37;40m is having trouble with a script he is working on and therefore makes a trip to \e[32mwww.perlmonks.org\e[0m;\e[37;40m to seek help|);

    # Modified print()
    sub output {
        my $text = shift;
        my @removed;
        my $current_position = 0;
        while ( $text =~ m!(\\e\[[\d;]+[mK];?)!g ) {
            push @removed, [pos($text) - length($1), $1];
        }
        $text =~ s!\\e\[[\d;]+[mK];?!!g;
        my $wrapped = wrap('', '', $text);
        while ( @removed ) {
            my $current_element = shift @removed;
            substr($wrapped,@{$current_element}[0]) = @{$current_element}[1] . substr($wrapped,@{$current_element}[0]);
        }
        print $wrapped;
    }
    update: Did I mention it could be improved upon? I realized what hv noticed about 20 min after I left for lunch. I admit this code is therefore pretty broken. Thanks hv for the heads up anyhow.

    update2: As for tye's solution above WOW!! I tried to fix my mistakes, but after figuring out how his worked I was humbled (need I say more).

    -enlil

      If I understand this approach correctly, you first locate and remove all escape sequences, then wrap the text, and then restore the escape sequences. But after the text has been wrapped, the string has been modified, so you won't necessarily be putting the escape sequences back in the correct places.

      Hugo