demerphq has asked for the wisdom of the Perl Monks concerning the following question:

Hi folks. Something that has always annoyed me a touch about PM is the way wrapping works in the CB. It isn't intelligent enough. It sticks spaces in at points where it's inconvenient, even when a char earlier would have been ideal. This makes it difficult to paste code from the CB into an editor. I started to tweak it, but kinda made a mess. Then I thought I should ask the monks at large to see what they think.

The following is the current code. It takes a string of HTML in $text and inserts spaces into it to prevent there being more than 18 chars of consecutive non-whitespace. It is used often, so it needs to be fast, and it mustn't break any HTML or properly formed HTML entities in the text.

my $len= 0;
$text =~ s{(\s+)|([^\s<&]+)|(<[^<>]*>)|(&#?\w{1,10};)|(.)}{
    if(  $1  ) {
        $len= 0;
        $1;
    } elsif(  length( $2 )  ) {
        # $2 is the only case that can be "0" (ie. false)
        my $res= $2;
        my $tot= $len + length($res);
        if(  18 < $tot  ) {
            substr( $res, 18-$tot, 0 )= " ";
            $res =~ s/(\S{18})(?=\S)/$1 /g;
            # replace previous with following for [tye]s improvement
            # $res =~ s/(\S{9,18}\b|\S{18})(?=\S)/$1 /g;
            $res =~ /(\S*)$/;
            $len= length( $1 );
        } else {
            $len= $tot;
        }
        $res;
    } elsif(  $3  ) {
        $3;
    } else {
        my $res= $4 || $5;
        my $add= $5 ? 1 : int( length($4)/3 );
        $len += $add;
        if(  18 < $len  ) {
            $len= $add;
            " $res";
        } else {
            $res;
        }
    }
}egis;
return $text;

So, can anybody come up with a better solution? One that will rarely break code that is pasted? I.e. one that wraps for(1..20){$bar=$bop[1];print"$bar/$baz,$foo[$baz]"} in such a way that the code doesn't break? Note it will be 'code'ified before being wrapped, so there may be HTML entities in the real version of the previous string.

If we find a better way to do this that doesn't cost any more, then we can use it for the site... So let's see what you folks can come up with. :-)

Note: Before posting this it was discussed in the CB; tye mentioned the fix commented in the code as being a quick improvement. It's not currently used, however.

---
demerphq
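For anyone trying candidates against the requirement in the question (no more than 18 consecutive non-whitespace chars, tags and entities left intact), a small checker may help. This is only a sketch: `longest_run` is a hypothetical helper, not part of the site code, and it assumes a tag counts as zero visible characters and a well-formed entity as one (the site code weighs entities differently).

```perl
use strict;
use warnings;

# Hypothetical helper: returns the longest run of visible non-whitespace
# characters in $html. Assumption: a tag has zero width and a well-formed
# entity counts as a single character.
sub longest_run {
    my ($html) = @_;
    my ( $run, $max ) = ( 0, 0 );
    while ( $html =~ /\G(?:(\s+)|(<[^<>]*>)|(?:&#?\w{1,10};|.))/gs ) {
        if    ( defined $1 ) { $run = 0 }    # whitespace ends the run
        elsif ( defined $2 ) { }             # a tag adds no width
        else                 { $run++ }      # entity or single character
        $max = $run if $run > $max;
    }
    return $max;
}

# Report the longest unbroken run in a sample chatter line.
my $sample = 'for(1..20){$bar=$bop[1];print"$bar/$baz,$foo[$baz]"}';
print longest_run($sample), "\n";
```

Running a candidate wrapper's output through a check like this shows whether anything longer than 18 slipped through; it says nothing about whether the break points are good ones.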

Replies are listed 'Best First'.
Re: A call to keyboards: Better chatterbox wrapping
by BrowserUk (Patriarch) on Jan 10, 2005 at 13:23 UTC

    This might get closer to the requirement assuming that any embedded angle brackets are escaped.

    Updated: Added a case to deal with long block of unbroken word chars.

    Updated again.

    #! perl -slw
    use strict;
    use Inline::Files;

    select OUTPUT;
    while( <DATA> ) {
        s[
            (
                (?:<[^>]+>)
              | (?:[^<]{9,18}(?=\b\W))
              | [^<]{18}
            )
        ][$1 \n]xg;
        print;
    }
    __DATA__
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    http://news.bbc.co.uk/1/shared/spl/hi/pop_ups/05/business_detroit_motor_show/html/1.stm/1.stm
    this is the <a href="http://news.bbc.co.uk/1/shared/spl/hi/pop_ups/05/business_detroit_motor_show/html/1.stm/1.stm">link</a> I was referring to
    for( 1 .. 20 ){ $bar = $bop[ 1 ]; print "$bar/$baz,$foo[$baz]" }
    for(1..20){$bar=$bop[1];print"$bar/$baz,$foo[$baz]"}
    for(1..20)%7B%24bar%3D%24bop%5B1%5D%3Bprint%22%24bar%2F%20%24baz%2C%24foo%5B%24baz%5D%22%7D
    __OUTPUT__
    xxxxxxxxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxx
    xxx
    http://news.bbc.co
    .uk/1/shared/spl
    /hi/pop_ups/05
    /business_detroit_
    motor_show/html/1
    .stm/1.stm
    this is the
     <a href="http://news.bbc.co.uk/1/shared/spl/hi/pop_ups/05/business_detroit_motor_show/html/1.stm/1.stm">
    link</a>
     I was referring
     to
    for( 1 .. 20
     ){ $bar = $bop[ 1
     ]; print "$bar
    /$baz,$foo[$baz
    ]" }
    for(1..20){$bar
    =$bop[1];print
    "$bar/$baz,$foo
    [$baz]"}
    for(1..20)%7B
    %24bar%3D%24bop
    %5B1%5D%3Bprint%22
    %24bar%2F%20%24baz
    %2C%24foo%5B%24baz
    %5D%22%7D

    This version

  • avoids inserting an extra space where the text breaks at a space.
  • tries to keep short quoted strings unbroken
    #! perl -slw
    use strict;
    use Inline::Files;

    select OUTPUT;
    while( <DATA> ) {
        s[
            (
                (?: < [^>]+ > )
              | (?: ( ["'] ) (?: (?!\2). ){1,18} \2 )    #"'
              | (?: [^<"'6]{9,18} (?=\b\W) )             #"'
              | [^<'"]{18}                               #"'
            ) \s?
        ][$1 \n]xg;
        print;
    }
    __DATA__
    a line with "some quoted text" less than 18 chars in length and "some quoted text more that 18 chars"
    a line with 'some quoted text' less than 18 chars in length and 'some quoted text more that 18 chars'
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    http://news.bbc.co.uk/1/shared/spl/hi/pop_ups/05/business_detroit_motor_show/html/1.stm/1.stm
    this is the <a href="http://news.bbc.co.uk/1/shared/spl/hi/pop_ups/05/business_detroit_motor_show/html/1.stm/1.stm">link</a> I was referring to
    for( 1 .. 20 ){ $bar = $bop[ 1 ]; print "$bar/$baz,$foo[$baz]" }
    for(1..20){$bar=$bop[1];print"$bar/$baz,$foo[$baz]"}
    for(1..20)%7B%24bar%3D%24bop%5B1%5D%3Bprint%22%24bar%2F%20%24baz%2C%24foo%5B%24baz%5D%22%7D
    __OUTPUT__
    a line with
    "some quoted text"
    less than 18 chars
    in length and
    "some quoted text
    more that 18 chars
    "
    a line with
    'some quoted text'
    less than 18 chars
    in length and
    'some quoted text
    more that 18 chars
    '
    xxxxxxxxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxx
    xxx
    http://news.bbc.co
    .uk/1/shared/spl
    /hi/pop_ups/05
    /business_detroit_
    motor_show/html/1
    .stm/1.stm
    this is the
    <a href="http://news.bbc.co.uk/1/shared/spl/hi/pop_ups/05/business_detroit_motor_show/html/1.stm/1.stm">
    link</a>
    I was referring
    to
    for( 1 .. 20
    ){ $bar = $bop[ 1
    ]; print "$bar/$baz,$foo
    [$baz]" }
    for(1..20){$bar
    =$bop[1];print
    "$bar/$baz,$foo
    [$baz]"}
    for(1..20)%7B
    %24bar%3D%24bop
    %5B1%5D%3Bprint%22
    %24bar%2F%20%24baz
    %2C%24foo%5B%24baz
    %5D%22%7D

    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.

      This is pretty much the kind of thing we are looking for. But it's mangled the href in the A tag for the BBC. It's essential that the wrapping code doesn't mess with the insides of tags or anything that HTML would normally render. So &amp; can't be wrapped internally. Likewise anything inside of a tag should be left alone. (You can use /<[^>]+>/ for matching tags, we aren't that picky.)

      Note that the content of the chatter has been preprocessed before this code executes, so you don't need to worry about fake tags or anything like that. If something is a valid tag it will match /<[^>]+>/ already. Anything that isn't valid will be modified to not match that pattern.

      ---
      demerphq
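To make the constraint above concrete: one way to keep tags and entities atomic is to split the text on them first and only ever insert spaces into the plain pieces. A minimal sketch, assuming the simplified tag pattern /<[^>]+>/ from the post and the entity pattern from the site code; `atoms` is a made-up name for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split $html into whole tags, whole entities and the
# plain text between them. A wrapper would then only touch the plain pieces.
sub atoms {
    my ($html) = @_;
    # The capturing group makes split return the delimiters (tags and
    # entities) as well; grep drops the empty strings split can produce.
    return grep { length } split /(<[^>]+>|&#?\w{1,10};)/, $html;
}

# The &amp; inside the href stays part of the tag, because the tag is
# matched as a whole before the entity inside it is ever looked at.
print "[$_]\n" for atoms('see <a href="http://example.com/a&amp;b">R&amp;D</a> now');
```

This ignores speed entirely, which matters for the real thing; it is only meant to show which pieces must never be broken.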

        I've updated again to correct that. Any other awkward cases that you know of?


        Examine what is said, not who speaks.
        Silence betokens consent.
        Love the truth but pardon error.
Re: A call to keyboards: Better chatterbox wrapping
by BrowserUk (Patriarch) on Jan 10, 2005 at 12:44 UTC

    A few more examples would help. I'm only inserting the newline to make it easy to see where I am adding the spaces.

    #! perl -slw
    use strict;
    use Inline::Files;

    select OUTPUT;
    while( <DATA> ) {
        s[(.{9,18})(?=\b\W)][$1 \n]g;
        print;
    }
    __DATA__
    for(1..20){$bar=$bop[1];print"$bar/$baz,$foo[$baz]"}
    for(1..20)%7B%24bar%3D%24bop%5B1%5D%3Bprint%22%24bar%2F%20%24baz%2C%24foo%5B%24baz%5D%22%7D
    __OUTPUT__
    for(1..20){$bar
    =$bop[1];print
    "$bar/$baz,$foo
    [$baz]"}
    for(1..20)%7B
    %24bar%3D%24bop
    %5B1%5D%3Bprint%22
    %24bar%2F%20%24baz
    %2C%24foo%5B%24baz
    %5D%22%7D

    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
Re: A call to keyboards: Better chatterbox wrapping
by Juerd (Abbot) on Jan 10, 2005 at 13:41 UTC

    Is forcing wrapping needed at all? You can't even know the font size I use. Inserting spaces, even when at better offsets, will always be a suboptimal and lossy solution.

    You can save yourself the trouble by putting each chatterbox line in a <div> that has, via CSS, overflow set to auto. Then every line that has a word in it that cannot be wrapped by the browser gets its own nice horizontal scrollbar, but only for the part that needs it. There already are <span> tags now (WHY? span+br is a red flag! Oh, and <span class="chat"><span class="chatfrom_221638"> should probably just be <span class="chat chatfrom_221638">), and those can be made <div>s, so it'll actually save some bandwidth ;)

    DIV.chat { overflow: auto; } is all it takes, and lets you get rid of the ugly space insertion hacks of dozens of lines. Off-site example (that will be removed soon) can be found at http://juerd.nl/pmchattertest.html.

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      Cool idea, but your CSS on your demo site does not work with Mozilla 1.1 - no scrollbars are shown, so all the text that does not fit into the one line allotted to it just vanishes...

        Cool idea, but your CSS on your demo site does not work with Mozilla 1.1 - no scrollbars are shown, so all the text that does not fit into the one line allotted to it just vanishes...

        It works in Firefox 0.9 and Mozilla 1.7. As an upgrade is available and free, I don't think a browser bug is a good reason for not doing this. And if a workaround is needed, try adding width and/or max-width CSS attributes. If the bug is that it no longer grows vertically, height: auto; might fix it.

        Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        yeah, and doesn't work w/NS 2.3 either. <grins>
        Think Moz 1.1, for all its many good characteristics and even better heirs, is NOT a good test of applying CSS, as its CSS support was buggy and severely limited, at best.

      Even simpler: just insert <span></span> into long words. The browser will wrap there, but copy-paste will retrieve the text verbatim. That'll work on every browser in every circumstance.

      Actually, I would advocate inserting &shy; entities (“soft hyphen”), which indicate wrap points and are only rendered when the browser actually has to wrap. Unfortunately, they currently only work as intended in IE, AFAIK.

      In general, I share your view that this problem is being solved on the wrong level. I'm not sure there's much choice in this particular case, though.

      Makeshifts last the longest.
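A rough sketch of the <span></span> idea above, for comparison with the space-inserting code. Assumptions, all mine: the name `soft_wrap`, the limit of 18 carried over from the current code, tags matched with /<[^>]+>/, and an entity counted as one character. Whether a given browser really wraps at an empty span is taken on trust from the post above, not something this code can show.

```perl
use strict;
use warnings;

# Hypothetical sketch: insert an empty <span></span> after every $limit
# characters of an unbroken run instead of a space, leaving tags and
# entities untouched.
sub soft_wrap {
    my ( $html, $limit ) = @_;
    $limit ||= 18;
    my $run = 0;
    $html =~ s{(\s+)|(<[^>]+>)|(&#?\w{1,10};|.)}{
        my $out;
        if ( defined $1 ) {            # whitespace: the run starts over
            $run = 0;
            $out = $1;
        } elsif ( defined $2 ) {       # tag: copied through, adds no width
            $out = $2;
        } elsif ( $run >= $limit ) {   # run is full: break before this char
            $run = 1;
            $out = "<span></span>$3";
        } else {                       # entity or single character
            $run++;
            $out = $3;
        }
        $out;
    }gse;
    return $html;
}

print soft_wrap('for(1..20){$bar=$bop[1];print"$bar/$baz,$foo[$baz]"}'), "\n";
```

Stripping the inserted spans gives back the original text exactly, which is the whole point compared with inserting spaces.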

        Even simpler: just insert <span></span> into long words. The browser will wrap there, but copy-paste will retrieve the text verbatim. That'll work on every browser in every circumstance.

        Oh, wow. That sounds like a more useful alternative than the space thing. Too bad it still requires the ugly hack. Still, much better than inserting spaces indeed. Wouldn't <b></b> be better, bandwidth-wise? It's an inline level tag, like span.

        It'd be great if all browsers really understood XHTML as XML, because then you could just use <span/> or <b/>.

        Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

Re: A call to keyboards: Better chatterbox wrapping (tye)
by tye (Sage) on Jan 10, 2005 at 18:19 UTC

    Tested and running on the test server. Sorry about the strange regex delimiters. Several versions of Perl don't agree on how to escape the embedded delimiters and I blame mod_perl in this case (my copy of Perl agrees with me).

    # Insert spaces to prevent the nodelets from getting too wide.
    # We leave the loopholes of using a bunch of "&nonentity;"s or
    # "<!--> -->" to intentionally make the nodelets wide (intended for
    # /msg'ing to yourself) as the problem is more accidents than abuse.
    # "&123" and "&lt" work in some browsers, but we might put spaces in
    # the middle of them (if you don't like it, then remember the ";").
    my $len= 0;
    $text =~ s[(\s+)|([^\s<&]+)|(<[^<>]*>)|(&#?\w{1,10};)|(.)]`
        if(  $1  ) {
            $len= 0;
            $1;
        } elsif(  length( $2 )  ) {
            # $2 is the only case that can be "0" (ie. false)
            my $res= $2;
            my $tot= $len + length($res);
            if(  18 < $tot  ) {
                my $max = 18 - $len;
                my $min = $max - 9;
                $min = 0   if  $min < 0;
                $res =~ s[
                    (   \S{$min,$max}
                        (?: (?<!\W) (?![\w\[{(;,/])
                        |   (?<![\w\$@%&*]) (?!\W)
                        )
                    |   \S{$max}
                    )(?=\S)
                ][$1 ]x;
                $res =~ s[
                    (   \S{9,18}
                        (?: (?<!\W) (?![\w\[{(;,/])
                        |   (?<![\w\$@%&*]) (?!\W)
                        )
                    |   \S{18}
                    )(?=(\S+))
                ]{
                    length( $1 . $2 ) > 18 ? "$1 " : $1
                }gex;
                $res =~ /(\S*)$/;
                $len= length( $1 );
            } else {
                $len= $tot;
            }
            $res;
        } elsif(  $3  ) {
            $3;
        } else {
            my $res= $4 || $5;
            my $add= $5 ? 1 : int( length($4)/3 );
            $len += $add;
            if(  18 < $len  ) {
                $len= $add;
                " $res";
            } else {
                $res;
            }
        }
    `egis;
    return $text;

    It tries to not put spaces in front of any of [{(;,/ (footnote 1) and not after $@%&* because they can be Perl sigils.

    (Updated)

    Footnote 1: The first 5 because of Perl syntax, the last two because of IE silliness -- the "," is included for two reasons. IE won't wrap on " ," nor on " /".

    Update2: I realized that [ will be encoded as &#91; and will be matched separately so I can remove the \[s from the regexes and probably revert to my best-practice method of using [ ] delimiters for regexes despite the mod_perl(?) bug. This also means that spaces won't be inserted in front of other characters that get encoded, namely any of <>], which is probably not worth trying to work around.

    - tye        
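The break-point rule in the lookarounds above can be read as a test on the two characters either side of a candidate gap. The following is a paraphrase for illustration only, not the code running on the site: `can_break` is a hypothetical name, and the start-of-string and end-of-string cases that the real lookarounds also accept are ignored.

```perl
use strict;
use warnings;

# Hypothetical paraphrase of the two lookaround pairs: may a space go
# between the characters $left and $right?
sub can_break {
    my ( $left, $right ) = @_;
    # Leaving a word: fine unless the next character is one of [{(;,/
    return 1 if $left =~ /\w/ && $right !~ m|[\w\[{(;,/]|;
    # Entering a word: fine unless the previous character is one of $@%&*
    return 1 if $left !~ /[\w\$\@%&*]/ && $right =~ /\w/;
    return 0;
}

# Show every gap in a snippet where a space would be allowed.
my $code = '$bar=$bop[1];print';
for my $i ( 1 .. length($code) - 1 ) {
    my $l = substr( $code, $i - 1, 1 );
    my $r = substr( $code, $i,     1 );
    print substr( $code, 0, $i ), ' | ', substr( $code, $i ), "\n"
        if can_break( $l, $r );
}
```

Gaps between two non-word characters, such as the two characters of =~, never qualify under either rule.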

      not after $@%&* because they can be Perl sigils.

      $ that !~ /problem/; $this = ~ /problem/, though;

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        I'm not sure I've figured out what you are trying to say.

        I don't particularly care that Perl doesn't mind the space after the sigil; it is still a horrid place to insert a space (humans read code too).

        And between the two characters of =~ is excluded by the \b or its translation into my elaboration of roughly \W\w|\w\W.

        If those weren't your points or you had other points, feel free to reply with the English and code separated so they don't obfuscate each other. (:

        - tye