in reply to A call to keyboards: Better chatterbox wrapping

This might get closer to the requirement, assuming that any embedded angle brackets are escaped.

Updated: Added a case to deal with long blocks of unbroken word characters.

Updated again.

    #! perl -slw
    use strict;
    use Inline::Files;

    select OUTPUT;
    while( <DATA> ) {
        s[
            (
                (?:<[^>]+>)
              | (?:[^<]{9,18}(?=\b\W))
              | [^<]{18}
            )
        ][$1 \n]xg;
        print;
    }

    __DATA__
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    http://news.bbc.co.uk/1/shared/spl/hi/pop_ups/05/business_detroit_motor_show/html/1.stm/1.stm
    this is the <a href="http://news.bbc.co.uk/1/shared/spl/hi/pop_ups/05/business_detroit_motor_show/html/1.stm/1.stm">link</a> I was referring to
    for( 1 .. 20 ){ $bar = $bop[ 1 ]; print "$bar/$baz,$foo[$baz]" }
    for(1..20){$bar=$bop[1];print"$bar/$baz,$foo[$baz]"}
    for(1..20)%7B%24bar%3D%24bop%5B1%5D%3Bprint%22%24bar%2F%20%24baz%2C%24foo%5B%24baz%5D%22%7D

    __OUTPUT__
    xxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxx xxx
    http://news.bbc.co .uk/1/shared/spl /hi/pop_ups/05 /business_detroit_ motor_show/html/1 .stm/1.stm
    this is the <a href="http://news.bbc.co.uk/1/shared/spl/hi/pop_ups/05/business_detroit_motor_show/html/1.stm/1.stm"> link</a> I was referring to
    for( 1 .. 20 ){ $bar = $bop[ 1 ]; print "$bar /$baz,$foo[$baz ]" }
    for(1..20){$bar =$bop[1];print "$bar/$baz,$foo [$baz]"}
    for(1..20)%7B %24bar%3D%24bop %5B1%5D%3Bprint%22 %24bar%2F%20%24baz %2C%24foo%5B%24baz %5D%22%7D

This version:

  • avoids inserting an extra space where the text breaks at a space;
  • tries to keep short quoted strings unbroken.
    #! perl -slw
    use strict;
    use Inline::Files;

    select OUTPUT;
    while( <DATA> ) {
        s[
            (
                (?: < [^>]+ > )
              | (?: ( ["'] ) (?: (?!\2). ){1,18} \2 )    #"'
              | (?: [^<"'6]{9,18} (?=\b\W) )             #"'
              | [^<'"]{18}                               #"'
            )
            \s?
        ][$1 \n]xg;
        print;
    }

    __DATA__
    a line with "some quoted text" less than 18 chars in length and "some quoted text more that 18 chars"
    a line with 'some quoted text' less than 18 chars in length and 'some quoted text more that 18 chars'
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    http://news.bbc.co.uk/1/shared/spl/hi/pop_ups/05/business_detroit_motor_show/html/1.stm/1.stm
    this is the <a href="http://news.bbc.co.uk/1/shared/spl/hi/pop_ups/05/business_detroit_motor_show/html/1.stm/1.stm">link</a> I was referring to
    for( 1 .. 20 ){ $bar = $bop[ 1 ]; print "$bar/$baz,$foo[$baz]" }
    for(1..20){$bar=$bop[1];print"$bar/$baz,$foo[$baz]"}
    for(1..20)%7B%24bar%3D%24bop%5B1%5D%3Bprint%22%24bar%2F%20%24baz%2C%24foo%5B%24baz%5D%22%7D

    __OUTPUT__
    a line with "some quoted text" less than 18 chars in length and "some quoted text more that 18 chars "
    a line with 'some quoted text' less than 18 chars in length and 'some quoted text more that 18 chars '
    xxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxx xxx
    http://news.bbc.co .uk/1/shared/spl /hi/pop_ups/05 /business_detroit_ motor_show/html/1 .stm/1.stm
    this is the <a href="http://news.bbc.co.uk/1/shared/spl/hi/pop_ups/05/business_detroit_motor_show/html/1.stm/1.stm"> link</a> I was referring to
    for( 1 .. 20 ){ $bar = $bop[ 1 ]; print "$bar/$baz,$foo [$baz]" }
    for(1..20){$bar =$bop[1];print "$bar/$baz,$foo [$baz]"}
    for(1..20)%7B %24bar%3D%24bop %5B1%5D%3Bprint%22 %24bar%2F%20%24baz %2C%24foo%5B%24baz %5D%22%7D

    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
    Re^2: A call to keyboards: Better chatterbox wrapping
    by demerphq (Chancellor) on Jan 10, 2005 at 14:17 UTC

      This is pretty much the kind of thing we are looking for, but it has mangled the href in the A tag for the BBC link. It's essential that the wrapping code doesn't mess with the insides of tags, or of anything else that HTML would normally render as a unit. So an entity such as &amp; can't be broken internally. Likewise, anything inside a tag should be left alone. (You can use /<[^>]+>/ for matching tags; we aren't that picky.)

      Note that the content of the chatter has been preprocessed before this code executes, so you don't need to worry about fake tags or anything like that. If something is a valid tag, it will already match /<[^>]+>/. Anything that isn't valid will have been modified so that it does not match that pattern.

      ---
      demerphq
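The constraint described above (never break inside a tag or an entity) can also be approached with a small tokenizer instead of a single substitution. The following is only a sketch under stated assumptions, not the code used in the thread: the name wrap_chatter is hypothetical, the limit of 18 is borrowed from the regexes earlier in the thread, tags are assumed to contribute no visible width, and each entity is counted as one displayed character.

```perl
use strict;
use warnings;

# Hypothetical helper: insert a space into any run of more than $max
# visible non-space characters, treating tags and entities as atomic.
sub wrap_chatter {
    my ( $text, $max ) = @_;
    $max ||= 18;                      # limit borrowed from the thread's regexes
    my ( $out, $run ) = ( '', 0 );

    # Tokens, tried in order: a tag, an entity, a whitespace run,
    # or any other single character.
    while ( $text =~ / \G ( <[^>]+> | &\#?\w+; | \s+ | . ) /gcxs ) {
        my $tok = $1;
        if ( $tok =~ /^<.+>$/s ) {
            $out .= $tok;             # tag: copied verbatim, assumed zero width
        }
        elsif ( $tok =~ /^\s/ ) {
            $out .= $tok;             # existing whitespace is a natural break
            $run = 0;
        }
        else {
            # An entity or a plain character counts as one visible character.
            if ( $run >= $max ) {
                $out .= ' ';          # break before this token, never inside it
                $run = 0;
            }
            $out .= $tok;
            $run++;
        }
    }
    return $out;
}

print wrap_chatter( 'x' x 40 ), "\n";
print wrap_chatter( ( 'a' x 17 ) . '&amp;&amp;bb' ), "\n";
```

Because a tag is consumed as a single token, a long href passes through untouched however long it is, and a break can only land before or after an entity.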

        I've updated again to correct that. Any other awkward cases that you know of?



          The only thing that occurs to me is that you seem to have mistaken URL-encoded characters for HTML entities. HTML entities look like &#91; &#93; &lt; &gt; and render as [ ] < >.

          ---
          demerphq
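To make the distinction concrete, here is a minimal sketch that encodes the same characters both ways. The helper names url_escape and html_numeric are hypothetical, and the sketch assumes plain ASCII input; it is meant only to show the two notations side by side.

```perl
use strict;
use warnings;

# URL (percent) encoding: '%' followed by two hex digits, as in the test data.
sub url_escape {
    my $s = shift;
    $s =~ s/([^A-Za-z0-9_.~-])/sprintf '%%%02X', ord $1/ge;
    return $s;
}

# HTML numeric character references: '&#' decimal code point ';'.
sub html_numeric {
    my $s = shift;
    $s =~ s/([^A-Za-z0-9 ])/sprintf '&#%d;', ord $1/ge;
    return $s;
}

print url_escape('$bop[1]'),   "\n";    # percent-encoded form
print html_numeric('$bop[1]'), "\n";    # HTML entity form
```

Only the second form is something a browser renders as a single character, so only it needs to be protected from being broken by the wrapping code.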