skx has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking for a simple means of breaking long lines of non-whitespace-separated text.

I initially thought of using a class regexp like this:

if ( $line =~ /([^ \t]+{80})/ )

But this didn't do what I want, and I realise that even if it did do what I wanted it'd not help me much.

What I'd like to do is break long lines of continuous text, much as the Slashdot (/.) comment filter does, if you've seen it.

Consider the line of input:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxbbbbbbbbbbbbbbbb

I'd like the output to be:

xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxbbbbbbbbbbbbbbbb

(i.e. one space inserted in the middle of any run of continuous text longer than, say, 80 characters).

substr can easily be used to chop up the text, but it doesn't seem to help me in finding out if a given line should be broken - or where it should be broken if so.

Steve
--

Replies are listed 'Best First'.
Re: Breaking "long" lines of non-space-separated characters
by ikegami (Patriarch) on Aug 16, 2005 at 15:54 UTC
    You have a syntax error: remove the +. Also, you can search and replace in one command:
    $line =~ s/(\S{80})/$1 /g;

    Of course, that won't work if $line contains HTML, because it might insert the space inside of a tag, and it wouldn't count tags as a space when it should (e.g. <p>).
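One way to work around the HTML caveat is to split the line on anything that looks like a tag and run the substitution only on the text in between. The following is a minimal sketch, not a real HTML parser: the helper name `break_text_outside_tags` is made up for illustration, the tag pattern `<[^>]*>` is a naive assumption (it will misfire on comments or a `>` inside an attribute value), and every tag is treated as a break between runs, including inline tags such as `<b>`. It uses `(?=\S)` rather than the plain pattern so that no space is added at the end of a text piece.

```perl
use strict;
use warnings;

# Hypothetical helper: break runs of non-whitespace longer than $width,
# leaving anything that looks like a tag (<...>) untouched.
sub break_text_outside_tags {
    my ( $line, $width ) = @_;

    # The capturing group makes split return the tags as well as the text
    # between them.
    my @pieces = split /(<[^>]*>)/, $line;

    for my $piece (@pieces) {
        next if $piece =~ /\A<[^>]*>\z/;    # leave tags alone

        # Insert a space after every $width non-space characters, but only
        # when more non-space text follows within this piece.
        $piece =~ s/(\S{$width})(?=\S)/$1 /g;
    }

    return join q{}, @pieces;
}

print break_text_outside_tags( 'aaaaaaaa<p>bbbbbbbb', 4 ), "\n";
```

For anything beyond simple input, a proper HTML parser would be a safer choice than a regexp.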

      With that regexp you'll get an extra space at the end if the line length is an exact multiple of your breaking width. This handles that case:

      my $width    = 8;
      my $break_at = 4;
      my $orig     = "#" x $width;
      (my $broken = $orig) =~ s/(\S{$break_at})(?!\z)/$1 /g;
      print qq{'$orig'\n}, qq{'$broken'\n};

      Result:

      '########'
      '#### ####'
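To see the difference between the two substitutions side by side, here is a small sketch that wraps each one in a sub. The sub names `break_plain` and `break_no_trailing` are invented for this illustration; the regexps are the ones posted in this thread.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the plain substitution.
sub break_plain {
    my ( $text, $n ) = @_;
    $text =~ s/(\S{$n})/$1 /g;
    return $text;
}

# Hypothetical wrapper around the variant with the (?!\z) lookahead.
sub break_no_trailing {
    my ( $text, $n ) = @_;
    $text =~ s/(\S{$n})(?!\z)/$1 /g;
    return $text;
}

# Compare the two on an exact multiple of the width and on a non-multiple.
for my $input ( '#' x 8, '#' x 10 ) {
    printf "plain: '%s'  lookahead: '%s'\n",
        break_plain( $input, 4 ), break_no_trailing( $input, 4 );
}
```

The two only differ when the string ends exactly on a chunk boundary. Note that `\z` matches only at the very end of the string, so a line that still carries its trailing newline would get a space before the newline; chomping first, or using `(?=\S)` in place of `(?!\z)`, is one way around that.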

      Update:

      More ways to do it. With split:

      my $broken = join( q{ }, grep { $_ ne q{} } split( qr/(\S{$break_at})/, $orig ) );

      With substr (ugh):

      my $broken = $orig;
      for my $chunk ( 1 .. length($broken) / $break_at ) {
          substr(
              $broken, $break_at * $chunk + $chunk - 1, 0,
              $break_at * $chunk + $chunk - 1 == length($broken) ? q{} : q{ }
          );
      }
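The offset arithmetic in the substr version exists because each inserted space shifts every later position by one. A possible alternative, sketched here under the same assumption that the input is one unbroken run of text with no whitespace, is to insert from the right-hand end so earlier positions never move. The sub name `break_with_substr` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: insert a space every $n characters, working from
# right to left. Assumes $text is a single run containing no whitespace.
sub break_with_substr {
    my ( $text, $n ) = @_;

    # Largest multiple of $n that is strictly less than the length, so no
    # space is ever inserted at the very end of the string.
    my $pos = int( ( length($text) - 1 ) / $n ) * $n;

    while ( $pos > 0 ) {
        substr( $text, $pos, 0, q{ } );    # 4-arg substr inserts in place
        $pos -= $n;
    }

    return $text;
}

print q{'}, break_with_substr( '#' x 8, 4 ), qq{'\n};
```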

      -xdg

      Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

      Thanks, that works, and your point on HTML tags is well noted.

      Steve
      --
Re: Breaking "long" lines of non-space-separated characters
by inman (Curate) on Aug 16, 2005 at 15:54 UTC
    A simple substitution should do it.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $data = "ThisIsAVeryLongLine" x 10;
    $data =~ s/(\S{80})/$1 /g;
    print "$data\n";