in reply to Split string using regex on \n or max line length

Hi Anonymous,

The /s modifier means:

Treat the string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.
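To make the quoted documentation concrete, here is a minimal sketch. The sample string and variable names are made up for illustration and are not taken from the original question.

```perl
use strict;
use warnings;

# Hypothetical two-line sample string (an assumption for this example).
my $str = "ab\ncd";

# Without /s, "." does not match the newline, so the match stops at "ab".
my ($without_s) = $str =~ /(.+)/;

# With /s, "." also matches the newline, so the match runs to the end.
my ($with_s) = $str =~ /(.+)/s;

printf "without /s: %d chars, with /s: %d chars\n",
    length $without_s, length $with_s;
```

With /s in place, a pattern like .{1,$len} can run straight across line breaks, which is presumably why the original regex ignored the existing newlines.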

So if I understand your question correctly, simply removing that modifier should do what you want:

my $len = 20;
# 345678901234567890
my $text = <<'ENDTXT';
One
Two Three Four Five Six Seven
Eight Nine Ten Eleven Twelve
Thirteen Fourteen
ENDTXT
my @lines = $text =~ /(.{1,$len}\W)/gm;
# remove leftover whitespace at ends of lines
s/\s+$// for @lines;
print "<$_>\n" for @lines;
__END__
<One>
<Two Three Four Five>
<Six Seven>
<Eight Nine Ten>
<Eleven Twelve>
<Thirteen Fourteen>

Note that there's also the core module Text::Wrap that you could take a look at.

Update 2: An example of Text::Wrap that does something similar to the above. Uncomment the tr/// operation to reflow the entire text:

use Text::Wrap;
$Text::Wrap::columns = 20;
#$text=~tr/\n/ /;
print wrap('', '', $text);

Update 1: The following modification to the regex eliminates the need for the s/\s+$// for @lines; above (works because \s includes newline).

Update 3: Actually, the following doesn't behave the same way the original regex does. It's hard to make an alternative suggestion without knowing what your intentions are here: do you definitely want to include one more non-word character at the end of the matched string, even whitespace, or did you perhaps mean a word boundary \b? If you could provide some sample input and expected output for different cases, and/or explain more about how you want the splitting to occur, that would help in making an alternative suggestion.

my @lines = $text =~ /(.{1,$len})\s+/gm;
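As a rough illustration of how the two patterns can diverge, the sketch below runs both against a hyphenated sample. The text and the small limit of 5 are assumptions picked to make the difference visible, not data from the original question.

```perl
use strict;
use warnings;

my $len  = 5;                # small limit chosen for this illustration
my $text = "foo-bar baz\n";  # made-up sample, not the OP's data

# Original pattern: up to $len characters plus one non-word character.
my @orig = $text =~ /(.{1,$len}\W)/gm;
s/\s+$// for @orig;

# Modified pattern: up to $len characters that must be followed by whitespace.
my @mod = $text =~ /(.{1,$len})\s+/gm;

print "original: ", join('|', @orig), "\n";  # original: foo-|bar|baz
print "modified: ", join('|', @mod),  "\n";  # modified: o-bar|baz
```

In this sample the modified pattern finds no whitespace within reach of the start of the string, so the regex engine moves its starting position forward and the leading "fo" is silently dropped. The original pattern can break after the hyphen instead.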

Hope this helps,
-- Hauke D

Re^2: Split string using regex on \n or max line length (updated)
by flowdy (Scribe) on Feb 10, 2017 at 09:11 UTC
    > Update: The following modification to the regex eliminates the need for the s/\s+$// for @lines; above (works because \s includes newline):

    I would still argue for keeping the \W, ideally made optional with a quantifier, i.e. \W?, so that a final punctuation mark or hyphen is captured even when that exceeds the limit by one.

      Hi flowdy,

      You're right that my second regex didn't operate like the OP's, but if you write /(.{1,$len}\W?)/gm, then that may split in the middle of a word, like your suggestion and unlike the OP's regex. I've updated my node.

      Thanks,
      -- Hauke D

        Hi haukex,

        Yes, you are right. Then again, a required \W will not match at the very end of the text when there is no trailing newline. So (?:\W|$) is closer to an ideal solution, though that remains rather hypothetical until a proper set of positive and negative test cases is provided.
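A minimal sketch of the three variants discussed in this sub-thread, run against a made-up string with no trailing newline. The sample text, the limit of 8, and the array names are assumptions for illustration only.

```perl
use strict;
use warnings;

my $len  = 8;                    # limit chosen for this illustration
my $text = "alpha beta gamma";   # made-up sample, no trailing newline

# Required \W: the last word has nothing after it, so it is never matched.
my @required = $text =~ /(.{1,$len}\W)/gm;

# Optional \W?: always succeeds, but may cut in the middle of a word.
my @optional = $text =~ /(.{1,$len}\W?)/gm;

# \W or end of line: breaks at non-word characters and keeps the last word.
my @alternation = $text =~ /(.{1,$len}(?:\W|$))/gm;

# Trim trailing whitespace from every chunk.
s/\s+$// for @required, @optional, @alternation;

print "required:    ", join('|', @required),    "\n";  # alpha|beta
print "optional:    ", join('|', @optional),    "\n";  # alpha be|ta gamma
print "alternation: ", join('|', @alternation), "\n";  # alpha|beta|gamma
```

This covers only one input. Words longer than the limit, for example, would still need their own test cases before settling on a pattern.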

Re^2: Split string using regex on \n or max line length (updated x3)
by choroba (Cardinal) on Feb 10, 2017 at 15:30 UTC
    > simply removing that modifier

    But the Best Practices tell us to always keep it!

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re^2: Split string using regex on \n or max line length (updated x3)
by Anonymous Monk on Feb 14, 2017 at 05:03 UTC
    Yes, that is very helpful. Removing the /s modifier took care of the problem. Thank you.