tariqahsan has asked for the wisdom of the Perl Monks concerning the following question:

I have a large text file in which the lines all run together without any newlines. I need to reformat the file so that each line has a length of exactly 105 characters (including whitespace).

e.g.

Sample text file:

This is test line number 1.This is test line number 2.This is test line number 3.This is test line number 4.This is test line number 5. This is test line number 6.

Need to format it like below -

This is test line number 1.
This is test line number 2.
This is test line number 3.
This is test line number 4.
This is test line number 5.
This is test line number 6.

Any sort of leads for solving this problem surely will help.


Re: Break continuous lines of a text file based on character lengths
by matija (Priest) on Apr 20, 2004 at 15:01 UTC
    Do you mean "insert a newline after every 104 characters", like this:
    while (sysread(INPUT,$buf,104)) { print OUT $buf."\n"; }

    Or do you intend to break the lines in sensible places, as long as no line is longer than 105 characters? For that you would need to scan through the buffer for the sensible place in which to put the newline.
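
    The second option (breaking at sensible places) could be sketched roughly as follows. This is only an illustration: the helper name wrap_at_whitespace is hypothetical, and it assumes that "sensible" means "at the last space that fits", falling back to a hard break when a stretch of text contains no space at all.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into lines of at most $max characters,
# breaking at a space where one is available.
sub wrap_at_whitespace {
    my ($text, $max) = @_;
    my @lines;
    while (length $text) {
        # Whatever is left fits on one line.
        if (length($text) <= $max) {
            push @lines, $text;
            last;
        }
        # Look for the last space within the first $max + 1 characters,
        # so a space sitting exactly at the limit still counts.
        my $pos = rindex(substr($text, 0, $max + 1), ' ');
        if ($pos > 0) {
            push @lines, substr($text, 0, $pos);
            $text = substr($text, $pos + 1);    # skip the space itself
        }
        else {
            # No usable space: fall back to a hard break.
            push @lines, substr($text, 0, $max);
            $text = substr($text, $max);
        }
    }
    return @lines;
}

my $sample = 'This is test line number 1. This is test line number 2. '
           . 'This is test line number 3.';
print "$_\n" for wrap_at_whitespace($sample, 40);
```

    For the 105-character limit from the question, the second argument would be 105.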

Re: Break continuous lines of a text file based on character lengths
by tinita (Parson) on Apr 20, 2004 at 15:05 UTC
    how do you define a line? if it's as simple as 'a dot marks the end of a line' then you would
    - read in the file at once (1)
    - substitute dots with a dot and a newline
    - print string to newly opened file (2)

    (1) might be more efficient to set $/ to '.' if the file is large, and then read into an array line by line
    (2) use file locking and don't close the file in between, if that's necessary, e.g. if more than one process might open the file

    if you're having problems with one of the steps you're welcome to add specific questions.
    something to read: perlopentut, perlre, flock
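
    A minimal sketch of note (1), assuming that a dot really does mark the end of every line. The helper name sentences_from is hypothetical, and the in-memory filehandle stands in for the real input file.

```perl
use strict;
use warnings;

# Hypothetical helper: read dot-terminated records from a filehandle
# and return them as a list, one "line" per element.
sub sentences_from {
    my ($fh) = @_;
    local $/ = '.';    # a record now ends at each dot instead of at "\n"
    my @sentences;
    while (my $record = <$fh>) {
        $record =~ s/^\s+|\s+$//g;    # trim whitespace between sentences
        push @sentences, $record if length $record;
    }
    return @sentences;
}

my $text = 'This is test line number 1.This is test line number 2. '
         . 'This is test line number 3.';
open my $in, '<', \$text or die "Cannot open in-memory handle: $!";
print "$_\n" for sentences_from($in);
close $in;
```

    Reading record by record this way avoids holding the whole file in memory, which is the efficiency point made in (1).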

Re: Break continuous lines of a text file based on character lengths
by davido (Cardinal) on Apr 20, 2004 at 15:07 UTC
    here's an untested one-liner. Give it a try on a non-critical 'filename.txt' and see if that's what you're after.

    perl -pi.bak -e 'BEGIN{$/=\104;}$_ .= qq/\n/;' filename.txt

    HTH


    Dave

      If you set $\ to "\n", you don't even need the second statement — at least, not for every loop. Using the -l command line switch, you don't even need any code.
      perl -pi.bak -e 'BEGIN{$/=\104;$\=qq/\n/}' filename.txt
      or
      perl -lpi.bak -e 'BEGIN{$/=\104}' filename.txt
      For Windows, you'll have to adapt the quote characters around the code.
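
      The same idea written out as a small script, as a sketch only. The helper name write_fixed_records is hypothetical, and in-memory filehandles stand in for real files. Whether the newline counts toward the 105 characters is not settled in this thread, so the record size is left as a parameter.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out in records of at most $size
# characters, with a newline after each record.
sub write_fixed_records {
    my ($in, $out, $size) = @_;
    local $/ = \$size;    # reference to an integer: fixed-size records
    local $\ = "\n";      # output record separator, appended by every print
    while (my $record = <$in>) {
        print {$out} $record;
    }
}

my $input = 'x' x 250;
open my $in, '<', \$input or die "Cannot open in-memory input: $!";
my $output = '';
open my $out, '>', \$output or die "Cannot open in-memory output: $!";

write_fixed_records($in, $out, 104);
close $out;

# Show the length of each resulting line.
print length($_), "\n" for split /\n/, $output;
```

      The last record is simply shorter when the input length is not a multiple of the record size.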
        I wouldn't have immediately guessed that the -l switch would work because the POD for perlrun states that if a value is omitted from the -l switch, it sets $\ equal to $/. Since we're changing the value of $/ in the BEGIN{} block, and since the BEGIN{} block executes at least before the -p switch, I figured that using -l would set $\ to \104. Obviously that's not the case and I was just overthinking it.

        Good call bart. Great one-liner!


        Dave

        Thanks Bart! The one-liner worked marvels for me.
Re: Break continuous lines of a text file based on character lengths
by gsiems (Deacon) on Apr 20, 2004 at 15:05 UTC
    #!/usr/bin/perl
    while (read DATA, $buf, 27) {
        print $buf, "\n";
    }
    __DATA__
    This is test line number 1.This is test line number 2.This is test line number 3.This is test line number 4.This is test line number 5.This is test line number 6.
Re: Break continuous lines of a text file based on character lengths
by jfroebe (Parson) on Apr 20, 2004 at 15:33 UTC

    Hi,

    Even though the file isn't supposed to have newlines in it, I would still check for the newline character and strip it if it is present. Why? You might run across a file (I'm assuming you're going to run this on multiple files, or more than just once) that contains a newline character, and you might end up with a line of 105 characters, then a 5-character line, then another line of 105 characters. That would break your requirement of 105-character lines.
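
    One way to do that, as a rough sketch: delete any stray carriage returns and newlines first, then cut the cleaned text into fixed-width pieces. The helper name rechunk is hypothetical, and the sketch assumes the whole text fits in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: remove stray line endings from $text, then split
# it into pieces of $width characters (the last piece may be shorter).
sub rechunk {
    my ($text, $width) = @_;
    $text =~ tr/\r\n//d;    # delete every CR and LF already in the data
    my @chunks;
    for (my $pos = 0; $pos < length($text); $pos += $width) {
        push @chunks, substr($text, $pos, $width);
    }
    return @chunks;
}

# A stray newline in the middle of the data no longer causes a short line.
print "$_\n" for rechunk("abcde\nfghij", 4);
```

    For the file in the question, the width would be 105 (or 104, if the newline is meant to count toward the line length).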

    hope this helps

    Jason L. Froebe

    No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil, Stargate SG-1
