Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I've got a long string (i.e., a 25 MB txt file that's all one line). I need to separate it, basically inserting a newline every 150 characters. I know it's probably pretty simple, but I'm still new to Perl. Any takers?

Replies are listed 'Best First'.
Re: Long string needs to be separated
by particle (Vicar) on Feb 13, 2002 at 17:03 UTC
    here's a one liner...

    perl -e "print \"$_\n\" while sysread STDIN,$_,150,0" < your.file.here
    it avoids slurping the file into a variable, so it saves a lot of memory.

    ~Particle

Re: Long string needs to be separated
by I0 (Priest) on Feb 13, 2002 at 17:39 UTC
    {local $/=\150; local $\="\n"; print while <>; }
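    (Setting $/ to a reference to an integer puts the <> readline operator into fixed-length record mode, so each read returns a 150-byte record; setting $\ to "\n" makes each print append a newline.)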
Re: Long string needs to be separated
by jlongino (Parson) on Feb 13, 2002 at 18:14 UTC
    I did a PM search on sysread and came up with a few nodes, one of which may be relevant to your problem (as VSarkiss pointed out, sysread is probably the best solution): EEK! sysread() is expensive!.

    You might want to read in the multiple of 150 that is closest to 4096, namely 4050 (27 chunks of 150 bytes); that should reduce some of the associated overhead. You'll still need to break each block up into 150-byte chunks, but you could modify one of the regex solutions provided in this thread for that purpose.

    A benchmark would be interesting, since I don't know whether the speed gained from the larger block reads would be cancelled out by the additional processing required to break up each 4050-byte string. Unfortunately, I'm at work right now and don't have time to try it myself.
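    Untested, but a rough sketch of what I have in mind, reusing VSarkiss's file names and one of the regexes from this thread:

    # untested sketch: read 4050-byte blocks (27 x 150) and split each
    # block into 150-byte pieces with a regex before printing
    open IN,  "hugefile.txt"    or die "Can't open huge file: $!\n";
    open OUT, ">biggerfile.txt" or die "Can't open output file: $!\n";
    my $block;
    while (sysread(IN, $block, 4050)) {
        # 4050 is an exact multiple of 150, so only the tail end of the
        # final block can come out shorter than 150 bytes
        while ($block =~ /(.{1,150})/gs) {
            print OUT "$1\n";
        }
    }
    close IN;
    close OUT;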

    --Jim

Re: Long string needs to be separated
by VSarkiss (Monsignor) on Feb 13, 2002 at 16:36 UTC

    If you want to read a specific number of bytes without regard to the actual characters, sysread will fit the bill. Something like this should work:

    open IN, "hugefile.txt" or die "Can't open huge file: $!\n"; open OUT, ">biggerfile.txt" or die "Can't open bigger file: $!\n"; my $line; while (sysread(IN, $line, 150)) { print OUT "$line\n"; } close IN; close OUT;
    (Warning, I haven't tested this.)

    More info on sysread available here and here.

    Update
    Look at I0's and jlongino's answers in this thread. They're better answers than mine!

(The RegEx approach) Re: Long string needs to be separated
by gmax (Abbot) on Feb 13, 2002 at 17:10 UTC
    This is something that I am actually using on a LARGE file and it works fine for me.
    On a 30MB file, it solves the problem in about 7 seconds (Linux 2.4 on a PIII 800).
    perl -ne 'while (/(.{1,150})/g) {print "$1\n"}' largefile.txt
    If the remainder of the file (after all the insertions) is less than 150 chars, it will be printed anyway. If the line already contains newlines, they will come out doubled. But if you are sure that there are no newlines, this one will do.
Re: Long string needs to be separated
by simon.proctor (Vicar) on Feb 13, 2002 at 16:45 UTC
    Here's a quick script I knocked up (it reads 5-byte chunks from a small test file; for your 150-character problem, just change the 5 to 150):
    use strict;
    use warnings 'all';

    unless (open(FH, '<test.dat')) {
        die "Could not open file\n";
    }

    my $temp;
    while (sysread(FH, $temp, 5, 0)) {
        print $temp, "\n";
    }
    close(FH);

    and the data file (save as test.dat)
    1234567890abcdefghijklmnopqrstuvwxyz

    It prints
    12345
    67890
    abcde
    fghij
    klmno
    pqrst
    uvwxy
    z
    HTH
Re: Long string needs to be separated
by rdfield (Priest) on Feb 13, 2002 at 16:38 UTC
    perldoc -f read (pun only partially intended).
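    For instance, an untested read()-based version of VSarkiss's sysread loop (same made-up file names) might look like:

    # untested: same idea as the sysread loop, but read() goes through
    # Perl's normal I/O buffering, which is fine for a plain file
    open IN,  "hugefile.txt"    or die "Can't open huge file: $!\n";
    open OUT, ">biggerfile.txt" or die "Can't open output file: $!\n";
    my $chunk;
    while (read(IN, $chunk, 150)) {
        print OUT "$chunk\n";
    }
    close IN;
    close OUT;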

    rdfield

Re: Long string needs to be separated
by broquaint (Abbot) on Feb 13, 2002 at 16:43 UTC
    Good thing memory grows on trees these days ...
    use strict;

    open(FH, "really_big_file") or die("doh - $!");
    my $filestr = do { local $/; <FH> };
    $filestr =~ s/(.{150})/$1\n/gs;
    This puts the whole file in $filestr, then sticks a \n after every 150 characters. After that you'll probably want to put $filestr somewhere (probably not the original file in case you trample the data :-)
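    For instance (untested, and the output file name is made up), you could write it straight back out to a new file:

    # untested: dump the newline-separated copy into a new file,
    # leaving the original untouched
    open(OUT, ">really_big_file.split") or die("doh - $!");
    print OUT $filestr;
    close(OUT);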
    HTH

    broquaint

    Update: OK, per trs80's node below, I should warn you that this is not a great solution, as it will undoubtedly be quite slow and take up a large amount of memory (i.e. more than the 25MB file size). But if you've got time and hardware on your hands, TMTOWTDI.

      It is not a good idea to slurp 25MB into memory just to modify it. Some kind of buffered read will be faster than a slurp, even if you have the hardware to handle a file that size in memory.
      I don't dispute that this could solve the problem; it just isn't a good habit to get into when dealing with larger files. Ask me how I know :^)