Alien has asked for the wisdom of the Perl Monks concerning the following question:

Hi! I've written a file splitter script, but it gives me a strange error when I use it with files larger than 1GB. Here is the error:

    Negative length at split.pl line 14 (#1)

Here is the source code:
    use strict;
    use warnings;
    my $temp_data=0;
    my $i=0;
    my $f=shift || die "File\n";
    open(F,$f) || die "OPEN : $!\n";
    binmode(F);
    my $size=shift || die "Size\n";
    my $dir=shift || die "Dir that will hold the parts of the file?\n";
    my $fsize=-s $f;
    system("mkdir $dir");
    chdir($dir) || die "CHDIR : $!\n";
    CHUNK:while(1) {
        if(($fsize - $size) < 0) {
            read(F,$temp_data,$fsize);
            open(G,">$i") || die "CREATE $!\n";
            print G $temp_data;
            print "i've split chunk nr $i\n";
            close G;
            last CHUNK;
        } else {
            $fsize-=$size;
            read(F,$temp_data,$size);
            open(G,">$i") || die "CREATE $!\n";
            print G $temp_data;
            close G;
            print "I've split chunk $i\n";
            $i++;
            last CHUNK if($fsize<0);
        }
    }
    close F;
Do you think the program overflows?

Re: File splitting script
by sgifford (Prior) on Jan 25, 2007 at 17:33 UTC
    It's possible for read to return fewer bytes than you asked for, especially when reading from a pipe, and almost certainly at the end of the file. This is one possibility; try checking how many bytes were actually read, and subtract that from $fsize instead. You should also check for errors in your read, print, and close calls; it's good practice, and one of those may be failing and causing the problem.
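A minimal sketch of the short-read handling described above, assuming an in-memory filehandle for the demonstration. The helper name read_fully is hypothetical and not part of the original script; it loops until the requested count is satisfied or the input ends, and it checks the return value of read each time.

```perl
use strict;
use warnings;

# Hypothetical helper: read up to $want bytes from $in and return the data.
# It loops because a single read may return fewer bytes than requested.
sub read_fully {
    my ($in, $want) = @_;
    my $data = '';
    while ($want > 0) {
        my $got = read($in, my $buf, $want);
        die "READ: $!\n" unless defined $got;   # undef signals an error
        last if $got == 0;                      # 0 signals end of file
        $data .= $buf;
        $want -= $got;                          # count what was actually read
    }
    return $data;
}

# Demonstration on an in-memory filehandle holding 10 bytes.
my $source = 'abcdefghij';
open(my $in, '<', \$source) or die "OPEN: $!\n";
binmode($in);
my $first  = read_fully($in, 4);
my $second = read_fully($in, 100);   # asks for more than remains
close($in) or die "CLOSE: $!\n";
print length($first), " ", length($second), "\n";
```

The second call asks for 100 bytes but only 6 remain, which is the end-of-file case mentioned above.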

    It should be straightforward to troubleshoot this by printing out the values of the various counters, and seeing where things go wrong.

    Also, what OS are you on, and where did your Perl come from?
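One way to answer that question from inside Perl is sketched below, using the core Config module. The choice of keys (ivsize, uselargefiles, use64bitint) is an assumption about what is most relevant here; running perl -V prints the full build configuration.

```perl
use strict;
use warnings;
use Config;

# Collect details that tell a 32-bit build apart from a 64-bit one.
my %report = (
    os            => $^O,
    perl_version  => $],
    ivsize        => $Config{ivsize},                   # bytes in a native integer
    uselargefiles => $Config{uselargefiles} || 'undef', # large file support
    use64bitint   => $Config{use64bitint}   || 'undef', # 64-bit integers
);
printf "%-14s %s\n", $_, $report{$_} for sort keys %report;
```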

Re: File splitting script
by ambrus (Abbot) on Jan 26, 2007 at 10:56 UTC

    It's probably unrelated to the error, but you should binmode the output file as well if you binmode the input file.

    Also, for simply splitting a file into chunks, it might be easier to use the split program from coreutils.

Re: File splitting script
by BrowserUk (Patriarch) on Jan 26, 2007 at 11:20 UTC

    I think you found a bug in perl. The length parameter is being treated as an unsigned integer (UV), but is being stored as a signed integer (IV). (Or is it the other way around?) In any case, any attempt to specify a length greater than 2**31-1 results in the "Negative length" error. (Tested on AS811 and AS817 under XP):

    [0] Perl> open I, '<:raw', '32GB.dat.bz2';;
    [0] Perl> read( I, $c, 2147483648 );;
    [Negative length at (eval 4) line 1, <STDIN> line 2. ] Perl> read( I, $c, 2147483647 );;
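Given the limit shown in that transcript, one possible workaround is to never pass a large length to a single read call. The sketch below is an assumption, not something proposed in the reply: the 64MB cap and the helper name next_read_length are both hypothetical.

```perl
use strict;
use warnings;

# Assumed cap: stay well below the 2**31-1 limit reported above.
my $MAX_READ = 64 * 1024 * 1024;

# Hypothetical helper: the length to pass to a single read call,
# given how many bytes of the current part are still wanted.
sub next_read_length {
    my ($remaining) = @_;
    return $remaining < $MAX_READ ? $remaining : $MAX_READ;
}

print next_read_length(3 * 1024**3), "\n";  # a 3GB part is read 64MB at a time
print next_read_length(1000), "\n";         # small remainders pass through
```

With a cap like this, the requested part size can exceed 2**31-1 without any single read length doing so.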

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: File splitting script
by ferreira (Chaplain) on Jan 25, 2007 at 17:31 UTC

    I haven't looked deeper, but some comments may help you find the offending code:

    • It looks like you're reading an entire chunk of $size bytes into memory before writing it to disk. Consider reading and writing through a smaller buffer instead: for example, turn a 1GB file into 100MB files while reading and writing 10MB at a time. This matters most when your files are larger than your RAM.
    • Since you're using read, the write function would be more appropriate as a counterpart than print.
    • Since you're using read, you can use its return value, the number of bytes read, to count how much has actually been read so far.
    • With such a scheme of reading blocks until you fill a part, then moving on until you exhaust the original file, you would not even need to query the file size.
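The buffered scheme in the list above could be sketched as follows. This is one possible arrangement under stated assumptions, not the original poster's code: the name split_file, the part numbering, and the tiny demonstration sizes (1000-byte parts, 300-byte buffer) are all hypothetical. It uses the return value of read for counting and never queries the file size.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: split $file into parts of at most $part_size bytes
# inside $dir, holding at most $buf_size bytes in memory at a time.
sub split_file {
    my ($file, $dir, $part_size, $buf_size) = @_;
    open(my $in, '<', $file) or die "OPEN $file: $!\n";
    binmode($in);
    my ($part, $out, $room) = (0, undef, 0);
    while (1) {
        # Never read past the end of the current part.
        my $limit = $out ? $room : $part_size;
        my $want  = $limit < $buf_size ? $limit : $buf_size;
        my $got   = read($in, my $buf, $want);
        die "READ: $!\n" unless defined $got;
        last if $got == 0;                        # end of input
        if (!$out) {                              # start the next part
            open($out, '>', "$dir/$part") or die "CREATE $dir/$part: $!\n";
            binmode($out);
            $room = $part_size;
        }
        print {$out} $buf or die "WRITE: $!\n";
        $room -= $got;
        if ($room == 0) {                         # part is full
            close($out) or die "CLOSE: $!\n";
            undef $out;
            $part++;
        }
    }
    if ($out) {                                   # last, partly filled part
        close($out) or die "CLOSE: $!\n";
        $part++;
    }
    close($in) or die "CLOSE: $!\n";
    return $part;                                 # number of parts written
}

# Demonstration: a 2500-byte file, 1000-byte parts, 300-byte buffer.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/input.bin";
open(my $fh, '>', $file) or die "CREATE $file: $!\n";
binmode($fh);
print {$fh} 'x' x 2500 or die "WRITE: $!\n";
close($fh) or die "CLOSE: $!\n";

my $parts = split_file($file, $dir, 1000, 300);
my @sizes = map { -s "$dir/$_" } 0 .. $parts - 1;
print "$parts parts: @sizes\n";
```

Because a part file is only opened once data has been read for it, an input whose size is an exact multiple of the part size does not leave an empty trailing part.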

    And the error you're getting should be related to this statement:

    $fsize-=$size;
    that will produce a negative size eventually if $fsize is not a multiple of $size.

    Update: as pointed out by johngg, I mixed things up by thinking of a read/write pair, when the actual pair in Perl is sysread/syswrite. print is just fine as a counterpart of read.

      Since you're using read, the write function would be more appropriate as a counterpart than print.

      I don't think that's right. The write function is for writing formatted records. From the documentation:

      Writes a formatted record (possibly multi-line) to the specified FILEHANDLE, using the format associated with that file.

      There is no counterpart to read per se, just use print. Perhaps you were confusing write with syswrite which is the counterpart of sysread.

      Cheers,

      JohnGG
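For comparison, a minimal sketch of the sysread/syswrite pair mentioned above, copying a small temporary file. The file names and the 1024-byte block size are assumptions chosen for the demonstration. syswrite may write only part of the buffer, so the inner loop retries from the current offset.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my ($src, $dst) = ("$dir/src", "$dir/dst");

# Create a 3000-byte source file for the demonstration.
open(my $mk, '>', $src) or die "CREATE $src: $!\n";
binmode($mk);
print {$mk} 'abc' x 1000 or die "WRITE: $!\n";
close($mk) or die "CLOSE: $!\n";

open(my $in, '<', $src) or die "OPEN $src: $!\n";
binmode($in);
open(my $out, '>', $dst) or die "CREATE $dst: $!\n";
binmode($out);

while (1) {
    my $got = sysread($in, my $buf, 1024);
    die "SYSREAD: $!\n" unless defined $got;
    last if $got == 0;                       # end of file
    my $off = 0;
    while ($off < $got) {                    # handle partial writes
        my $put = syswrite($out, $buf, $got - $off, $off);
        die "SYSWRITE: $!\n" unless defined $put;
        $off += $put;
    }
}
close($out) or die "CLOSE: $!\n";
close($in)  or die "CLOSE: $!\n";

my $copied = -s $dst;
print "$copied\n";
```

sysread and syswrite bypass Perl's buffered I/O, so they should not be mixed with read or print on the same filehandle.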

Re: File splitting script
by kyle (Abbot) on Jan 25, 2007 at 17:54 UTC

    I ran it with a file 1142547634 bytes in size, asked for 1024000 byte pieces, and it worked.

      I tried to split a 6GB file into 2 files of 3GB each, and it dies with that error. The script works fine for smaller sizes.
        Note that though this (probably) is a bug in perl, unless you've got a 64 bit OS, you can't address more than 4GB anyway, so your approach is limited. Also, if the bug were fixed and you had 4GB of memory, you'd push everything else into swap space, slowing your machine down a lot for no reason at all. You really should read() (edit: and write) in multiple, much smaller chunks.

Re: File splitting script
by ambrus (Abbot) on Jul 24, 2008 at 00:50 UTC
Re: File splitting script
by sgt (Deacon) on Jan 26, 2007 at 15:19 UTC

    not an answer to the question (probably BrowserUk has found the problem) but I wonder why you use system("mkdir...") when perl has mkdir...

    cheers --stephan
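A short sketch of the built-in mkdir suggested above. The temporary base directory and the name parts are assumptions for the demonstration; the point is that no shell is involved and a failure is reported through $!, which system("mkdir $dir") in the original script never checks.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1);   # scratch location for the demonstration
my $dir  = "$base/parts";

# Built-in mkdir: returns false on failure and sets $!.
unless (-d $dir) {
    mkdir($dir) or die "MKDIR $dir: $!\n";
}
print "created\n" if -d $dir;
```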
      Should we submit this to perlbug?

        IMHO this is a serious IO bug that should be submitted via perlbug. Use a minimal code snippet that reproduces the bug, and include your version of perl.

        thanks --stephan