in reply to Parsing large files

My best guess is that the spot where it is choking is almost exactly 2 GB into the file (the limit of a signed 32-bit file offset), and that your version of Perl was not compiled with support for large files. If so, once it passes the 2 GB point the next seek takes it to position 0 in your file. To see whether your version supports large files, run perl -V and look for uselargefiles=define (or -Duselargefiles in the configure arguments).
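If you would rather check from inside a script, something like the following should work. It is only a sketch: it reads the build settings through the core Config module, and has_largefile_support is a helper name I made up for illustration.

```perl
use strict;
use warnings;
use Config;    # core module exposing the settings perl was built with

# Hypothetical helper: true if this perl was built with large file support.
sub has_largefile_support {
    return (defined $Config{uselargefiles}
            && $Config{uselargefiles} eq 'define') ? 1 : 0;
}

my $flag = defined $Config{uselargefiles} ? $Config{uselargefiles} : 'undef';
my $size = defined $Config{lseeksize}     ? $Config{lseeksize}     : 'unknown';

print "uselargefiles: $flag\n";
print "lseeksize:     $size bytes\n";   # width of a file offset in this build
print has_largefile_support()
    ? "large files supported\n"
    : "no large file support\n";
```

From the shell, perl -V:uselargefiles prints just that one setting.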

If that is the problem, then I know of two solutions. One is to recompile Perl with support for large files. The other is to change your open line to something like:

if(open(READ, "cat $file |")){
(I'm assuming that you are on an OS with cat installed, and that cat has support for large files.)
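Here is a slightly fuller sketch of the same idea, assuming a Unix-like system and a Perl new enough (5.8 or later) to support the list form of a piped open. The list form skips the shell, so unusual characters in the filename do no harm. open_via_cat and count_lines are names invented for this example.

```perl
use strict;
use warnings;

# Open $path for reading through an external cat process.
# The list form of open bypasses the shell, so the filename
# reaches cat unchanged.
sub open_via_cat {
    my ($path) = @_;
    open(my $fh, '-|', 'cat', '--', $path)
        or die "Cannot start cat for $path: $!";
    return $fh;
}

# Count the lines in $path, reading only from the pipe so that
# perl itself never seeks in the large file.
sub count_lines {
    my ($path) = @_;
    my $fh = open_via_cat($path);
    my $count = 0;
    $count++ while <$fh>;
    # close waits for cat and returns false if it exited non-zero
    close($fh) or die "cat failed on $path (exit status $?)";
    return $count;
}

print count_lines($ARGV[0]), "\n" if @ARGV;
```

Since the data arrives over a pipe, you can only read forward; seek and tell on the handle will not behave as they do on a plain file.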

If you don't replace Perl, then you'll need a similar trick to write large files, because Perl will again get confused around the 2 GB mark.
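For the writing side, a sketch along the same lines might look like this. It assumes sh and cat are available and that both handle large files. The path is handed to sh as a positional argument instead of being pasted into the command text. open_write_via_cat and write_lines are illustrative names only.

```perl
use strict;
use warnings;

# Open $path for writing through an external cat process.
# sh -c receives the path as $1, so the filename is never
# interpolated into the shell command text.
sub open_write_via_cat {
    my ($path) = @_;
    open(my $fh, '|-', 'sh', '-c', 'cat > "$1"', 'sh', $path)
        or die "Cannot start cat for $path: $!";
    return $fh;
}

# Write each element of @lines to $path, one per line.
sub write_lines {
    my ($path, @lines) = @_;
    my $out = open_write_via_cat($path);
    print {$out} "$_\n" for @lines;
    # close waits for the child, so the file is complete afterwards
    close($out) or die "Writing $path failed (exit status $?)";
    return scalar @lines;
}
```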

Re^2: Parsing large files
by Grundle (Scribe) on Apr 10, 2005 at 20:59 UTC
    That is sort of what I suspected, but I am wondering if it may be even more directly related to IO::Handle. I am not sure if IO::Handle has any size limitations or if it just works on the filehandle without having to worry about space etc.

    I like your suggestion about changing the open() statement, but before I settle on that I want to be absolutely sure that is what is going on. This process takes about 22 hours to complete, so obviously if I am wrong, it will be a costly (timewise) mistake.

    Thanks!!
      IO::Handle just works on the filehandle.

      If performance is an issue, though, note that there is some overhead in using IO::Handle's OO support (or at least there was at one point; it may have improved since I checked). So it may be faster to use <> directly.

      Additionally, you might want to avoid using a threaded Perl (those are slower even if you don't use threads). On some platforms it can be faster to call read and then split the lines yourself than to let <> do it. On others the built-in is faster, and I believe that with a current Perl the performance problem behind that should be eliminated everywhere.
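To make the read-and-split idea concrete, here is a rough sketch. It assumes lines end in "\n" and that a 1 MB block is a sensible default; each_line_by_block is a made-up name, and whether this beats <> on your platform is something you would have to measure.

```perl
use strict;
use warnings;

# Read $fh in fixed-size blocks and call $callback once per line
# (without the trailing newline). Returns the number of lines seen.
sub each_line_by_block {
    my ($fh, $callback, $block_size) = @_;
    $block_size ||= 1 << 20;            # assumed default: 1 MB per read
    my ($tail, $count) = ('', 0);

    while (1) {
        my $got = read($fh, my $block, $block_size);
        die "read failed: $!" unless defined $got;
        last if $got == 0;              # end of file

        # Prepend the partial line left over from the previous block.
        # The -1 limit keeps a trailing empty field, so the last
        # element is always the (possibly empty) unfinished line.
        my @lines = split /\n/, $tail . $block, -1;
        $tail = pop @lines;

        for my $line (@lines) {
            $callback->($line);
            $count++;
        }
    }

    # A final line with no trailing newline is still a line.
    if (length $tail) {
        $callback->($tail);
        $count++;
    }
    return $count;
}
```

A call would look like each_line_by_block($fh, sub { my ($line) = @_; ... }), where $fh can be an ordinary filehandle or one opened on a pipe.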