Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm attempting to split a 2.3GB weblog into separate files by month. Everything runs fine, except that it stops at around the 2GB mark. How can I get past this? Here is the code I'm using:
while (<>) {
    if (/\[(\d+)\/(\w+)\/(\d+):(.*)\]/) {
        $filename = "weblog" . "-" . $3 . "-" . $2;
        if (-w $filename) {
            open(LOGFILE, ">>$filename")
                || die "Sorry, could not append $filename: $!";
            select(LOGFILE);
            print $_;
        }
        else {
            open(LOGFILE, ">$filename")
                || die "Sorry, could not create $filename: $!";
            select(LOGFILE);
            print $_;
        }
    }
}
Is there a more efficient way to do this, and is there a way to get around the file size issue? This is running on an NT machine. Also, I found a bug report on CPAN regarding a similar issue, but didn't understand the solution. I'd appreciate any help as I am just learning Perl and have received no help yet from the other sites I've posted to. Thanks in advance!
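A rough way to check whether a given perl build has large file support at all (the uselargefiles entry in Config only appeared around perl 5.6, so on an older build it may simply be absent, which itself points at a 2GB ceiling):

use Config;
# 'define' here means the perl was built with large file support;
# an empty or missing value suggests file operations top out near 2GB.
foreach my $key (qw(uselargefiles lseeksize)) {
    print "$key = ",
        (defined $Config{$key} ? $Config{$key} : '(not set)'), "\n";
}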

Replies are listed 'Best First'.
Re: Large File Size
by davorg (Chancellor) on Aug 01, 2000 at 17:39 UTC

    Not sure about your 2Gb file size problem, but your general solution to the problem seems a little inefficient. You're opening (or reopening) a file for each line of the log file. It would be more efficient to do something like this (in perl/pseudocode mix!):

    set $prev_date to an impossible value

    while (<>) {
        Get $date from line
        if ($prev_date ne $date) {
            close FILE;
            open FILE, "new_filename_calculated_from_$date" or die "$!\n";
        }
        print FILE $_;
        $prev_date = $date;
    }
    close FILE;
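    Spelled out as runnable perl, reusing the date regex and filename scheme from the original post (treat it as a sketch rather than tested code), that might look like:

    my $prev_file = '';
    while (<>) {
        # same date regex as in the original post
        next unless /\[(\d+)\/(\w+)\/(\d+):/;
        my $filename = "weblog-$3-$2";
        # only (re)open when the month changes
        if ($filename ne $prev_file) {
            close LOGFILE if $prev_file;
            open(LOGFILE, ">>$filename")
                or die "Sorry, could not append $filename: $!";
            $prev_file = $filename;
        }
        print LOGFILE $_;
    }
    close LOGFILE if $prev_file;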
    --
    <http://www.dave.org.uk>

    European Perl Conference - Sept 22/24 2000, ICA, London
    <http://www.yapc.org/Europe/>
Re: Large File Size
by lhoward (Vicar) on Aug 01, 2000 at 17:53 UTC
    I believe there was some sort of 2GB file size problem with Solaris and older versions of perl, though I'm not really sure. One thing you could do to hack around the problem is use the unix split command to break your 2.3GB file into smaller chunks and run each of those chunks through your splitter independently.
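    If the unix split command isn't available on the NT box, a crude pure-perl stand-in would be something like the following (the 500,000-lines-per-chunk figure and chunk.NNN names are just examples, and if the 2GB limit is in perl's own I/O this may hit the same wall):

    my $lines_per_chunk = 500_000;    # arbitrary; tune to taste
    my ($chunk, $count) = (0, 0);
    open(CHUNK, sprintf(">chunk.%03d", $chunk))
        or die "Cannot create chunk file: $!";
    while (<>) {
        # start a new numbered chunk file every N lines
        if ($count++ >= $lines_per_chunk) {
            close CHUNK;
            open(CHUNK, sprintf(">chunk.%03d", ++$chunk))
                or die "Cannot create chunk file: $!";
            $count = 1;
        }
        print CHUNK $_;
    }
    close CHUNK;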

    You may be interested in the Apache log splitter/compressor I posted to perlmonks a few months ago. It receives the logs directly from apache, splits them into files by day and compresses them on the fly. I wrote it for Apache, but you should be able to adapt it to many other situations.
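    For the on-the-fly compression part, one common technique (only an illustration, not necessarily how lhoward's script does it, and it assumes a gzip binary on the PATH) is to open a pipe to gzip and print to it like any other filehandle:

    # Illustration only: stream log lines through gzip as they arrive.
    # The output filename is just an example.
    open(GZ, "| gzip -c >> weblog-2000-Aug.gz")
        or die "Cannot start gzip: $!";
    while (<>) {
        print GZ $_;
    }
    close GZ;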

Re: Large File Size
by mrmick (Curate) on Aug 01, 2000 at 17:53 UTC
    You may be limited by the resources on your machine. I would suspect that in most cases there would be some type of error message associated with this, such as an 'out of memory' error. I have received such messages in the past and have had to either free resources on my system or get more RAM.

    Try the solution presented by davorg. If you receive an error message, then you probably will have a hint at what's happening.

    I hope this helps....

    Mick
Re: Large File Size
by Anonymous Monk on Aug 01, 2000 at 18:12 UTC
    Thanks for the help. I knew the code was a little inefficient, but was satisfied since I'm just starting out and it did the job. Of course it's always good to write more efficient code, and the previous-date solution did cross my mind at one point, so I plan on making the changes. As for insufficient resources, that was the first thing I checked. I monitored the RAM usage and found that it remained pretty constant, only taking up maybe a few KB or MB (I don't remember which). I still had over 100MB of RAM free and the drive still has 15GB available, so I don't think it's a resource issue. I was actually going to try splitting the file if I couldn't find a solution. So I'll probably implement the new code, try that, and if that fails, I'll figure out a way to split the file, since the file is on an NT box.
Re: Large File Size
by Anonymous Monk on Aug 01, 2000 at 19:00 UTC
    Well, the program is now 1000x faster and more efficient, but it still stops at the same place. Looks like I'll have to split the file somehow.
      On my SGI (UNIX) machine there's a command csplit that divides a file into an arbitrary number of pieces. I'd expect that to be a pretty common utility across unices.

      On other platforms, all bets are off. :)

      Ben
Final Solution
by Anonymous Monk on Aug 02, 2000 at 21:35 UTC
    Just to update: I did split the file and found a possible problem near the place the program stops. I haven't been able to find the exact cause, but there is an abnormality in the file where the date suddenly changes from May 4 to June 27. Even more perplexing is that the file my program creates ends at May 5. Anyway, I managed to pull the rest of the data from the file, and it looks like I'm just going to consider the end of the log corrupted and inaccurate and leave it at that. Thanks for all the help!
Re: Large File Size
by ferrency (Deacon) on Aug 01, 2000 at 23:37 UTC
    What happens if you chop a bit off the beginning of your file? Does it still stop at the same line (indicating maybe there's something in the structure of the file itself that's confusing perl), or after the same number of lines (indicating there may be some inherent 2G limit or something)?
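    One way to try that without actually editing a 2.3GB file (a sketch; the filename and 100MB offset are just examples) is to seek past the first chunk and discard the partial line before handing the rest to the splitter:

    my $skip_bytes = 100 * 1024 * 1024;    # arbitrary offset, well under 2GB
    open(LOG, "<big.log") or die "Cannot open log: $!";
    seek(LOG, $skip_bytes, 0) or die "seek failed: $!";
    <LOG>;    # throw away the (probably partial) line we landed in
    while (<LOG>) {
        # ... feed $_ to the month-splitting code here ...
    }
    close LOG;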

    Just a thought.

    Alan

Re: Large File Size
by nardo (Friar) on Aug 01, 2000 at 18:12 UTC
    You never close the files, so you may be running out of file descriptors (or the NT equivalent) around the 2 gig point.

      Reopening the same Perl file handle closes it first. He shouldn't be running out of file handles.
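      If the reopening still bothers you, an alternative is to keep one handle per month in a hash and close them all at the end; with only a couple of dozen months in a weblog that stays well under any descriptor limit. This sketch uses lexical handles and three-argument open, so it assumes perl 5.6 or later:

      my %fh;
      while (<>) {
          next unless /\[(\d+)\/(\w+)\/(\d+):/;
          my $filename = "weblog-$3-$2";
          # open each month's file once and cache the handle
          $fh{$filename} ||= do {
              open(my $out, '>>', $filename)
                  or die "Sorry, could not append $filename: $!";
              $out;
          };
          print { $fh{$filename} } $_;
      }
      close $_ for values %fh;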