in reply to processing huge files

Well, if the file is that huge and cannot be processed in one go, maybe a "divide and conquer" approach works:
    use strict;
    use warnings;

    my $file      = "filename";
    my $file_size = 30720;                           # file size in MB
    my $chunks    = 1226;                            # how many pieces
    my $size      = int($file_size / $chunks) + 1;   # chunk size in MB

    for my $counter (0 .. $chunks - 1) {
        my $skip = $size * $counter;                 # offset in MB
        `dd if=$file of=$file.$counter bs=1M count=$size skip=$skip`;
    }

This will output 1226 files called filename.*. Note that you will need free disk space equal to the size of the original file. Also, when reading the parts back, remember that a record may be split across a chunk boundary: open the next chunk before assuming you have reached the text separator (i.e. the newline character), since a single line can be distributed between one chunk and the next. And be sure to close processed files! :D
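To see the separator problem concretely, here is a small shell sketch of the same dd-style split (the file `demo.txt` and the 10-byte chunk size are invented for the demo): one record ends up straddling two chunks, so a reader has to glue the tail of one chunk to the head of the next before parsing lines.

```shell
# Four newline-terminated records (23 bytes), cut into 10-byte chunks.
printf 'alpha\nbeta\ngamma\ndelta\n' > demo.txt
for i in 0 1 2; do
    dd if=demo.txt of=demo.$i bs=10 count=1 skip=$i 2>/dev/null
done
# The record "delta" straddles chunks 1 and 2:
tail -c 3 demo.1    # prints "del" with no trailing newline
head -c 3 demo.2    # prints "ta" plus the final newline
# Concatenating the chunks restores the original byte stream exactly:
cat demo.0 demo.1 demo.2 > demo.joined
cmp -s demo.txt demo.joined && echo 'chunks rejoin cleanly'
```

A reader working chunk by chunk therefore has to keep any trailing partial line in a buffer and prepend it to the first line of the next chunk.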

Re^2: processing huge files
by jhourcle (Prior) on Aug 02, 2005 at 13:35 UTC

    If you're going to hand the work off to dd, you might want to use split instead, as it can act on full lines (so it won't break in the middle of a record, given the logic the OP was using).

    You also don't need to call it repeatedly in a loop, as the equivalent of your dd example would be:

    split -b 26m -a 3 $INFILE
      I used dd because it can seek directly to a position in the file using skip and then read its contents sequentially, without needing to read through the whole file just to reach that location.
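A quick shell illustration of that point (the file `blocks.bin` and the 4-byte block size are invented for the demo): dd's skip= jumps straight to the requested block offset, so only the bytes of the wanted chunk are ever read.

```shell
# A 16-byte "file" made of four 4-byte blocks.
printf 'AAAABBBBCCCCDDDD' > blocks.bin
# Grab only the third block: dd skips past the first two 4-byte blocks
# (skip=2) and reads a single block (count=1), never touching the rest.
dd if=blocks.bin bs=4 count=1 skip=2 2>/dev/null    # prints CCCC
```

The output-side counterpart is seek=, which positions within the output file instead of the input.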