hiptoss has asked for the wisdom of the Perl Monks concerning the following question:
#!/usr/bin/perl
use strict;
use warnings;

my $parts = shift;    ### how many parts to split into
my @file  = @ARGV;    ### the files to split

foreach ( @file ) {
    ### how big should each new file be?
    my $size = (-s $_) / $parts;

    ### open the input file
    open my $in_fh, '<', $_ or die "Cannot read $_: $!";
    binmode $in_fh;

    ### for all but the last part, read that much
    ### data, then write it to the appropriate
    ### output file
    for my $part (1 .. $parts - 1) {
        ### read one output file's worth of data
        read $in_fh, my $buffer, $size or warn "Read zero bytes from $_: $!";

        ### write the output file
        open my $fh, '>', "$_$part" or die "Cannot write to $_$part: $!";
        binmode $fh;
        print $fh $buffer;
    }

    ### for the last part, read the rest of the file;
    ### the buffer will shrink to the actual bytes read
    read $in_fh, my $buffer, -s $_ or warn "Read zero bytes from $_: $!";
    open my $fh, '>', "$_$parts" or die "Cannot write to $_$parts: $!";
    binmode $fh;
    print $fh $buffer;
}
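A note on where the memory goes: read with a LENGTH argument grows the target scalar to LENGTH bytes before reading, so the loop above holds an entire part in memory at once, and the final read asks for a buffer the size of the whole remaining file. Memory use stays flat if each part is instead copied in small fixed-size sub-chunks. A minimal sketch of such a helper, assuming the splitting loop calls it once per part (the 1 MB buffer size and the copy_bytes name are illustrative, not from the thread):

use strict;
use warnings;

# Copy up to $bytes bytes (stopping early at EOF) from $in to $out
# using a small fixed-size buffer, so memory use stays flat no
# matter how large each part is.
sub copy_bytes {
    my ($in, $out, $bytes) = @_;
    my $bufsize = 1024 * 1024;    # 1 MB at a time (illustrative)
    while ($bytes > 0) {
        my $want = $bytes < $bufsize ? $bytes : $bufsize;
        my $got  = read $in, my $buffer, $want;
        die "Read error: $!" unless defined $got;
        last if $got == 0;        # end of file
        print {$out} $buffer or die "Write error: $!";
        $bytes -= $got;
    }
    return;
}

Each of the first $parts - 1 output files would then be written with copy_bytes($in_fh, $fh, $size), and the remainder of the input copied the same way into the last part.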
EDIT: After implementing SuicideJunkie's suggestion, I am still receiving an out-of-memory error. Interestingly, however, all of the output files are created, and when I cat them back together the reassembled file has the same md5 as the original:
(root@sw178) zs3 > du -sh lolz.dmg
16G     lolz.dmg
(root@sw178) zs3 > time ./z3-perl.pl lolz.dmg
Out of memory!

real    11m10.656s
user    0m12.895s
sys     0m45.782s
(root@sw178) zs3 > ls -l lolz*[0-9]
-rw-r--r-- 1 root root 1073741824 Nov  9 13:30 lolz.dmg1
-rw-r--r-- 1 root root 1073741824 Nov  9 13:36 lolz.dmg10
-rw-r--r-- 1 root root 1073741824 Nov  9 13:37 lolz.dmg11
-rw-r--r-- 1 root root 1073741824 Nov  9 13:37 lolz.dmg12
-rw-r--r-- 1 root root 1073741824 Nov  9 13:38 lolz.dmg13
-rw-r--r-- 1 root root 1073741824 Nov  9 13:39 lolz.dmg14
-rw-r--r-- 1 root root 1073741824 Nov  9 13:40 lolz.dmg15
-rw-r--r-- 1 root root  434690997 Nov  9 13:40 lolz.dmg16
-rw-r--r-- 1 root root 1073741824 Nov  9 13:31 lolz.dmg2
-rw-r--r-- 1 root root 1073741824 Nov  9 13:31 lolz.dmg3
-rw-r--r-- 1 root root 1073741824 Nov  9 13:32 lolz.dmg4
-rw-r--r-- 1 root root 1073741824 Nov  9 13:32 lolz.dmg5
-rw-r--r-- 1 root root 1073741824 Nov  9 13:33 lolz.dmg6
-rw-r--r-- 1 root root 1073741824 Nov  9 13:34 lolz.dmg7
-rw-r--r-- 1 root root 1073741824 Nov  9 13:35 lolz.dmg8
-rw-r--r-- 1 root root 1073741824 Nov  9 13:35 lolz.dmg9
(root@sw178) zs3 > time for i in `seq 1 16`; do cat lolz.dmg$i >> newlolz.dmg; done

real    10m55.629s
user    0m4.047s
sys     0m42.704s
(root@sw178) zs3 > md5sum lolz.dmg newlolz.dmg
e9b776914d65da41730265371a84d279  lolz.dmg
e9b776914d65da41730265371a84d279  newlolz.dmg
The updated script:

#!/usr/bin/perl
use strict;
use warnings;

my $part  = 1;
my @file  = @ARGV;       ### the files to split
my $chunk = 1073741824;  ### 1 GB
my $buffer;

foreach ( @file ) {
    #- open the input file
    open my $in_fh, '<', $_ or die "Cannot read $_: $!";
    binmode $in_fh;

    #- read one chunk at a time; the loop exits after
    #- a short read, i.e. the last part of the file
    my $sizeRead = $chunk;
    while ($sizeRead == $chunk) {
        #- read an output file's worth of data
        $sizeRead = read $in_fh, $buffer, $chunk;
        die "Error reading: $!\n" unless defined $sizeRead;

        #- write the output file
        open my $fh, '>', "$_$part" or die "Cannot write to $_$part: $!";
        binmode $fh;
        print $fh $buffer;

        #- increment counter for part#
        $part++;
    }

    #- for the last part, read the rest of the file
    read $in_fh, my $buffer, -s $_ or warn "Read zero bytes from $_: $!";
    open my $fh, '>', "$_$part" or die "Cannot write to $_$part: $!";
    binmode $fh;
    print $fh $buffer;
}
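The likely culprit for the remaining out-of-memory error is the leftover last-part block: by the time the while loop exits it has already read and written the final short chunk, and read $in_fh, my $buffer, -s $_ grows the buffer to the full size of the input file (16 GB here) before attempting the read, even though the handle is already at EOF. That would also explain why the parts still reassemble to a matching md5: the allocation fails only after every part has been written. A minimal sketch of the same split with that trailing block removed (the 1 GB chunk size and variable names are illustrative, and part numbering restarts for each input file here):

#!/usr/bin/perl
use strict;
use warnings;

my $chunk = 1073741824;    # 1 GB per part; anything that fits in RAM works

for my $file (@ARGV) {
    open my $in_fh, '<', $file or die "Cannot read $file: $!";
    binmode $in_fh;

    my $part = 1;
    while (1) {
        # read() returns the bytes actually read, 0 at EOF, or undef
        # on error, so one loop covers every part, including the
        # short final one.
        my $sizeRead = read $in_fh, my $buffer, $chunk;
        die "Error reading $file: $!" unless defined $sizeRead;
        last if $sizeRead == 0;

        open my $out_fh, '>', "$file$part" or die "Cannot write to $file$part: $!";
        binmode $out_fh;
        print $out_fh $buffer;
        close $out_fh or die "Error closing $file$part: $!";
        $part++;
    }
    close $in_fh;
}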
Replies are listed 'Best First'.

Re: Chunking very large files
by SuicideJunkie (Vicar) on Nov 09, 2011 at 19:22 UTC
    by hiptoss (Novice) on Nov 09, 2011 at 19:36 UTC
        by SuicideJunkie (Vicar) on Nov 09, 2011 at 20:07 UTC
            by hiptoss (Novice) on Nov 09, 2011 at 20:13 UTC
                by SuicideJunkie (Vicar) on Nov 09, 2011 at 20:19 UTC

Re: Chunking very large files
by Anonymous Monk on Nov 10, 2011 at 05:27 UTC

Re: Chunking very large files
by Marshall (Canon) on Nov 11, 2011 at 11:43 UTC