If you're after a "best" implementation ... tie::file
There are a couple of problems with that.
- Performance:
Writing a file with Tie::File, even with memory allocated to easily accommodate the whole file, is orders of magnitude slower than direct writing.
c:\test>junk7 -N=10e3 ### 1/2 MB
Took 0.098 seconds
Took 34.255 seconds
c:\test>junk7 -N=20e3 ### 1 MB
Took 0.197 seconds
Took 137.506 seconds
c:\test>junk7 -N=1e6 ### 50 MB
Took 9.449 seconds
^C
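The Tie::File time roughly quadruples each time the file size doubles (34 seconds at 1/2 MB, 137 seconds at 1 MB), so by the time you get to 50 MB I estimate it will take hours instead of the 10 seconds needed for direct writing.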
- Binmode: There doesn't seem to be any simple way to binmode a Tie::File tied file, which means that on some systems the data in the file will differ from the data that was checksummed:
21/07/2010 01:26 510,033 junk.dat
21/07/2010 01:26 520,034 junk2.dat
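A possible workaround, offered as a sketch rather than a verified fix for the test above: the size difference is 10,001 bytes, exactly one byte per record (10,000 data lines plus the checksum line), which is what you would expect if the records are being terminated with "\r\n" rather than "\n". Tie::File documents a recsep option for setting the record separator explicitly, so forcing it to "\n" may bring the file contents back in line with what was checksummed. The file name recsep_demo.dat is made up for this illustration, and whether this settles the checksum question on every platform is an assumption.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical scratch file, used only for this illustration.
my $file = 'recsep_demo.dat';
unlink $file;

# Ask for a bare newline as the record separator rather than
# relying on the platform default.
tie my @lines, 'Tie::File', $file, recsep => "\n"
    or die "Cannot tie $file: $!";

push @lines, 'x' x 50 for 1 .. 3;
untie @lines;

# Three 50-byte records, each followed by a one-byte separator.
my $size = -s $file;
print "size: $size bytes\n";
```

If the reported size is 50 bytes of data plus one separator byte per record, the separator is a bare newline and the "$data\n" fed to the digest matches what is on disk. This does nothing for the performance problem.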
Test code:
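#! perl -slw
use strict;
use Time::HiRes qw[ time ];
use Tie::File;
use Digest::MD5 qw[ md5_hex ];

our $N //= 1e6;    # record count; override with -N=... (via -s)

## Direct write: placeholder digest, data, then overwrite the placeholder.
my $start = time;
open my $out, '+>:raw', 'junk.dat' or die "junk.dat: $!";
print $out md5_hex( 0 );    # 32-char placeholder for the real digest
my $data = 'x' x 50;
my $md5 = Digest::MD5->new;
for ( 1 .. $N ) {
    print $out $data;
    $md5->add( "$data\n" );    # -l appends "\n" to every print
}
seek $out, 0, 0;
print $out $md5->hexdigest;
close $out or die "junk.dat: $!";
printf "Took %.3f seconds\n", time - $start;

## Same thing through Tie::File, with enough cache for the whole file.
$start = time;
tie my @lines, 'Tie::File', 'junk2.dat', memory => 52 * $N
    or die "junk2.dat: $!";
$md5 = Digest::MD5->new;
push @lines, md5_hex( 0 );
for ( 1 .. $N ) {
    push @lines, $data;
    $md5->add( "$data\n" );
}
$lines[ 0 ] = $md5->hexdigest;
untie @lines;
printf "Took %.3f seconds\n", time - $start;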