in reply to Issue on covariance calculation

The big advantage of binary format is not that it saves space, but that you can find any element without having to read the whole file. In Perl, you can do that with the pack builtin.
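
As a rough sketch of what that looks like in practice (the helper names `write_doubles` and `fetch_double` and the file name are made up for illustration, and the `d` pack format is assumed for native doubles), you can seek straight to element N because every record has the same byte length:

```perl
use strict;
use warnings;

# Write a list of numbers as native doubles, one fixed-size record each.
sub write_doubles {
    my ($path, @values) = @_;
    open my $out, '>', $path or die "can't open $path: $!";
    binmode $out;                          # no newline translation
    print {$out} pack('d*', @values);
    close $out or die "can't close $path: $!";
}

# Fetch element $idx (0-based) without reading anything before it.
sub fetch_double {
    my ($path, $idx) = @_;
    my $rec_len = length pack('d', 0);     # bytes per record, usually 8
    open my $in, '<', $path or die "can't open $path: $!";
    binmode $in;
    seek($in, $idx * $rec_len, 0) or die "seek failed: $!";
    my $got = read($in, my $buf, $rec_len);
    die "short read at index $idx" unless defined $got && $got == $rec_len;
    close $in;
    return unpack('d', $buf);
}

# Hypothetical usage: store 100 values, then read back only one of them.
my $file = 'floats_demo.bin';
write_doubles($file, map { $_ / 10 } 0 .. 99);
print "element 42 = ", fetch_double($file, 42), "\n";
unlink $file;
```

The record length is computed from `pack` rather than hard-coded, since the size of a native double can differ between platforms.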

Re^2: Issue on covariance calculation
by rodion (Chaplain) on Apr 13, 2007 at 09:59 UTC
    I disagree. You can do random access with ASCII as long as you write fixed-length records, though the conversion to ASCII takes about 50% longer. It's the conversion that chews up most of the time, so pack() can be a substantial time saver.
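
    For illustration, here is a minimal sketch of random access into fixed-length ASCII records. The helper names, the 14-character field width, and the 8 decimal places are assumptions chosen for the example, not anything from the original problem; values too large for the field would break the fixed-length guarantee.

```perl
use strict;
use warnings;

my $WIDTH   = 14;            # characters per number (illustrative choice)
my $REC_LEN = $WIDTH + 1;    # plus one byte for the trailing newline

# Write each value right-aligned in a fixed-width field, one per line.
sub write_fixed_ascii {
    my ($path, @values) = @_;
    open my $out, '>', $path or die "can't open $path: $!";
    binmode $out;            # keep "\n" as a single byte on every platform
    printf {$out} "%${WIDTH}.8f\n", $_ for @values;
    close $out or die "can't close $path: $!";
}

# Fetch record $idx (0-based) by seeking to $idx * $REC_LEN.
sub fetch_fixed_ascii {
    my ($path, $idx) = @_;
    open my $in, '<', $path or die "can't open $path: $!";
    binmode $in;
    seek($in, $idx * $REC_LEN, 0) or die "seek failed: $!";
    my $got = read($in, my $rec, $REC_LEN);
    die "short read at index $idx" unless defined $got && $got == $REC_LEN;
    close $in;
    $rec =~ s/^\s+|\s+$//g;  # strip the padding and the newline
    return $rec + 0;         # this is the ASCII-to-number conversion step
}

# Hypothetical usage: write three values and read back the last one.
my $file = 'fixed_demo.txt';
write_fixed_ascii($file, 1.5, 2.25, 3.125);
print "record 2 = ", fetch_fixed_ascii($file, 2), "\n";
unlink $file;
```

    The seek works the same way as with packed data; the difference is the sprintf-style formatting on the way out and the string-to-number conversion on the way in.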

    In the test below, writing floats is 10 times faster than writing ASCII. The file is half the size, so the smaller size accounts for only a factor of 2 speed-up. A factor of 5 comes from not doing the ASCII conversion.

    use warnings;
    use strict;
    use Time::HiRes qw( gettimeofday tv_interval );

    my $size     = 1000;
    my $last_idx = $size-1;
    my $float_value;
    my $t0;

    # Time writing each row as comma-separated ASCII text.
    $t0 = [gettimeofday];
    open ASC, '>', "asciinum.txt" or die "can't open ascii file: $!";
    for my $row (0..$last_idx) {
        my @result  = ();
        my $row_len = $row+1;
        $float_value = $row/10_000 + 1/100_000_000;
        for my $col (0..$last_idx) {
            #$float_value = $row/10_000 + $col/100_000_000;
            push @result, $float_value;
        }
        print ASC join(',', @result);
    }
    print STDERR "Ascii write took -- ",tv_interval ( $t0 ),"\n";
    close ASC;

    # Time writing each row as packed native floats.
    $t0 = [gettimeofday];
    open FLOAT, '>', "floatnum.txt" or die "can't open float file: $!";
    binmode FLOAT;    # binary data: avoid newline translation on Windows
    for my $row (0..$last_idx) {
        my @result  = ();
        my $row_len = $row+1;
        $float_value = $row/10_000 + 1/100_000_000;
        for my $col (0..$last_idx) {
            #$float_value = $row/10_000 + $col/100_000_000;
            push @result, $float_value;
        }
        # Note: "F$row_len" packs only the first $row_len values of the row.
        print FLOAT pack("F$row_len", @result);
    }
    print STDERR "Float write took -- ",tv_interval ( $t0 ),"\n";
    close FLOAT;

    # Gave the following results on my fairly slow machine, with $size = 1000
    # Ascii write took -- 15.078125
    # Float write took -- 2.778736