Hena has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I'm trying to do some matrix handling as fast as possible. Basically, I have a matrix like this:
11	12	13
21	22	23
31	32	33
Now I need to transpose that. Originally I just read the whole matrix into memory as transposed arrays and wrote them out, but this won't work when transposing big files. An alternative way was to read one column at a time, but this takes a lot of time (perhaps it would be faster to read 10 columns into 10 files, then cat them together).

Another alternative would be to read the whole file and write it out in binary format, allowing fast movement within the file (basically (sys)?seek) to get all the numbers in the correct order, so I can write the result. But how do I handle this?

I could use open, read, print and seek. Assuming that each column holds a double (64 bits, i.e. 8 bytes):
# loop for reading one column
open (BINM, "<:raw", $infile);
read (BINM, $number, 8);                  # a double is 8 bytes, not 64
print OUTM "$number\t";
seek (BINM, 8 * $columns_number, 1);
Or I could use sysopen, sysread, syswrite and sysseek in a similar manner. However, there is a problem when I'm reading the original matrix file:
while (<MATRIX>) {
    chomp;
    my @numbers = split (/\t/, $_);
    # problem
}
How do I write it out as doubles into the binary file? Also, if the following values are acceptable too, how do they complicate things? (The case of the characters could be fixed, but I'd prefer to allow any case:)
[+-]?inf NaN
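For reference, a minimal sketch of round-tripping doubles through pack/unpack's native "d" format, including Inf and NaN produced arithmetically (the behaviour of numifying literal 'inf'/'nan' strings is version- and platform-dependent, so this sketch avoids them):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Round-trip a list of doubles through the native "d" pack format.
my @numbers = ( 1.5, -2.25, 9**9**9 );       # 9**9**9 evaluates to +Inf
my $packed  = pack 'd*', @numbers;           # 8 bytes per double
my @back    = unpack 'd*', $packed;

printf "%d bytes packed\n", length $packed;  # 24 on typical IEEE-754 platforms

# NaN can be produced arithmetically; NaN != NaN is the standard test.
my $inf = 9**9**9;
my $nan = $inf - $inf;
my $nan_back = unpack 'd', pack 'd', $nan;
print "NaN survived the round trip\n" if $nan_back != $nan_back;
```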

Replies are listed 'Best First'.
Re: Binary file handling
by hawtin (Prior) on Mar 18, 2004 at 09:35 UTC

    To answer your exact question, pack() and unpack() will let you read and write doubles. Look it up in your favorite Perl book, do a search here, then try testing things in the debugger.

    If you are writing binary files you should use binmode() like:

    open (BINM, "<:raw", $infile);
    binmode (BINM);
    read (BINM, $number, 8);    # a double is 8 bytes

    It will not have any effect if you don't need it but will save your bacon if you do.
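As a concrete illustration of the seek-and-read approach, here is a self-contained sketch (the file name and 3x3 layout are made up; note that offsets are in bytes, 8 per double, row-major):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write a 3x3 matrix of doubles row by row, then read column 0
# back by seeking.  8 bytes per double, row-major layout.
my $cols = 3;
my $file = 'matrix.bin';                     # hypothetical temp file

open my $out, '>:raw', $file or die "Can't write $file: $!";
print {$out} pack 'd*', 11, 12, 13;
print {$out} pack 'd*', 21, 22, 23;
print {$out} pack 'd*', 31, 32, 33;
close $out;

open my $in, '<:raw', $file or die "Can't read $file: $!";
my @column;
for my $row ( 0 .. 2 ) {
    seek $in, 8 * ( $row * $cols + 0 ), 0;   # jump to row $row, column 0
    read( $in, my $buf, 8 ) or die "short read";
    push @column, unpack 'd', $buf;
}
close $in;
unlink $file;

print "@column\n";                           # prints 11 21 31
```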

    If your matrix is too big for your system's memory, then I would have thought that doing things via files would take far too long. I would suggest that a careful examination of the problem you are trying to solve is in order.

    If you still want to transpose in files, remember that there are many ways to do it. For example, if you split the matrix into four quarters, transpose each of them, and then combine them in the right order, you may find that efficiency can be improved. This problem is the same as reflecting a bitmap image around a diagonal; I'll bet that some of the image-processing books have some neat tricks you should look at (the last time I did any low-level raster stuff was almost 10 years ago).
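The quadrant idea is easy to check in memory before committing to a file-based version: if the matrix is split into blocks [[P,Q],[R,S]], its transpose is [[P',R'],[Q',S']] (transpose each block, swap the off-diagonal blocks). A toy sketch using only core Perl:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory transpose of an array-of-arrays.
sub transpose {
    my $m = shift;
    my @t;
    for my $r ( 0 .. $#$m ) {
        $t[$_][$r] = $m->[$r][$_] for 0 .. $#{ $m->[$r] };
    }
    return \@t;
}

# Glue two equal-height block matrices side by side.
sub beside {
    my ( $l, $r ) = @_;
    return [ map { [ @{ $l->[$_] }, @{ $r->[$_] } ] } 0 .. $#$l ];
}

my $M = [ [ 1, 2, 3, 4 ], [ 5, 6, 7, 8 ],
          [ 9, 10, 11, 12 ], [ 13, 14, 15, 16 ] ];

# The 2x2 quadrants of $M.
my $P = [ [ 1, 2 ],   [ 5, 6 ]   ];
my $Q = [ [ 3, 4 ],   [ 7, 8 ]   ];
my $R = [ [ 9, 10 ],  [ 13, 14 ] ];
my $S = [ [ 11, 12 ], [ 15, 16 ] ];

# Transpose each quadrant, then reassemble with the off-diagonal
# blocks swapped: [[P,Q],[R,S]]' == [[P',R'],[Q',S']].
my $top       = beside( transpose($P), transpose($R) );
my $bottom    = beside( transpose($Q), transpose($S) );
my $blockwise = [ @$top, @$bottom ];

my $direct = transpose($M);    # same result, computed the plain way
```

The payoff on disk is that each quadrant can be small enough to transpose entirely in memory.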

      Well, the problem with pack comes from using 'nan' and 'inf' there. With pure numbers I could:
      # packing
      $str = pack ("d*", @numbers);
      # unpacking to a correct column
      $val = (unpack ("d*", $str))[$col];
      But that won't work if there is 'nan' in the @numbers array, will it?

      I guess I could save memory by using pack in the read loop, thus pushing the limit back a bit. But I have to think about that splitting.
Re: Binary file handling
by flyingmoose (Priest) on Mar 18, 2004 at 14:07 UTC
    I'm trying to do some matrix handling as fast as possible. Basically I have a matrix like this
    Look into the insanely powerful Perl Data Language. If this isn't what you want, I'm a llama's uncle.
      This is indeed that and more. However, I'm trying to do this from Perl itself, without requiring an extra installation. But if I fail, I now know there is a good backup plan :).
Re: Binary file handling
by husker (Chaplain) on Mar 18, 2004 at 14:15 UTC
    Have you looked at Math::Matrix? I don't know how fast it is, but they've done all the "work" for you. Even if it's not suitable for what you want, it may inspire you with some other ideas.
Re: Binary file handling
by tachyon (Chancellor) on Mar 18, 2004 at 14:08 UTC

    The simple combination of File::ReadBackwards, split, join and reverse does it off disk, fast and efficiently. The split assumes some sort of space-separated data, but you could modify this if required. HTH

    C:\>type transpose.pl
    #!/usr/bin/perl -w
    use strict;
    use File::ReadBackwards;

    transpose( "c:/data.txt", "c:/data-transpose.txt" );

    sub transpose {
        my ( $infile, $outfile ) = @_;
        tie *BW, 'File::ReadBackwards', $infile
            or die "Can't read $infile $!\n";
        open OUT, ">$outfile" or die "Can't write $outfile $!\n";
        while( <BW> ) {
            chomp;
            print OUT join( "\t", reverse(split ' ') ), "\n";
        }
        close BW;
        close OUT;
    }

    C:\>type data.txt
    11 12 13
    21 22 23
    31 32 33

    C:\>transpose.pl

    C:\>type data-transpose.txt
    33 32 31
    23 22 21
    13 12 11

    C:\>

    cheers

    tachyon

      Hum. The transposed matrix in this case (or else I'm doing something wrong) should be:
      11	21	31
      12	22	32
      13	23	33
      

        Is this what you want? It flips on the diagonal. The main requirement is that you have as much free disk space for the temp files as the total file size. You will be limited in the number of columns you can transpose by the number of open file descriptors your Perl will let you have. It is very easy to hack the logic to do N columns per pass, at the expense of one full read of the input file per extra pass. Alternatively you could use a DBM or tie a hash to a file, use the keys as pseudo file handles, and just append data to the values. Although there is more I/O with a multipass approach, it is very vanilla I/O, which Perl does really fast.
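One way the "N columns per pass" variant could look (a sketch, not tachyon's code; the per-pass width and the file names are made up for illustration). Each pass re-reads the input and writes one batch of output rows, so only a couple of handles are ever open:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Multipass transpose: pass k handles columns k*$per_pass .. k*$per_pass-1,
# buffering only $per_pass output rows in memory at a time.
sub transpose_multipass {
    my ( $infile, $outfile, $per_pass ) = @_;
    $per_pass ||= 2;

    # Count the columns from the first line.
    open my $probe, '<', $infile or die "Can't read $infile: $!";
    my $num_cols = () = split ' ', scalar <$probe>;
    close $probe;

    open my $out, '>', $outfile or die "Can't write $outfile: $!";
    for ( my $first = 0 ; $first < $num_cols ; $first += $per_pass ) {
        my $last = $first + $per_pass - 1;
        $last = $num_cols - 1 if $last >= $num_cols;

        my @rows;    # one output row per column in this pass
        open my $in, '<', $infile or die "Can't read $infile: $!";
        while (<$in>) {
            my @data = split ' ';
            push @{ $rows[ $_ - $first ] }, $data[$_] for $first .. $last;
        }
        close $in;
        print {$out} join( "\t", @$_ ), "\n" for @rows;
    }
    close $out;
}

# Tiny demo on a 3x3 matrix (temp file names are hypothetical).
open my $fh, '>', 'mp_in.txt' or die $!;
print {$fh} "11 12 13\n21 22 23\n31 32 33\n";
close $fh;
transpose_multipass( 'mp_in.txt', 'mp_out.txt', 2 );
```

With $per_pass columns per pass, the whole input is read ceil(cols/$per_pass) times; that trades extra sequential reads (cheap) for a bounded number of file descriptors.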

        Update

        See this article for info on how to up the number of available file descriptors (probably 1024/process) on a Linux based system. No idea how it is dealt with on other systems.

        It should be really fast as we make a single pass through the input data and then effectively just write it out (each temp file has one full line in it).

        C:\>type transpose.pl
        #!/usr/bin/perl -w
        use strict;

        transpose90( "c:/data.txt", "c:/data-transpose.txt" );

        sub transpose90 {
            my ( $infile, $outfile, $tmp ) = @_;
            $tmp ||= 'c:/tmp/temp';
            open IN, $infile or die "Can't read $infile $!\n";
            # find number of columns and open a temp file for each
            local $_ = <IN>;
            chomp;
            my @data = split ' ';
            my $num_cols = $#data;
            my @fhs;
            for( 0 .. $num_cols ) {
                open $fhs[$_], ">$tmp$_.txt"
                    or die "Can't create temp file $tmp$_ $!\n";
                print {$fhs[$_]} $data[$_], "\t";
            }
            while( <IN> ) {
                chomp;
                @data = split ' ';
                print {$fhs[$_]} $data[$_], "\t" for 0 .. $num_cols;
            }
            close IN;
            open OUT, ">$outfile" or die "Can't write $outfile $!\n";
            for ( 0 .. $num_cols ) {
                close $fhs[$_];    # close the temp file
                open IN, "$tmp$_.txt"
                    or die "Can't read temp file $tmp$_ $!\n";
                print OUT scalar(<IN>), "\n";
                close IN;
                unlink "$tmp$_.txt";
            }
            close OUT;
        }

        C:\>type data.txt
        11 12 13
        21 22 23
        31 32 33

        C:\>transpose.pl

        C:\>type data-transpose.txt
        11 21 31
        12 22 32
        13 23 33

        C:\>

        cheers

        tachyon

        Ah, that is a different problem. That function does a 180-degree transposition. You did not specify what you were after, so I guessed :-) What is the type of the data? Is it int, long, double, or string?

        cheers

        tachyon

Re: Binary file handling
by zentara (Cardinal) on Mar 18, 2004 at 16:48 UTC
    You might find a speedy method in PDL. Look at perldoc PDL::Basic. From the pod, as an example:
    transpose

        Transpose rows and columns.

        $b = transpose($a);
        $b = ~$a;

        Also bound to the "~" unary operator in PDL::Matrix.

        perldl> $a = sequence(3,2)
        perldl> p $a
        [
         [0 1 2]
         [3 4 5]
        ]
        perldl> p transpose( $a )
        [
         [0 3]
         [1 4]
         [2 5]
        ]

    I'm not really a human, but I play one on earth. flash japh
Re: Binary file handling
by tachyon (Chancellor) on Mar 19, 2004 at 03:42 UTC

    Here is a full suite of file-based functions. The redundant code could be cut away, and it could be made into a module if there is general use for this sort of thing.