Hena has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I'm trying to do some matrix handling as fast as possible. Basically, I have a matrix like this:
11	12	13
21	22	23
31	32	33
Now I need to transpose that. Originally I just read the whole matrix into memory as transposed arrays and wrote them out, but this won't work when transposing big files. An alternative way was to read one column at a time, but this takes a lot of time (perhaps it would be faster to read 10 columns into 10 files, then cat them together).

Another alternative would be to read the whole file and write it out in binary format, allowing fast movement within the file (basically (sys)?seek) to get all the numbers in the correct order, so I can write the result. But how do I handle this?

I could use open, read, print and seek. Assuming that each column holds a double (64 bits, i.e. 8 bytes):
# loop for reading one column
open (BINM, "<:raw", $infile);
read (BINM, $number, 8);                  # a double is 8 bytes, not 64
print OUTM "$number\t";
seek (BINM, 8 * $columns_number, 1);
Or I could use sysopen, sysread, syswrite and sysseek in a similar manner. However, there is a problem when I'm reading the original matrix file:
while (<MATRIX>) {
    chomp;
    my @numbers = split (/\t/, $_);
    # problem
}
How do I write it out as doubles into the binary file? Also, if the following values are acceptable too, how do they complicate things? (The case of the characters could be fixed, but I'd prefer to allow any case:)
[+-]?inf NaN
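For reference, a minimal sketch of round-tripping doubles through pack/unpack's native "d" format, including Inf and NaN produced arithmetically (the behaviour of numifying literal 'inf'/'nan' strings is version- and platform-dependent, so this sketch avoids them):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Round-trip a list of doubles through the native "d" pack format.
my @numbers = ( 1.5, -2.25, 9**9**9 );       # 9**9**9 evaluates to +Inf
my $packed  = pack 'd*', @numbers;           # 8 bytes per double
my @back    = unpack 'd*', $packed;

printf "%d bytes packed\n", length $packed;  # 24 on typical IEEE-754 platforms

# NaN can be produced arithmetically; NaN != NaN is the standard test.
my $inf = 9**9**9;
my $nan = $inf - $inf;
my $nan_back = unpack 'd', pack 'd', $nan;
print "NaN survived the round trip\n" if $nan_back != $nan_back;
```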

Replies are listed 'Best First'.
Re: Binary file handling
by hawtin (Prior) on Mar 18, 2004 at 09:35 UTC

    To answer your exact question, pack() and unpack() will let you read and write doubles. Look it up in your favorite Perl book, do a search here, then try testing things in the debugger.

    If you are writing binary files you should use binmode() like:

    open (BINM, "<:raw", $infile);
    binmode (BINM);
    read (BINM, $number, 8);    # a double is 8 bytes

    It will not have any effect if you don't need it but will save your bacon if you do.
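As a concrete illustration of the seek-and-read approach, here is a self-contained sketch (the file name and 3x3 layout are made up; note that offsets are in bytes, 8 per double, row-major):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write a 3x3 matrix of doubles row by row, then read column 0
# back by seeking.  8 bytes per double, row-major layout.
my $cols = 3;
my $file = 'matrix.bin';                     # hypothetical temp file

open my $out, '>:raw', $file or die "Can't write $file: $!";
print {$out} pack 'd*', 11, 12, 13;
print {$out} pack 'd*', 21, 22, 23;
print {$out} pack 'd*', 31, 32, 33;
close $out;

open my $in, '<:raw', $file or die "Can't read $file: $!";
my @column;
for my $row ( 0 .. 2 ) {
    seek $in, 8 * ( $row * $cols + 0 ), 0;   # jump to row $row, column 0
    read( $in, my $buf, 8 ) or die "short read";
    push @column, unpack 'd', $buf;
}
close $in;
unlink $file;

print "@column\n";                           # prints 11 21 31
```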

    If your matrix is too big for your system's memory, then I would have thought that doing things via files would take far too long. I would suggest that a careful examination of the problem you are trying to solve is in order.

    If you still want to transpose in files, remember that there are many ways to do it. For example, if you split the matrix into four quarters, transpose each of them, and then combine them in the right order, you may find that efficiency can be improved. This problem is the same as reflecting a bitmap image around a diagonal; I'll bet that some of the image-processing books have some neat tricks you should look at (the last time I did any low-level raster stuff was almost 10 years ago).
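The quadrant idea is easy to check in memory before committing to a file-based version: if the matrix is split into blocks [[P,Q],[R,S]], its transpose is [[P',R'],[Q',S']] (transpose each block, swap the off-diagonal blocks). A toy sketch using only core Perl:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory transpose of an array-of-arrays.
sub transpose {
    my $m = shift;
    my @t;
    for my $r ( 0 .. $#$m ) {
        $t[$_][$r] = $m->[$r][$_] for 0 .. $#{ $m->[$r] };
    }
    return \@t;
}

# Glue two equal-height block matrices side by side.
sub beside {
    my ( $l, $r ) = @_;
    return [ map { [ @{ $l->[$_] }, @{ $r->[$_] } ] } 0 .. $#$l ];
}

my $M = [ [ 1, 2, 3, 4 ], [ 5, 6, 7, 8 ],
          [ 9, 10, 11, 12 ], [ 13, 14, 15, 16 ] ];

# The 2x2 quadrants of $M.
my $P = [ [ 1, 2 ],   [ 5, 6 ]   ];
my $Q = [ [ 3, 4 ],   [ 7, 8 ]   ];
my $R = [ [ 9, 10 ],  [ 13, 14 ] ];
my $S = [ [ 11, 12 ], [ 15, 16 ] ];

# Transpose each quadrant, then reassemble with the off-diagonal
# blocks swapped: [[P,Q],[R,S]]' == [[P',R'],[Q',S']].
my $top       = beside( transpose($P), transpose($R) );
my $bottom    = beside( transpose($Q), transpose($S) );
my $blockwise = [ @$top, @$bottom ];

my $direct = transpose($M);    # same result, computed the plain way
```

The payoff on disk is that each quadrant can be small enough to transpose entirely in memory.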

      Well, the problem with pack comes from using 'nan' and 'inf' there. With pure numbers I could:
      # packing
      $str = pack ("d*", @numbers);
      # unpacking to a correct column
      $val = (unpack ("d*", $str))[$col];
      But that won't work if there is 'nan' in the @numbers array, will it?

      I guess I could save memory by using pack in the read loop, thus pushing the limit back a bit. But I have to think about that splitting.
Re: Binary file handling
by flyingmoose (Priest) on Mar 18, 2004 at 14:07 UTC
    I'm trying to do some matrix handling as fast as possible. Basically I have a matrix like this
    Look into the insanely powerful Perl Data Language. If this isn't what you want, I'm a llama's uncle.
      This is indeed that and more. However, I'm trying to do this from Perl itself, without requiring an extra installation. But if I fail, I now know there is a good backup plan :).
Re: Binary file handling
by husker (Chaplain) on Mar 18, 2004 at 14:15 UTC
    Have you looked at Math::Matrix? I don't know how fast it is, but they've done all the "work" for you. Even if it's not suitable for what you want, it may inspire you with some other ideas.
Re: Binary file handling
by tachyon (Chancellor) on Mar 18, 2004 at 14:08 UTC

    The simple combination of File::ReadBackwards, split, join and reverse does it off disk, fast and efficiently. The split assumes some sort of space-separated data, but you could modify this if required. HTH

    C:\>type transpose.pl
    #!/usr/bin/perl -w
    use strict;
    use File::ReadBackwards;

    transpose( "c:/data.txt", "c:/data-transpose.txt" );

    sub transpose {
        my ( $infile, $outfile ) = @_;
        tie *BW, 'File::ReadBackwards', $infile
            or die "Can't read $infile $!\n";
        open OUT, ">$outfile" or die "Can't write $outfile $!\n";
        while( <BW> ) {
            chomp;
            print OUT join( "\t", reverse(split ' ') ), "\n";
        }
        close BW;
        close OUT;
    }

    C:\>type data.txt
    11 12 13
    21 22 23
    31 32 33

    C:\>transpose.pl

    C:\>type data-transpose.txt
    33 32 31
    23 22 21
    13 12 11

    C:\>

    cheers

    tachyon

      Hum. The transposed matrix in this case (or else I'm doing something wrong) should be:
      11	21	31
      12	22	32
      13	23	33
      

        Is this what you want? It flips on the diagonal. The main requirement is that you have as much free disk space for the temp files as the total file size. You will be limited in the number of columns you can transpose by the number of open file descriptors your Perl will let you have. It is very easy to hack the logic to do N columns per pass, at the expense of one full read of the input file per extra pass. Alternatively you could use a DBM or tie a hash to a file, use the keys as pseudo file handles, and just append data to the values. Although there is more I/O with a multipass approach, it is very vanilla I/O, which Perl does really fast.
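One way the "N columns per pass" variant could look (a sketch, not tachyon's code; the per-pass width and the file names are made up for illustration). Each pass re-reads the input and writes one batch of output rows, so only a couple of handles are ever open:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Multipass transpose: pass k handles columns k*$per_pass .. k*$per_pass-1,
# buffering only $per_pass output rows in memory at a time.
sub transpose_multipass {
    my ( $infile, $outfile, $per_pass ) = @_;
    $per_pass ||= 2;

    # Count the columns from the first line.
    open my $probe, '<', $infile or die "Can't read $infile: $!";
    my $num_cols = () = split ' ', scalar <$probe>;
    close $probe;

    open my $out, '>', $outfile or die "Can't write $outfile: $!";
    for ( my $first = 0 ; $first < $num_cols ; $first += $per_pass ) {
        my $last = $first + $per_pass - 1;
        $last = $num_cols - 1 if $last >= $num_cols;

        my @rows;    # one output row per column in this pass
        open my $in, '<', $infile or die "Can't read $infile: $!";
        while (<$in>) {
            my @data = split ' ';
            push @{ $rows[ $_ - $first ] }, $data[$_] for $first .. $last;
        }
        close $in;
        print {$out} join( "\t", @$_ ), "\n" for @rows;
    }
    close $out;
}

# Tiny demo on a 3x3 matrix (temp file names are hypothetical).
open my $fh, '>', 'mp_in.txt' or die $!;
print {$fh} "11 12 13\n21 22 23\n31 32 33\n";
close $fh;
transpose_multipass( 'mp_in.txt', 'mp_out.txt', 2 );
```

With $per_pass columns per pass, the whole input is read ceil(cols/$per_pass) times; that trades extra sequential reads (cheap) for a bounded number of file descriptors.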

        Update

        See this article for info on how to up the number of available file descriptors (probably 1024/process) on a Linux based system. No idea how it is dealt with on other systems.

        It should be really fast as we make a single pass through the input data and then effectively just write it out (each temp file has one full line in it).

        C:\>type transpose.pl
        #!/usr/bin/perl -w
        use strict;

        transpose90( "c:/data.txt", "c:/data-transpose.txt" );

        sub transpose90 {
            my ( $infile, $outfile, $tmp ) = @_;
            $tmp ||= 'c:/tmp/temp';
            open IN, $infile or die "Can't read $infile $!\n";
            # find number of columns and open a temp file for each
            local $_ = <IN>;
            chomp;
            my @data = split ' ';
            my $num_cols = $#data;
            my @fhs;
            for( 0 .. $num_cols ) {
                open $fhs[$_], ">$tmp$_.txt"
                    or die "Can't create temp file $tmp$_ $!\n";
                print {$fhs[$_]} $data[$_], "\t";
            }
            while( <IN> ) {
                chomp;
                @data = split ' ';
                print {$fhs[$_]} $data[$_], "\t" for 0 .. $num_cols;
            }
            close IN;
            open OUT, ">$outfile" or die "Can't write $outfile $!\n";
            for ( 0 .. $num_cols ) {
                close $fhs[$_];    # close the temp file
                open IN, "$tmp$_.txt"
                    or die "Can't read temp file $tmp$_ $!\n";
                print OUT scalar(<IN>), "\n";
                close IN;
                unlink "$tmp$_.txt";
            }
            close OUT;
        }

        C:\>type data.txt
        11 12 13
        21 22 23
        31 32 33

        C:\>transpose.pl

        C:\>type data-transpose.txt
        11 21 31
        12 22 32
        13 23 33

        C:\>

        cheers

        tachyon

        Ah, that is a different problem. That function does a 180-degree transposition. You did not specify what you were after, so I guessed :-) What is the type of the data? Is it int, long, double, or string?

        cheers

        tachyon

Re: Binary file handling
by zentara (Cardinal) on Mar 18, 2004 at 16:48 UTC
    You might find a speedy method in PDL. Look at perldoc PDL::Basic. From the pod, as an example:
    transpose

        Transpose rows and columns.

        $b = transpose($a);
        $b = ~$a;

        Also bound to the "~" unary operator in PDL::Matrix.

        perldl> $a = sequence(3,2)
        perldl> p $a
        [
         [0 1 2]
         [3 4 5]
        ]
        perldl> p transpose( $a )
        [
         [0 3]
         [1 4]
         [2 5]
        ]

    I'm not really a human, but I play one on earth. flash japh
Re: Binary file handling
by tachyon (Chancellor) on Mar 19, 2004 at 03:42 UTC

    Here is a full suite of file-based functions. The redundant code could be cut away, and it could be made into a module if there is general use for this sort of thing.