Evanovich has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys, what's the fastest way to load a matrix of numbers stored in a file into a two-dimensional array? My array is 6000 x 1000. Thanks! Evan

Re: Fast Matrix Load
by cLive ;-) (Prior) on Feb 17, 2002 at 03:14 UTC
    Well, you don't tell us anything about the file, but if it's tab delimited, then perhaps this:
    #!/usr/bin/perl
    use strict;
    use warnings;

    # create array ref and row counter
    my $matrix = [];
    my $i = 0;

    open PROFILES, '<', 'profile/profile.db' or die $!;
    while (<PROFILES>) {
        chomp;
        # add each row as an array ref inside the matrix array ref
        @{$matrix->[$i]} = split /\t/;
        $i++;
    }
    close(PROFILES);

    # access referenced data like this
    print $matrix->[1]->[6], "\n";    # 7th element of 2nd row (indexing starts at 0,0)

    exit(0);

    cLive ;-)

    Update: since you've added more info, I've updated the code above to reflect your existing code.

    --
    seek(JOB,$$LA,0);

      Well, that might help, except that there's one caveat. I'm passing this matrix to a C routine. Here's how it works, once I build the matrix @matrix as shown earlier:
      sub anglecast (\@);    # predeclare so the prototype applies to the call below

      anglecast(@matrix);

      sub anglecast (\@) {
          my $avMatrix = shift(@_);
          my $packedMatrix = "";
          my @recycleBin;
          my $width = @{$avMatrix->[0]};
          for my $avRow (@$avMatrix) {
              $width = @$avRow if @$avRow < $width;
              my $packedRow = pack("f$width", @$avRow);
              push @recycleBin, \$packedRow;    # keep each packed row alive so its "P" pointer stays valid
              $packedMatrix .= pack("P", $packedRow);
          }
          return CAST(0 + @$avMatrix, $width, $packedMatrix);
      }

      Anyone know how to modify this if I want to use references? Or maybe there's a more efficient way to load a matrix?

      Thanks again,
      Evan
        The quickest solution I can think of (without trying to understand your code) is to dereference the array ref as you pass it to the sub, i.e.:
        anglecast (@{$matrix});
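        A minimal sketch of why that works, using a hypothetical stand-in sub (rows_received) that has the same (\@) prototype as anglecast but just returns what it receives instead of calling CAST: the prototype turns both @matrix and @{$matrix} into a reference to the same array. The sub has to be declared before the call, otherwise Perl ignores the prototype.

```perl
use strict;
use warnings;

# Hypothetical stand-in for anglecast: same (\@) prototype, but it just
# returns the array reference it was given instead of calling CAST.
# It is defined before it is called so the prototype takes effect.
sub rows_received (\@) {
    my $avMatrix = shift;    # the prototype passes \@array, not the elements
    return $avMatrix;
}

my @matrix = ([1, 2, 3], [4, 5, 6]);
my $matrix = \@matrix;

my $via_array = rows_received(@matrix);       # the call in the original post
my $via_ref   = rows_received(@{$matrix});    # the dereferenced form

print $via_array == $via_ref ? "same array\n" : "different arrays\n";    # same array
print scalar(@$via_ref), " rows\n";                                       # 2 rows
```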

        cLive ;-)


Re: Fast Matrix Load
by dvergin (Monsignor) on Feb 17, 2002 at 03:01 UTC
    Tsk, tsk, Evanovich. As a good acolyte you know that you will get a much more generous response if you show us a little code from your own attempt to address this problem.

    Failing that, you might share some of your thinking about why the most obvious solutions might not be the optimum you are looking for...

    Also, some information about this file would be helpful. Do you have control over its format? If not, what is the form of the data stored there? It's tough to offer a solution when we don't know what the data looks like or how much freedom there may be in determining its form.

    Update: /me smiles encouragingly at Evanovich, noting that his solution uses the same basic approach as that of the honorable cLive ;-), differing mostly in the use of intermediate temporary values and a couple of inadvertent errors.
     

      Yes yes yes. Okay. I'll give you my code; I'm just embarrassed that I can't get a faster time for this very simple operation. Here is what I have:
      my @matrix;
      my @profile;
      open PROFILES, "profile/profile.db" or die "$!";
      while (<PROFILES>) {
          @profile = split (/\t/, $_);
          while ($i <= $#profile) {
              $matrix[$j]->[$i] = $profile[$i];
              $j++;
          }
          $i++;
          $j = 0;
      }
      Okay, dvergin? I know it sucks; I'm essentially loading the data twice, passing it first into a temporary array. Suggestions would be very, very appreciated. Apologies for not giving you my code. Sorry sorry sorry. Your good acolyte, Evan

      Moved end-code tag 2002-02-16 dvergin

        # this is going to take a very long time:
        # $i never changes inside this loop, so it never terminates
        while ($i <= $#profile) {
            $matrix[$j]->[$i] = $profile[$i];
            $j++;
        }
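        A sketch of the same explicit-index loop with the index fixed, assuming rows should end up as $matrix[$row][$col] as in the other replies. The in-memory filehandle and its two sample rows are stand-ins for profile/profile.db.

```perl
use strict;
use warnings;

# Sample tab-separated data held in memory instead of a real file.
my $data = "1\t2\t3\n4\t5\t6\n";
open my $fh, '<', \$data or die $!;

my @matrix;
my $i = 0;                          # row index
while (my $line = <$fh>) {
    chomp $line;
    my @profile = split /\t/, $line;
    # the loop variable is the column index, so the loop terminates
    for my $j (0 .. $#profile) {
        $matrix[$i][$j] = $profile[$j];
    }
    $i++;
}
close $fh;

print $matrix[1][2], "\n";          # 6
```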
Re: Fast Matrix Load
by zengargoyle (Deacon) on Feb 17, 2002 at 06:43 UTC

    You might want to look at PDL; it's fast, bulk matrix-like math for Perl. It keeps its data in C format, so you wouldn't have to pack/unpack. It has routines for reading/writing several common file formats.

    I don't remember the exact syntax, but there's a whitespace-delimited reader and a method to get the C pointer.
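    This is not PDL, just a plain-Perl sketch of what "C format" means here: the whole matrix packed into one contiguous row-major buffer of floats, so element (r, c) sits at float offset r * cols + c. Feeding such a buffer to anglecast would assume the C routine (CAST) can take a single float pointer plus dimensions instead of an array of row pointers; that is an assumption, not something shown in this thread. The element() helper is hypothetical.

```perl
use strict;
use warnings;

my @matrix = ([1, 2, 3], [4, 5, 6]);
my $rows   = @matrix;
my $cols   = @{ $matrix[0] };
my $fsize  = length pack("f", 0);    # size of a native C float, usually 4

# One contiguous row-major buffer: element (r, c) is float number r*cols + c.
my $buffer = pack "f*", map { @$_ } @matrix;

# Hypothetical helper: read element (r, c) back out of the packed buffer.
sub element {
    my ($buf, $ncols, $r, $c) = @_;
    return unpack "f", substr($buf, $fsize * ($r * $ncols + $c), $fsize);
}

print length($buffer) / $fsize, " floats\n";    # 6 floats
print element($buffer, $cols, 1, 2), "\n";       # 6
```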

Re: Fast Matrix Load
by tachyon (Chancellor) on Feb 17, 2002 at 12:44 UTC

    I'm not sure if this is the fastest, but it is the shortest and sweetest way. It uses Perl-style code rather than C-style.

    Update

    This is roughly 150% faster than your C-style method; see the benchmark below.

    use Data::Dumper;
    while (<DATA>) {
        push @matrix, [split /\s+/];
    }
    print Dumper(\@matrix);
    __DATA__
    1 2 3 4
    5 6 7 8
    9 10 11 12

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      shortest??? are you sure? ;-)
      @matrix=map[split],<DATA>;
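      One detail worth knowing when comparing this with split /\s+/: a bare split is the same as split ' ', $_, which splits on runs of whitespace and skips leading whitespace, while split /\s+/ returns an empty first field when a line starts with whitespace. A small sketch with a made-up line:

```perl
use strict;
use warnings;

my $line = "  1 2 3\n";    # made-up line with leading whitespace

# awk-style split: leading whitespace ignored, trailing empty fields dropped
my @awk_style = split ' ', $line;

# regex split: the leading whitespace produces an empty first field
my @regex = split /\s+/, $line;

print scalar(@awk_style), "\n";    # 3
print scalar(@regex), "\n";        # 4 (first field is '')
```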

      -Blake

        OK, not shortest :-/ but fastest :-), viz.:

        print "Writing file...\n";
        open FILE, ">test.txt" or die $!;
        my $line = join "\t", (1..1000);
        print FILE $line, "\n" for (1..600);
        close FILE;
        print "File written!\n";

        # Original method
        open PROFILES, "<test.txt" or die $!;
        my @matrix = ();
        my $start = time;
        while (<PROFILES>) {
            @profile = split (/\t/, $_);
            while ($j <= $#profile) {
                $matrix[$j]->[$i] = $profile[$i];
                $j++;
            }
            $i++;
            $j = 0;
        }
        print "Original method takes ", time-$start, " seconds\n";
        close PROFILES;

        # my method
        open PROFILES, "<test.txt" or die $!;
        @matrix = ();
        $start = time;
        while (<PROFILES>) {
            push @matrix, [split "\t"];
        }
        print "My method takes ", time-$start, " seconds\n";
        close PROFILES;

        # Blakem's method
        open PROFILES, "<test.txt" or die $!;
        @matrix = ();
        $start = time;
        @matrix = map [split], <PROFILES>;
        print "Blakem's method takes ", time-$start, " seconds\n";
        close PROFILES;

        __DATA__
        C:\>perl matrix.pl
        Writing file...
        File written!
        Original method takes 18 seconds
        My method takes 7 seconds
        Blakem's method takes 52 seconds

        PS: I slightly modified the original code to remove the infinite loop, so it actually runs for the test.
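        Since time only has one-second resolution, the core Benchmark module can give finer numbers. A sketch comparing the push loop with the map-over-the-whole-file approach, assuming tab-separated data; the 200 x 100 in-memory data set and the read_push/read_map names are hypothetical, chosen so the run stays short.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 200 rows x 100 tab-separated columns in memory,
# so disk caching does not skew the comparison.
my $data = join '', map { join("\t", 1 .. 100) . "\n" } 1 .. 200;

# Read line by line and push one array ref per row.
sub read_push {
    open my $fh, '<', \$data or die $!;
    my @matrix;
    while (<$fh>) {
        chomp;
        push @matrix, [ split /\t/ ];
    }
    return \@matrix;
}

# Slurp all lines in list context, then map each one to an array ref.
sub read_map {
    open my $fh, '<', \$data or die $!;
    my @matrix = map { chomp; [ split /\t/ ] } <$fh>;
    return \@matrix;
}

# Run each loader for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    push_loop => \&read_push,
    map_list  => \&read_map,
});
```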

        cheers

        tachyon
