in reply to Re^2: upper or lower triangular matrix to full

You and others in this thread silently assume that "matrix" means some kind of delimited data file. What if the file looks like this:
0001 1202 3030 ... 8491 9382 9381 ...
In such a fixed-length case you don't need any memory (well, kinda) and can do the job just by seek()ing to the appropriate positions on disk.
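To make the idea concrete, here is a minimal sketch of reading one cell by offset arithmetic alone. It assumes a layout the OP has not confirmed: every field is exactly 4 characters wide and is followed by one separator byte (a space, or a newline at the end of a row), and the file stores all n columns of every row. The name read_cell and the constants are made up for illustration.

```perl
use strict;
use warnings;

# Assumed layout (hypothetical): every field is exactly $WIDTH characters,
# followed by one separator byte (a space, or a newline at the end of a row).
my $WIDTH = 4;
my $CELL  = $WIDTH + 1;

# Return the field at ($row, $col) of an $n-column matrix, both 0-based.
sub read_cell {
    my ($fh, $n, $row, $col) = @_;
    # Cells are laid out row by row, so the byte offset is plain arithmetic.
    my $offset = ($row * $n + $col) * $CELL;
    seek($fh, $offset, 0) or die "seek failed: $!";
    my $got = read($fh, my $field, $WIDTH);
    die "short read at ($row, $col)" unless defined $got && $got == $WIDTH;
    return $field;
}
```

Only $WIDTH bytes are held in memory per lookup, which is where the "well, kinda" comes from: the OS still buffers whole blocks behind the scenes.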

We won't know unless the OP tells us.


holli

You can lead your users to water, but alas, you cannot drown them.

Re^4: upper or lower triangular matrix to full
by choroba (Cardinal) on Sep 02, 2017 at 08:49 UTC
    I originally started with
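    sub fill_matrix {
        my ($in) = @_;
        open my $IN, '<', $in or die $!;
        # Record the byte offset at which every line starts.
        my @index = (0);
        push @index, tell $IN while <$IN>;
        pop @index;    # the last offset is EOF, not a line start
        for my $line_no (0 .. $#index) {
            print STDERR "$line_no\r";
            # Left of the diagonal: take column $line_no from each earlier row.
            for my $idx (0 .. $line_no - 1) {
                seek $IN, $index[$idx], 0;
                my $line = <$IN>;
                print +(split ' ', $line, $line_no + 2)[$line_no], ' ';
            }
            # Diagonal and beyond: the rest of the current row, unchanged.
            seek $IN, $index[$line_no], 0;
            my $line = <$IN>;
            print +(split ' ', $line, $line_no + 1)[-1];
        }
    }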

    but it was much slower: 28s for SIZE 1000, 280s for SIZE 2000.

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      If the fields were fixed-width, you wouldn't need to read an entire line to get a single value. That really starts to bite you when the lines get long. But the seeking is still going to kill performance once you run out of disk cache.
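A rough sketch of what that could look like, under assumptions nobody in the thread has confirmed: the file holds a full n x n grid, every field is $width characters plus one separator byte, the last row ends with a newline, and only the upper triangle (column >= row) holds meaningful values. The name fill_matrix_fixed and its parameters are hypothetical.

```perl
use strict;
use warnings;

# Assumed layout (hypothetical): an n x n file in which every field is
# $width characters plus one separator byte, and only the upper triangle
# (col >= row) holds meaningful values.
sub fill_matrix_fixed {
    my ($in, $out, $n, $width) = @_;
    my $cell = $width + 1;
    for my $row (0 .. $n - 1) {
        # Lower-triangle cells come from the mirrored position (col, row).
        for my $col (0 .. $row - 1) {
            seek($in, ($col * $n + $row) * $cell, 0) or die "seek failed: $!";
            (read($in, my $field, $width) // 0) == $width or die "short read";
            print {$out} $field, ' ';
        }
        # The diagonal and everything to its right is one contiguous read,
        # including the newline that terminates the row.
        my $len = ($n - $row) * $cell;
        seek($in, ($row * $n + $row) * $cell, 0) or die "seek failed: $!";
        (read($in, my $rest, $len) // 0) == $len or die "short read";
        print {$out} $rest;
    }
}
```

Each lower-triangle value now costs a read of $width bytes instead of a whole line, but it is still one seek per value, so the disk cache caveat applies unchanged.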