in reply to upper or lower triangular matrix to full

savita:

As others have noted, you're talking about a rather large dataset. I'd suggest not even bothering to reflect the data around the diagonal, and instead manipulating your indices to fetch the data:

    $ cat pm_1198523.pl
    #!env perl
    use strict;
    use warnings;

    my $rData = [
        [ 1, 3, 5, 7, 9 ],
        [ 0, 2, 4, 6, 8 ],
        [ 0, 0, 3, 6, 9 ],
        [ 0, 0, 0, 8, 4 ],
        [ 0, 0, 0, 0, 3 ]
    ];
    bless $rData, 'foo';

    print $rData->fetch(1,4), "\n";
    print $rData->fetch(4,1), "\n";
    print $rData->fetch(3,1), "\n";
    print $rData->fetch(1,3), "\n";

    package foo;

    sub fetch {
        my ($self, $r, $c) = @_;
        ($r, $c) = ($c, $r) if $c < $r;
        return $self->[$r][$c];
    }

    $ perl pm_1198523.pl
    8
    8
    6
    6

Also, if you're building the file containing the data yourself, I'd suggest making all your data elements the same size so you can access the data directly from the file via seek.
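To make the fixed-size idea concrete, here's a rough sketch of writing a full matrix as fixed-width text fields and then pulling one element back out with seek. The 5-byte field width, the row-by-row layout, and the names write_fixed and read_fixed are all assumptions of mine for illustration, so adjust them to whatever your real data needs:

```perl
use strict;
use warnings;

# Assumed layout: every element is a 5-byte, space-padded text field,
# stored row by row.
my $ELEM_SIZE = 5;

# Write a matrix (array of array refs) as fixed-width fields.
sub write_fixed {
    my ($path, $matrix) = @_;
    open my $fh, '>:raw', $path or die "Cannot write $path: $!";
    for my $row (@$matrix) {
        for my $val (@$row) {
            my $field = sprintf "%-${ELEM_SIZE}s", $val;
            # A wider value would shift every later element.
            die "Value '$val' is wider than $ELEM_SIZE bytes"
                if length($field) != $ELEM_SIZE;
            print {$fh} $field;
        }
    }
    close $fh or die "Cannot close $path: $!";
}

# Fetch element ($r, $c) without reading the rest of the file.
sub read_fixed {
    my ($path, $ncols, $r, $c) = @_;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    seek($fh, $ELEM_SIZE * ($r * $ncols + $c), 0) or die "seek failed: $!";
    my $buf;
    my $got = read($fh, $buf, $ELEM_SIZE);
    die "Short read at [$r,$c]" unless defined $got && $got == $ELEM_SIZE;
    close $fh;
    $buf =~ s/\s+\z//;    # strip the padding
    return $buf;
}
```

Because every field has the same width, the byte offset of any element is just the element size times its position, which is what lets seek jump straight to it.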

If you'd like to reduce the size of your data file, and you're building it with fixed length elements, then you don't even need to store the lower "NA" triangle: instead you can use the idea above to fetch the data you're wanting by swapping the indices, and compute the seek address directly to compress out the lower triangle, like this:

    $ cat pm_1198523_2.pl
    #!env perl
    use strict;
    use warnings;

    my ($R, $C, $sz);  # Row, Column and element size
    $sz = 5;

    for my $r ([0,0], [1,0], [1,1], [2,0], [2,1], [2,2], [12,5]) {
        ($R,$C) = @$r;
        my $seek_addr = seek_addr($R, $C, $sz);
        print "[$R,$C] => $seek_addr\n";
    }

    sub seek_addr {
        my ($R, $C, $sz) = @_;
        ($R,$C) = ($C,$R) if $R < $C;
        my $slot = $R*($R+1)/2 + $C;
        return $sz * $slot;
    }

    $ perl pm_1198523_2.pl
    [0,0] => 0
    [1,0] => 5
    [1,1] => 10
    [2,0] => 15
    [2,1] => 20
    [2,2] => 25
    [12,5] => 415
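For completeness, here's a sketch of how the packed file itself might be written so that it matches seek_addr above. The names write_packed and read_packed, and the 5-byte space-padded text fields, are my own assumptions rather than anything your downstream program requires:

```perl
use strict;
use warnings;

my $SZ = 5;    # assumed bytes per element

# Same formula as seek_addr above: swap so $R >= $C, then index into
# the packed triangle.
sub seek_addr {
    my ($R, $C, $sz) = @_;
    ($R, $C) = ($C, $R) if $R < $C;
    return $sz * ($R * ($R + 1) / 2 + $C);
}

# Write only the upper triangle (including the diagonal) of a square
# matrix. With $j as the outer loop and $i <= $j as the inner loop, the
# elements come out in increasing seek_addr order, so plain sequential
# prints land each one at the right offset.
sub write_packed {
    my ($path, $m) = @_;
    open my $fh, '>:raw', $path or die "Cannot write $path: $!";
    for my $j (0 .. $#$m) {
        for my $i (0 .. $j) {
            my $field = sprintf "%-${SZ}s", $m->[$i][$j];
            die "Value at [$i,$j] is wider than $SZ bytes"
                if length($field) != $SZ;
            print {$fh} $field;
        }
    }
    close $fh or die "Cannot close $path: $!";
}

# Fetch element ($r, $c) from the packed file; the index order does
# not matter because seek_addr swaps as needed.
sub read_packed {
    my ($path, $r, $c) = @_;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    seek($fh, seek_addr($r, $c, $SZ), 0) or die "seek failed: $!";
    my $buf;
    my $got = read($fh, $buf, $SZ);
    die "Short read at [$r,$c]" unless defined $got && $got == $SZ;
    close $fh;
    $buf =~ s/\s+\z//;    # strip the padding
    return $buf;
}
```

For an N x N matrix this stores N*(N+1)/2 elements instead of N*N, so the 5 x 5 example from the first script takes 15 slots (75 bytes) rather than 25.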

Note: You may want to adjust things depending on how you plan to use the data. Random access of the items is likely to be excruciatingly slow as other monks have mentioned. If you can arrange your algorithm so that you primarily iterate through columns or rows then you can make the disk cache work with you rather than against you. (You may have to swap R/C above depending on your access pattern.)
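As one illustration of working with the disk cache: in the packed layout, all the elements that share the same larger index $R sit next to each other in the file, so you can grab the whole run with one seek and one read instead of $R+1 separate ones. This is only a sketch; read_packed_run is a hypothetical helper, and it assumes the same space-padded fixed-width text fields as before:

```perl
use strict;
use warnings;

# Read the contiguous run of elements [$R,0] .. [$R,$R] from an
# already-open packed file. Assumes space-padded text fields of $sz bytes.
sub read_packed_run {
    my ($fh, $R, $sz) = @_;
    my $start = $sz * ($R * ($R + 1) / 2);    # offset of element [$R,0]
    my $len   = $sz * ($R + 1);               # $R+1 elements in the run
    seek($fh, $start, 0) or die "seek failed: $!";
    my $buf;
    my $got = read($fh, $buf, $len);
    die "Short read for run $R" unless defined $got && $got == $len;
    # 'A' fields drop the trailing space padding as they unpack.
    return unpack "(A$sz)*", $buf;
}
```

If the program consuming the file walks the matrix the other way, each of its passes touches one element from every run, which is the case where swapping R/C in the layout may pay off.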

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re^2: upper or lower triangular matrix to full
by savita (Novice) on Sep 04, 2017 at 15:06 UTC
    Thanks very much for your detailed response! Yes, I am building the file containing the data myself, so I don't need to store the lower triangle of NAs, as you have pointed out. This will be input to a program (not written by me), so I am not entirely sure how the items will be accessed. I will check with the author and see if I can order the matrix in a way that the program can read quickly! Thanks again for your helpful advice.