Hi there,
I am in the process of building a huge covariance matrix (around 10000 x 10000 elements). Since a covariance matrix is symmetric, I attempt to create only one half of the matrix:
0.22929292
0.32928322, 0.528289202
0.19383838, 0.32892929992, 0.78299283839
0.929283893, 0.4829299299, 0.628299292, 0.929389393
.
.
.
The input to the script is an array of comma-separated values:
0.2939383839, -0.0929288282,0.1293893939, 0.833883929 . . . 250 elements
.
.
.
1000 elements
The relevant part of my script:
# "or" rather than "||": "||" binds to the file name, so die would never run
open my $in, '<', 'datafile.txt' or die "Could not open data file: $!\n";
my @input_data = <$in>;
close $in;
chomp @input_data;    # a bare chomp inside the loops would act on $_, not on the loop variable
my $record_count = 0;
# Loop through the array
for my $element (@input_data) {
    my @first = split /,/, $element;
    my $jcount = 0;
    my @result;
    # Loop through the array again
    for my $inside_elem (@input_data) {
        # Build only the elements on or below the diagonal
        last if $jcount++ > $record_count;
        my @second = split /,/, $inside_elem;
        # Covariance logic
        # No issues with the logic
        # sum of products of corresponding elements in the arrays
        my $sum   = 0;
        my $count = 0;
        for (@first) {
            $sum += $_ * $second[$count++];
        }
        $sum /= scalar(@first);
        push @result, $sum;
    }
    my $str_to_write = join(", ", @result) . "\n";
    # Open the output file handle in append mode.
    open my $out, '>>', 'Outfile.txt' or die "Could not open output file: $!\n";
    print $out $str_to_write;
    close $out;
    $record_count++;    # the next row gets one more element
}
The issues with my script:
1. Performance.
It took almost an hour for the first 2500 (out of 10000) records to be generated and written to the output file. I badly want to improve the performance. Can you wise ones give some suggestions?
2. Storage space.
The output file would be almost a GB (or even more). Sorry if this does not make sense, but it was suggested to me that writing the data as binary instead of text would save space. I mean, 0.09992020202 takes about 14 bytes as text; written as a float instead of as characters, it should take less space. Is this idea possible to implement in Perl?
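To make the idea concrete, here is a minimal sketch of what I understand the suggestion to be, using Perl's built-in pack and unpack with the 'd' template (a native double-precision float). The helper names row_to_binary and binary_to_row are made up for this illustration, and the two values are taken from the sample rows above; this is only my reading of the suggestion, not a tested solution for the full matrix.

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.

# Pack a list of numbers into a string of native doubles.
sub row_to_binary {
    my @values = @_;
    return pack 'd*', @values;
}

# Unpack a string of native doubles back into a list of numbers.
sub binary_to_row {
    my ($bytes) = @_;
    return unpack 'd*', $bytes;
}

my @row   = (0.32928322, 0.528289202);
my $bytes = row_to_binary(@row);
my @again = binary_to_row($bytes);

# Report how many bytes the packed row occupies.
printf "%d values, %d bytes\n", scalar(@again), length $bytes;
```

If this went to a file, I assume the handle would need binmode so the bytes are written unchanged, and whoever reads the file back would have to know that row i holds i+1 values, since the rows are no longer separated by newlines.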
Many thanks for your time.