Hi there.
I am building a huge covariance matrix (around 10,000 x 10,000 elements). Since a covariance matrix is symmetric, I am trying to compute only one half of it (the lower triangle, including the diagonal).
The output looks like this (one row of the lower triangle per line):
0.22929292
0.32928322, 0.528289202
0.19383838, 0.32892929992, 0.78299283839
0.929283893, 0.4829299299, 0.628299292, 0.929389393
. . .
The input to the script is a file of comma-separated values, one record per line:
0.2939383839, -0.0929288282, 0.1293893939, 0.833883929 . . . 250 elements . . . 1000 elements
The relevant part of my script:

use strict;
use warnings;

open my $in, '<', 'datafile.txt' or die "Could not open data file: $!\n";
my @input_data = <$in>;
close $in;
chomp @input_data;    # strip the newline from every record once

my $record_count = 0;
# Loop through the array
for my $element (@input_data) {
    my @first  = split /,/, $element;
    my $jcount = 0;
    my @result;
    # Loop through the array again
    for my $inside_elem (@input_data) {
        my @second = split /,/, $inside_elem;
        # Build only the elements on and below the diagonal
        last if $jcount++ > $record_count;
        # Covariance logic (no issues with the logic):
        # mean of the products of corresponding elements in the arrays
        my $sum   = 0;
        my $count = 0;
        for (@first) {
            $sum += $_ * $second[$count++];
        }
        $sum /= scalar(@first);
        push @result, $sum;
    }
    my $str_to_write = join(", ", @result) . "\n";
    # Open the output file handle in append mode.
    open my $out, '>>', 'Outfile.txt' or die "Could not open output file: $!\n";
    print {$out} $str_to_write;
    close $out;
    $record_count++;    # advance to the next row of the triangle
}
The issues with my script:
1. Performance.
It took almost an hour to generate and write the first 2,500 (of 10,000) records to the output file. I badly need to improve the performance. Can you wise ones offer some suggestions?
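One common direction is to avoid repeating work inside the inner loop: the script above re-splits the inner record for every (row, column) pair and reopens the output file once per row. The following is a minimal sketch, assuming the mean-of-products logic above is what you want and that each input line is one record. The names `mean_product`, `lower_triangle_row` and `write_lower_triangle` are hypothetical helpers, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: mean of the products of corresponding elements
# (the same "covariance logic" as the original script).
sub mean_product {
    my ($x, $y) = @_;
    my $sum = 0;
    $sum += $x->[$_] * $y->[$_] for 0 .. $#$x;
    return $sum / @$x;    # @$x in scalar context is the element count
}

# Hypothetical helper: row $i of the lower triangle (columns 0..$i),
# computed from records that have already been split into array refs.
sub lower_triangle_row {
    my ($records, $i) = @_;
    return [ map { mean_product($records->[$i], $records->[$_]) } 0 .. $i ];
}

# Hypothetical driver: read the input file, write one triangle row per line.
sub write_lower_triangle {
    my ($in_path, $out_path) = @_;
    open my $in, '<', $in_path or die "Could not open $in_path: $!\n";
    # Split every record exactly once, not once per (row, column) pair.
    my @records;
    while (my $line = <$in>) {
        chomp $line;
        push @records, [ split /\s*,\s*/, $line ];
    }
    close $in;

    # Open the output file once instead of once per row.
    open my $out, '>', $out_path or die "Could not open $out_path: $!\n";
    for my $i (0 .. $#records) {
        print {$out} join(', ', @{ lower_triangle_row(\@records, $i) }), "\n";
    }
    close $out or die "Could not close $out_path: $!\n";
}

# Usage: perl lower_triangle.pl datafile.txt Outfile.txt
write_lower_triangle(@ARGV) if @ARGV == 2;
```

Even with these changes the amount of arithmetic stays the same: 10,000 records give 10,000 x 10,001 / 2 = 50,005,000 dot products, each over 250 (or more) elements, so pure Perl may still be slow. If so, a module that does the arithmetic in compiled code, such as PDL, is worth investigating.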

2. Storage space.
The output file would be almost a GB (or even more). Sorry if this does not make sense, but someone suggested that writing the data as binary instead of text would save space. For example, 0.09992020202 takes about 14 bytes as text (13 characters plus a separator); stored as a float instead of as characters, it should take less space. Can this be implemented in Perl?
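Perl's built-in `pack` and `unpack` can do this. Below is a minimal sketch, assuming single-precision little-endian floats (`f<`, 4 bytes per value) are acceptable; a 32-bit float keeps only about 7 significant decimal digits, so a value like 0.09992020202 will be rounded, and `d<` (8 bytes) is the option if that is not enough. At 4 bytes per value, the 50,005,000 lower-triangle values take about 200 MB. Because every value has a fixed width, row `$i` can be located by arithmetic instead of scanning. `write_row_binary` and `read_row_binary` are hypothetical names, and the file handles must be in binary mode (`binmode`, or open with `:raw`).

```perl
use strict;
use warnings;

# Hypothetical helper: append one lower-triangle row as packed
# little-endian single-precision floats (4 bytes per value).
# The handle must already be in binary mode (binmode / ':raw').
sub write_row_binary {
    my ($fh, @values) = @_;
    print {$fh} pack('f<*', @values);
}

# Hypothetical helper: read back row $i. Row $i holds $i + 1 values and
# starts after 1 + 2 + ... + $i = $i * ($i + 1) / 2 earlier values.
sub read_row_binary {
    my ($fh, $i) = @_;
    my $offset = 4 * $i * ($i + 1) / 2;
    seek($fh, $offset, 0) or die "seek failed: $!\n";
    my $len = 4 * ($i + 1);
    my $got = read($fh, my $buf, $len);
    die "short read\n" unless defined $got && $got == $len;
    return unpack('f<*', $buf);
}
```

The trade-off is that the file is no longer human-readable, and any program that reads it must use the same byte order and float width. Writing the byte order explicitly (`<`) keeps the file portable between machines.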

Many thanks for your time.

In reply to Issue on covariance calculation by Mandrake
