@data contains several tens of thousands of piddles with precomputed standard scores; $samples is the number of values in each piddle (side note: I would've thought that should be '$samples - 1', but this gives the same result as all the public implementations I've tried). The loop that takes all the time is:

    for ($i = 0; $i < @data; $i++) {
        for ($j = $i + 1; $j < @data; $j++) {
            # with standard scores, the Pearson correlation of a pair
            # is just the mean of their elementwise product
            $pears = sum($data[$i] * $data[$j]) / $samples;
        }
    }
So my question is: if I were to reimplement this part in C, would I expect some sort of significant speedup? I'm looking for a very general ballpark, like 2-3x, or "don't bother, probably none at all".
Also, I was wondering if stuffing the whole thing into a 2D piddle, rather than keeping it in a Perl array, might offer some benefit? I'm inclined to think probably not, but you never know. (Memory is not an issue at all here, just execution time.)
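For what it's worth, this is roughly what I had in mind for the 2D version (untested sketch; it assumes each element of @data is a 1-D piddle of length $samples, and $m / $pears are just names I've made up here):

    use PDL;

    # glue the standardized piddles into one 2-D piddle with
    # dims ($samples, scalar @data): one row per original piddle
    my $m = pdl(@data);

    # with standard scores, every pairwise correlation is a dot
    # product divided by $samples, so all of them collapse into a
    # single matrix multiplication (PDL's overloaded 'x' operator)
    my $pears = ($m x $m->transpose) / $samples;

    # $pears is an N x N matrix; the correlation of piddle $i and
    # piddle $j is $pears->at($j, $i) (it's symmetric, of course)

That does redundantly compute the diagonal and both triangles, and it materializes the full N x N result at once, but since memory isn't a problem and it all happens at C speed inside PDL, I'd guess it comes out ahead of the nested Perl loops.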
What I have right now is probably fast enough to make the project feasible - the largest dataset I have would take around 12 hours; but I have hundreds of these datasets, and even though I have the hardware to do quite a few in parallel, and most are much, much smaller than the largest, reducing the run-time even a little at this stage might save quite a bit of time by the time I'm done.
Any thoughts? Or other ideas I haven't thought of?