in reply to Re: Huge data file and looping best practices
in thread Huge data file and looping best practices

@talexb, will MySQL, for example, store the data more compactly, or allow for faster analysis? We're not concerned with subsets of the data, except to break the computation tasks into several chunks across processors or computers. We really need all the data. Plus, we don't have the SQL skills to do the analysis that way.
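To make the chunking concrete, here is a minimal sketch of one way to split the loop over a big line-oriented file across forked workers. The file name, chunk count, and analyze_point() routine are placeholders for whatever the real analysis is, so treat this as a starting point rather than our actual code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $file   = 'huge_data.txt';   # placeholder path
    my $chunks = 4;                 # say, one chunk per core

    # Count lines once so each child can be handed a line range.
    my $total = 0;
    open my $fh, '<', $file or die "Can't open $file: $!";
    $total++ while <$fh>;
    close $fh;

    my $per_chunk = int( $total / $chunks ) + 1;

    for my $i ( 0 .. $chunks - 1 ) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        next if $pid;    # parent keeps forking

        # Child: stream the file, keeping only its slice of lines.
        my ( $start, $end ) = ( $i * $per_chunk, ( $i + 1 ) * $per_chunk - 1 );
        open my $in, '<', $file or die "Can't open $file: $!";
        while ( my $line = <$in> ) {
            next if $. - 1 < $start;    # $. is the current line number
            last if $. - 1 > $end;
            analyze_point($line);       # hypothetical per-point analysis
        }
        close $in;
        exit 0;
    }
    wait() for 1 .. $chunks;    # reap the children

    sub analyze_point { }    # stand-in for the real computation

Each child re-reads the file and skips to its slice, which is wasteful I/O but keeps the sketch simple; splitting the file beforehand (e.g. with split(1)) would avoid that.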

Re^3: Huge data file and looping best practices
by talexb (Chancellor) on Apr 26, 2009 at 17:26 UTC

    A database may not be the best solution here -- from reading the other posts, it could be that you're going to be more interested in 'clumping' each of the data points together, creating 'neighborhoods' of 'nearest neighbors'. My Systems Design professor Ed Jernigan did research along those lines.

    Perhaps a first cut would be some sort of encoding of each data point, then a 'clumping' based on that, with further analysis on the smaller 'clumps'.
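    As a rough illustration of that first cut (assuming 2-D points and a made-up cell size, purely for the example), encode each point as the coarse grid cell it falls in, then bucket points that share a cell:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $cell = 10;    # assumed grid spacing; tune to the data's scale
        my %clump;        # encoded cell => points in that 'neighborhood'

        # The encoding: each point's coarse grid cell. Points sharing
        # a cell form a first-pass 'clump' for finer analysis later.
        while ( my $line = <DATA> ) {
            my ( $x, $y ) = split ' ', $line;
            my $key = join ',', int( $x / $cell ), int( $y / $cell );
            push @{ $clump{$key} }, [ $x, $y ];
        }

        # Each clump is now small enough to run the expensive
        # nearest-neighbor work on separately.
        for my $key ( sort keys %clump ) {
            printf "cell %s holds %d point(s)\n",
                $key, scalar @{ $clump{$key} };
        }

        __DATA__
        1.5 2.0
        1.7 2.2
        55.0 60.1
        54.8 59.9

    Points near a cell boundary land in different clumps, so a real pass would also check neighboring cells, but as a first cut it turns one huge loop into many small ones.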

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds