typo! they are always ordered.
"If true, the first thing I would do is write a short program to do a single pass over your 2e9 silly format files and output a single file formatted like so:"
yes, it is already in some "nicely" formatted style, but for the purposes of visualization i used histograms (i thought it would be easier to understand the problem that way).
"And got completely lost in the number of columns and ranges of values for each column..."
so each histogram can have up to 300 columns. let's say each column label is a number; then 300 does not imply that the columns are labeled from 1 to 300. the label range is 1-8000, and from those 8000 labels each histogram has at most 300 different labels (so the number of possible label sets is bounded by the number of ways you can pick at most 300 out of 8000). the size of a column does not have a maximum value.
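to make that concrete, here is a rough sketch of how one such histogram could be held in memory. it is only an illustration in Python, not my actual format: the names `make_histogram` and `ordered_columns` are made up, and sorting the columns by label is an assumption about what "ordered" means here.

```python
MAX_LABEL = 8000    # labels come from the range 1..8000 (as described above)
MAX_COLUMNS = 300   # at most 300 different labels per histogram

def make_histogram(pairs):
    """Build a sparse histogram {label: size} from (label, size) pairs.

    Only the labels that actually occur are stored, so a histogram never
    holds more than MAX_COLUMNS entries even though labels go up to 8000.
    """
    hist = {}
    for label, size in pairs:
        if not 1 <= label <= MAX_LABEL:
            raise ValueError(f"label {label} is outside 1..{MAX_LABEL}")
        if label in hist:
            raise ValueError(f"label {label} appears twice")
        hist[label] = size  # column size has no upper bound
    if len(hist) > MAX_COLUMNS:
        raise ValueError(f"more than {MAX_COLUMNS} different labels")
    return hist

def ordered_columns(hist):
    """Return the columns as (label, size) pairs sorted by label (assumed order)."""
    return sorted(hist.items())

# hypothetical example data: three columns with labels scattered over 1..8000
h = make_histogram([(7512, 4), (12, 1000000), (300, 2)])
columns = ordered_columns(h)
```

the point of the dict is just that 300 sparse labels out of 8000 do not need an 8000-slot array per histogram.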
i hope i clarified the problem a bit.
cheers
In reply to Re^2: Similarity searching
by baxy77bax
in thread Similarity searching
by baxy77bax