in reply to Similarity searching
Does 17 really come before 12, or is that just a typo?
    hist:x
    1 ##
    4 ####
    5 ####
    17 ##########
    12 #
I cannot use any known database engine
Really? Why not?
If true, the first thing I would do is write a short program to do a single pass over your 2e9 silly format files and output a single file formatted like so:
    1: 1(3) 3(4) 5(7) 17(1) 21(1)
    2: 1(2) 3(2) 17(5) 20(1) 22(2)
    3: 3(1) 10(3) 12(1)
    ...
Then I could dump all those silly format files.
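Something along these lines would do for that single pass. It is only a sketch, and it assumes each input file looks like the quoted example: a `hist:<name>` header followed by one `<label> <hashes>` line per column, where the number of `#` characters is the count. The names `parse_hist`, `format_record` and `convert` are mine, and the record number is simply the position of the file in the list.

```python
import re

def parse_hist(text):
    """Parse one histogram in the ASSUMED layout: 'hist:<name>' then '<label> <hashes>' lines."""
    name = None
    counts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("hist:"):
            name = line[len("hist:"):]
            continue
        m = re.fullmatch(r"(\d+)\s+(#+)", line)
        if m is None:
            raise ValueError(f"unrecognised line: {line!r}")
        # The count is the number of '#' characters; input order is preserved.
        counts.append((int(m.group(1)), len(m.group(2))))
    return name, counts

def format_record(rec_id, counts):
    """Render one record as 'id: label(count) label(count) ...'."""
    return f"{rec_id}: " + " ".join(f"{label}({n})" for label, n in counts)

def convert(paths, out_path):
    """Single pass over the input files, writing one record per line to out_path."""
    with open(out_path, "w") as out:
        for rec_id, path in enumerate(paths, start=1):
            with open(path) as fh:
                _, counts = parse_hist(fh.read())
            out.write(format_record(rec_id, counts) + "\n")
```

The labels are written in the order they appear in the input. If the order carries no meaning, passing `sorted(counts)` to `format_record` would give ascending labels as in the sample above.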
Then I'd look to reformat that single file into some kind of consistent record format, but then I read this bit of your description:
each in real case scenario containing approx 300 columns and there is a maximum of 8000 possible column labels(values)) i thought i should create a consensus histogram from all subject ones. such that this histogram has all 25 columns (now i am again talking about my example) and each column having the maximum number of data points (this is computed from the subject set- if the max number of data points for column 1 is 100 then this how large column 1 in my consensus hist will be.)
And got completely lost in the number of columns and ranges of values for each column...
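For what it's worth, one possible reading of that description is that the consensus histogram holds, for every label seen anywhere in the subject set, the largest count any single histogram has for it. That is a guess on my part, not something the description confirms. Under that assumption, and using the three sample records from above as stand-in data, a sketch (the function name `consensus` is mine):

```python
def consensus(histograms):
    """Per-label maximum count across an iterable of {label: count} dicts."""
    result = {}
    for hist in histograms:
        for label, n in hist.items():
            # Keep the largest count seen so far for this label.
            if n > result.get(label, 0):
                result[label] = n
    return result

# Stand-in data: the three sample records shown earlier, as {label: count}.
subjects = [
    {1: 3, 3: 4, 5: 7, 17: 1, 21: 1},
    {1: 2, 3: 2, 17: 5, 20: 1, 22: 2},
    {3: 1, 10: 3, 12: 1},
]
merged = consensus(subjects)
```

Because it only ever keeps one running maximum per label, this needs a single pass and at most one entry per possible label, whatever the number of subject histograms.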
Re^2: Similarity searching
by baxy77bax (Deacon) on Jan 25, 2014 at 16:05 UTC
by BrowserUk (Patriarch) on Jan 25, 2014 at 16:19 UTC
by baxy77bax (Deacon) on Jan 25, 2014 at 16:37 UTC