baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:
This is more of a discussion question than a request for code. First I'll explain the problem on a small-scale scenario and then provide the additional real-size input parameters.

Problem:
I have a database of histograms like this:

```
hist:1
1 ###
3 ####
5 #######
17 #
21 #

hist:2
1 ###
5 ##
17 #####
20 #
22 ##

hist:3
3 #
10 ###
12 #
..
```

Each histogram has up to 5 columns, and each column corresponds to a value from 1 to 25. From here on I'll refer to these histograms as subject histograms. I also have a set of query histograms that is approximately 1000 times smaller than the subject set. What I need to do is: for each query histogram, find the closest subject histogram. How do I define "closest"? Closest refers to the number of shared data points (each # is one data point). So let's say I have a histogram x that looks like this:
```
hist:x
1 ##
4 ####
5 ####
12 #
17 ##########
```

(Updated: 12 comes before 17.)

then

[Diagram: query, consensus and subject histograms shown side by side; the layout was lost in extraction]

d(q,c) = 5
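To make the definition of "closest" concrete, here is a minimal Python sketch. It assumes that "shared data points" means histogram intersection (for every column value present in both histograms, count the smaller of the two bars) and that the input text is a `hist:NAME` header followed by `value ###` lines. The helper names `parse_histograms`, `shared_points` and `closest_subject` are hypothetical, not from the post. Under that assumption hist:x shares 7 points with hist:1, 9 with hist:2 and 1 with hist:3, so hist:2 would be the closest.

```python
# Minimal sketch (assumptions labelled): parse the "value ###" notation used in
# the question and find the subject histogram sharing the most data points.
# ASSUMPTION: "shared data points" = histogram intersection, i.e. the sum over
# column values present in both histograms of min(query count, subject count).
# parse_histograms / shared_points / closest_subject are hypothetical names.


def parse_histograms(text):
    """Turn 'hist:NAME' headers followed by 'VALUE ###' lines into
    {name: {value: count}}; any other line (e.g. '..') is ignored."""
    hists = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("hist:"):
            current = line[len("hist:"):]
            hists[current] = {}
        elif current is not None:
            parts = line.split()
            if len(parts) == 2 and parts[0].isdigit() and set(parts[1]) == {"#"}:
                hists[current][int(parts[0])] = len(parts[1])
    return hists


def shared_points(query, subject):
    """Histogram intersection: data points the two histograms have in common."""
    return sum(min(n, subject[v]) for v, n in query.items() if v in subject)


def closest_subject(query, subjects):
    """Brute force: score every subject; ties go to the first one seen."""
    best = max(subjects, key=lambda name: shared_points(query, subjects[name]))
    return best, shared_points(query, subjects[best])


SUBJECTS = """
hist:1
1 ###
3 ####
5 #######
17 #
21 #
hist:2
1 ###
5 ##
17 #####
20 #
22 ##
hist:3
3 #
10 ###
12 #
"""
QUERY = """
hist:x
1 ##
4 ####
5 ####
12 #
17 ##########
"""

subjects = parse_histograms(SUBJECTS)
query = parse_histograms(QUERY)["x"]
print(closest_subject(query, subjects))  # ('2', 9)
```

This compares every query with every subject, so the work grows with (number of queries) × (number of subjects). The post's `d(q,c)` and "consensus" are not defined in the surviving text, so the sketch does not try to reproduce them.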
PS: I cannot use any known database engine.
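Since no database engine is allowed, one plain in-memory option (a suggestion for discussion, not something proposed in the post) is an inverted index from column value to the subjects that contain it. The sketch below again assumes that similarity is histogram intersection and that each histogram is a dict `{column_value: count}`; `build_index` and `closest_via_index` are hypothetical names.

```python
from collections import defaultdict

# Sketch of a plain in-memory inverted index, i.e. no database engine.
# ASSUMPTIONS: each histogram is a dict {column_value: count}, and similarity
# is histogram intersection (sum of min counts over shared column values).
# build_index / closest_via_index are hypothetical helper names.


def build_index(subjects):
    """Map each column value to the subjects that contain it: [(name, count), ...]."""
    index = defaultdict(list)
    for name, hist in subjects.items():
        for value, count in hist.items():
            index[value].append((name, count))
    return index


def closest_via_index(query, index):
    """Score only subjects that share at least one column value with the query."""
    scores = defaultdict(int)
    for value, q_count in query.items():
        # .get() avoids inserting empty lists into the defaultdict
        for name, s_count in index.get(value, ()):
            scores[name] += min(q_count, s_count)
    if not scores:
        return None  # no subject shares any column value with the query
    best = max(scores, key=scores.get)
    return best, scores[best]


subjects = {
    "1": {1: 3, 3: 4, 5: 7, 17: 1, 21: 1},
    "2": {1: 3, 5: 2, 17: 5, 20: 1, 22: 2},
    "3": {3: 1, 10: 3, 12: 1},
}
query_x = {1: 2, 4: 4, 5: 4, 12: 1, 17: 10}

index = build_index(subjects)  # built once, reused for every query
print(closest_via_index(query_x, index))  # ('2', 9)
```

Under the intersection assumption this gives the same best match as scoring every subject, because a subject that shares no column value with the query scores 0 anyway (here it returns `None` instead of an arbitrary zero-score subject). With only 25 possible column values the per-value lists can still be long, so how much it saves over brute force depends on how the values are distributed.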
Re: Similarity searching
by hdb (Monsignor) on Jan 25, 2014 at 16:27 UTC
by baxy77bax (Deacon) on Jan 25, 2014 at 16:53 UTC
by tangent (Parson) on Jan 25, 2014 at 18:13 UTC
by BrowserUk (Patriarch) on Jan 25, 2014 at 17:24 UTC
Re: Similarity searching
by BrowserUk (Patriarch) on Jan 25, 2014 at 15:30 UTC
by baxy77bax (Deacon) on Jan 25, 2014 at 16:05 UTC
by BrowserUk (Patriarch) on Jan 25, 2014 at 16:19 UTC
by baxy77bax (Deacon) on Jan 25, 2014 at 16:37 UTC
Re: Similarity searching
by oiskuu (Hermit) on Jan 25, 2014 at 17:24 UTC