artist has asked for the wisdom of the Perl Monks concerning the following question:
Thanks,
Artist
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: The Best Hash
by CountZero (Bishop) on Dec 13, 2002 at 18:47 UTC | |
This begs an answer on the (filosophical) question: 'how to calculate the difference between hashes'. Is it the number of different key/value combinations or does one take into account the value of the values as well? CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law | [reply] |
by artist (Parson) on Dec 13, 2002 at 19:04 UTC | |
Thanks for the attention. Little Clarification: The keys and values are coming from mutually independent sets. Artist | [reply] |
by waswas-fng (Curate) on Dec 13, 2002 at 19:41 UTC | |
-Waswas | [reply] |
by CountZero (Bishop) on Dec 14, 2002 at 08:59 UTC | |
Perhaps the following method could be used: CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law | [reply] |
|
Re: The Best Hash
by BrowserUk (Patriarch) on Dec 14, 2002 at 07:59 UTC | |
How I can decide which hash is the least differing from other hashes? If I understand you correctly, you want to measure the 'fit' of each ordered set of hash values against every other set of hash values and accumulate their scores and sort them via that criteria. For this purpose a (A,B,C,D) -v- (A,B,D,C) scores 2, and (A,B,C,D) -v- (B,A,D,C) scores 0. This is an O(n^2) problem. Add to this the cost of comparing two arrays of strings and counting the matches and you have a slow and expensive process. Those little diagrams above gave me an idea. If you XOR a number with itself, you get zero. Perl (uniquely?) can do bitwise operations on strings. If you XOR 2 strings and they are the same, you end up with a string of null bytes. A slow method of comparing two string, and no good at all for comparing arrays of strings, even if you concatenated the arrays into a single string unless all the string were the same length. However, you said that the set of values in each hash is the same, its only the ordering that distinguishes them. Therefore it is possible to represent each set of values by a string with each value represented by a single character (provided that there are less than 255 unique values in each hash). Once you have the sets represented this way, deriving the 'fit' of any pair of hashes comes down to XORing 2 substitute strings and counting the number of null bytes. Doing this for all the pairs and accumulating the scores determines a fit value of each hash relative to every other. You then need to order these fits whilst retaining the index for which hash they belong to. You can then use the print them out in order of relative fit. The code below does this for 50 hashes of 9 values each in a few seconds. You don't say how many hashes or how large your sets, but it may complete in a reasonable time if your numbers aren't too extreme. The whole process only takes 6 (fairly complex) lines:)
test data Read more... (3 kB)
Results Read more... (3 kB)
I hope the results mean something to you, cos I can't see the pattern:) Examine what is said, not who speaks. | [reply] [d/l] [select] |
|
Re: The Best Hash
by dakkar (Hermit) on Dec 13, 2002 at 19:44 UTC | |
You say you have the same keys and the same values in each hash, right? Meaning that doing: will result in @v1 etc. being each a permutation of the others? If this is the case, I'd use the minimum number of swaps necessary to permute one hash into the other. Keep in mind that differences are usually defined between two objects. What do you mean with 'least differing from other hashes'? --
dakkar - Mobilis in mobile
| [reply] [d/l] |
|
Re: The Best Hash
by artist (Parson) on Dec 13, 2002 at 21:16 UTC | |
Some more answers to assist my question.
Dakkar:
Mojotoad:
Waswas-fng:
More Notes:
Thanks, | [reply] |
by graff (Chancellor) on Dec 14, 2002 at 02:07 UTC | |
It sounds like you might actually be trying to cluster the hashes based on relative similarity to each other -- or maybe you just want to identify the two hashes that are most similar to each other. Anyway, it might be interesting to treat the key-value pairs as strings, print the contents of each hash as a sorted list of these strings, then do pair-wise diffs on the text files. The pair with the fewest diffs is the closest match. The following has not been tested, but perhaps it will be useful.
(update: fixed to read $$hash_ref{$_} in the subroutine) | [reply] [d/l] |
|
Re: The Best Hash
by mojotoad (Monsignor) on Dec 13, 2002 at 20:04 UTC | |
This might help refine your question: Marijuana Varieties from weedguru.com. Further resources: Marijuana FAQ, MFAQ from hempfiles.com
;) | [reply] |
|
Re: The Best Hash
by Anonymous Monk on Dec 17, 2002 at 02:28 UTC | |
| [reply] |