in reply to The Best Hash

This raises the (philosophical) question: how do you calculate the difference between hashes?

Is it the number of differing key/value pairs, or does one also take into account how far apart the values themselves are?

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Re: The Best Hash
by artist (Parson) on Dec 13, 2002 at 19:04 UTC
    Hi CountZero
    Thanks for the attention.
    A little clarification: the keys and values come from mutually independent sets.

    Artist

      If the value is a string, you may be able to use one of the distance modules on CPAN to calculate the closest match in a loop through all of the hashes. If it is numeric, you can put your test in a loop with a local variable that tracks what is closest so far, though the way you do this depends entirely on the data in the key/value pairs. For example, if you have an exact match, do you need to stop, or do you need to find ALL exact matches? Do you need to find all close matches that are a certain distance from the mean or all the hash key/val pairs, or does the match need to match the outside value? More info will get you a better answer here. You can't kill us with information; you can only make the responses more true to your actual situation.
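
The numeric case described above might look roughly like the sketch below. It is only an illustration under assumptions: the helper name closest_index, the sample data, and the rule that the first hash wins on a tie are all made up here, and hashes that lack the key are simply skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: return the index of the hash whose value for $key
# is numerically closest to $want (the first hash wins on a tie).
sub closest_index {
    my ($key, $want, @hashes) = @_;
    my ($best_i, $best_d);              # "local var" tracking the best so far
    for my $i (0 .. $#hashes) {
        next unless exists $hashes[$i]{$key};   # skip hashes without the key
        my $d = abs($hashes[$i]{$key} - $want);
        if (!defined $best_d || $d < $best_d) {
            ($best_i, $best_d) = ($i, $d);
        }
    }
    return $best_i;                     # undef if no hash has the key
}

# Made-up sample data
my @hashes = ({ size => 10 }, { size => 17 }, { size => 14 }, { colour => 3 });
my $i = closest_index('size', 15, @hashes);
print "closest: hash $i\n";             # prints "closest: hash 2"
```

For string values, the abs() difference could be swapped for an edit distance from a CPAN module (Text::Levenshtein is one example); the loop itself would stay the same.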

      -Waswas

      Perhaps the following method could be used:

      1. Output each hash to a file, sorted alphabetically by key.
      2. Run each file against the file of the target hash (or even against each other file, if you need a full cross-comparison) through a "diff" program, parse the output with Perl, and calculate the difference (e.g. score = number of lines different + number of lines missing + number of lines added).
      3. Sort the "results" from the diffs and find the hash which is closest to the target hash.
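
The same scoring idea can be sketched directly in Perl, without writing files or calling an external diff. This is a swapped-in, in-memory variant and not the file-based procedure itself: the helper name hash_diff_score and the sample hashes are assumptions, values are compared as strings, and a changed value counts once (a real diff may report it as one removed plus one added line).

```perl
use strict;
use warnings;

# Hypothetical helper: score = pairs that differ + keys missing + keys added,
# measured from the candidate hash relative to the target hash.
sub hash_diff_score {
    my ($target, $cand) = @_;
    my $score = 0;
    for my $k (keys %$target) {
        if (!exists $cand->{$k}) {
            $score++;                       # key missing from candidate
        }
        elsif ($cand->{$k} ne $target->{$k}) {
            $score++;                       # same key, different value
        }
    }
    for my $k (keys %$cand) {
        $score++ unless exists $target->{$k};   # key added in candidate
    }
    return $score;
}

# Made-up sample data
my %target     = (a => 1, b => 2, c => 3);
my %candidates = (
    first  => { a => 1, b => 2, d => 4 },   # c missing, d added -> score 2
    second => { a => 1, b => 9, c => 3 },   # b differs          -> score 1
);

my %scores = map { $_ => hash_diff_score(\%target, $candidates{$_}) }
             keys %candidates;

# Rank candidates by score, breaking ties by name for a stable order
my @ranked = sort { $scores{$a} <=> $scores{$b} || $a cmp $b } keys %scores;
print "closest: $ranked[0] (score $scores{$ranked[0]})\n";
```

Whether a changed value should weigh the same as a missing or added key is a design choice that depends on the data.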

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law