in reply to Re^5: Fuzzy Searching:Alg.Sel. (Why tries won't work for this application.)(Yes. Really, really!)
in thread Fuzzy Searching: Optimizing Algorithm Selection

You haven't proved anything here, except that you don't understand how my solution works. The list you built meant my code was trying to match all _4_ digit fuzzy strings, and not the _2_ digits we were originally discussing; this is presumably where you get the misconception that it won't handle 25 digit keys. The behaviour you have posted here is exactly as I predicted, and it's correct. As I stated in my earlier post, and as ysth pointed out as well, it's trivial to determine what keys also match if a given key matches. So you haven't proved anything here. I could have written the code so that instead of outputting only the literal string matched it would output a precalculated list of all the possible variants. Which actually brings me to yet another mathematical problem with your code: your list omits the 30 words that have only 1 digit different.
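To make the counting concrete, here is a minimal Python sketch (not the code from either post) that enumerates every string differing from a key in exactly k positions. The four-letter alphabet `ACGT`, the all-`A` ten-character key, and the function name `variants` are assumptions chosen for illustration. Under those assumptions a key has 10 x 3 = 30 one-mismatch variants, which matches the figure of 30 quoted above.

```python
from itertools import combinations, product

ALPHABET = "ACGT"  # assumption: a four-letter DNA-style alphabet


def variants(key, k, alphabet=ALPHABET):
    """Yield every string that differs from key in exactly k positions."""
    for positions in combinations(range(len(key)), k):
        # At each chosen position the replacement must differ from the original.
        choices = [[c for c in alphabet if c != key[p]] for p in positions]
        for replacement in product(*choices):
            chars = list(key)
            for p, c in zip(positions, replacement):
                chars[p] = c
            yield "".join(chars)


key = "A" * 10  # hypothetical 10-character key
counts = {k: sum(1 for _ in variants(key, k)) for k in range(3)}
print(counts)  # number of variants at 0, 1 and 2 mismatches
```

The general count for a key of length n over an alphabet of size a is C(n, k) * (a - 1)^k, so the list grows quickly as k increases.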

Point in fact, you say my idea doesn't work. So I posted working code. Then you say it's broken, when in fact it does exactly what it was advertised to do. Face it, BrowserUk, you are and were wrong. Your XOR approach is slow, so slow as to be useless for nontrivial searches. The state machine is fast, and where it has an up-front overhead it can be reused over and over. No matter what you do, you aren't going to get around that. Sorry.
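The "pay once, reuse many times" trade-off can be sketched roughly as follows. This is not the trie or state machine discussed in the thread: it swaps in a plain Python dict of precomputed variants as a simpler stand-in, purely to show where the up-front cost sits. The names `build_table` and `scan`, the `ACGT` alphabet, and the sample texts are all assumptions.

```python
from itertools import combinations, product


def build_table(key, max_mismatches, alphabet="ACGT"):
    """Map every string within max_mismatches of key to its mismatch count."""
    table = {}
    for k in range(max_mismatches + 1):
        for positions in combinations(range(len(key)), k):
            choices = [[c for c in alphabet if c != key[p]] for p in positions]
            for replacement in product(*choices):
                chars = list(key)
                for p, c in zip(positions, replacement):
                    chars[p] = c
                table["".join(chars)] = k
    return table


def scan(text, table, width):
    """Return (offset, mismatches) for every window of text found in table."""
    hits = []
    for offset in range(len(text) - width + 1):
        k = table.get(text[offset:offset + width])
        if k is not None:
            hits.append((offset, k))
    return hits


table = build_table("A" * 10, 2)  # up-front cost, paid once
texts = ["A" * 11, "CC" + "A" * 9, "CCC" + "A" * 8]  # hypothetical inputs
results = {text: scan(text, table, 10) for text in texts}  # table reused per text
for text, hits in results.items():
    print(text, hits)
```

Each window costs one hash lookup regardless of how many mismatches are allowed; the price is a table whose size grows with the key length and the mismatch limit.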

I suggest you drop this; all you are doing is embarrassing yourself now.

---
demerphq

  • Comment on Re^6: Fuzzy Searching:Alg.Sel. (Why tries won't work for this application.)(Sigh)

Re^7: Fuzzy Searching:Alg.Sel. (Why tries won't work for this application.)(Sigh)
by BrowserUk (Patriarch) on Nov 29, 2004 at 11:49 UTC
    ...the list you built meant my code was trying to match all _4_ digit fuzzy strings, and not the _2_ digits we were originally discussing...
    01234567890
    AAAAAAAAAA
    AAAAAAAAAAA
    ==========
    Offset 0 -- 0 mismatches.

    01234567890
    AAAAAAAAAA
    AAAAAAAAAAA
    ==========
    Offset 1 -- 0 mismatches.

    01234567890
    CCAAAAAAAA
    AAAAAAAAAAA
    xx========
    Offset 0 -- 2 mismatches.

    12345678901
    CCAAAAAAAA
    AAAAAAAAAAA
    xx========
    Offset 1 -- 2 mismatches.

    ## 806 other 2-mismatch matches your code fails to find

    12345678901
    AAAAAAAATT
    AAAAAAAAAAA
    ========xx
    Offset 0 -- 2 mismatches.

    12345678901
    AAAAAAAATT
    AAAAAAAAAAA
    ========xx
    Offset 1 -- 2 mismatches.

    QED. 2 not 4.
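For readers who want to reproduce a report of this shape, the following Python sketch compares a key against the text at every offset and marks each position `=` or `x`. It is an illustrative position-by-position comparison, not the XOR code from the thread; the function name `mismatch_report` and the `ACGT` alphabet are assumptions. Under that alphabet assumption, a 10-character key has C(10, 2) * 3^2 = 405 two-mismatch variants, and an 11-character text offers 2 offsets, giving 810 two-mismatch hits, which is consistent with the 4 listed plus "806 other".

```python
from math import comb


def mismatch_report(key, text, max_mismatches):
    """Return (offset, mismatch_count, marks) for each offset within the limit."""
    hits = []
    for offset in range(len(text) - len(key) + 1):
        window = text[offset:offset + len(key)]
        # '=' where the characters agree, 'x' where they differ.
        marks = "".join("=" if a == b else "x" for a, b in zip(key, window))
        n = marks.count("x")
        if n <= max_mismatches:
            hits.append((offset, n, marks))
    return hits


text = "A" * 11
for key in ("A" * 10, "CC" + "A" * 8, "A" * 8 + "TT"):
    for offset, n, marks in mismatch_report(key, text, 2):
        print(f"{key} {marks} Offset {offset} -- {n} mismatches.")

# Expected number of exactly-2-mismatch hits, assuming a 4-letter alphabet.
offsets = len(text) - 10 + 1
total = offsets * comb(10, 2) * 3 ** 2
print(total)
```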

    Perhaps you should try this

    Update: And don't you realise that the list of 4-character mismatches would include all of the 2-char mismatches? And the 1-char mismatches? And the 3-char mismatches? And your code found what? Just TWO exact matches... pull the other one.


    Examine what is said, not who speaks.
    "But you should never overestimate the ingenuity of the sceptics to come up with a counter-argument." -Myles Allen
    "Think for yourself!" - Abigail        "Time is a poor substitute for thought"--theorbtwo         "Efficiency is intelligent laziness." -David Dunham
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon