in reply to Fuzzy String matching with index?

This subject has come up before.

My pure-Perl approach is explained in Re: Fuzzy Searching: Optimizing Algorithm Selection, and there is a much faster version at Re^2: Fuzzy Searching: Optimizing Algorithm ( A few improvements).

Also of note is ysth & demerphq's compiled XS trie approach, the code for which can be found in Algorithm Showdown: Fuzzy Matching.

Be warned: Both these threads contain ultimately futile, heated technical debate.

I still have several further iterations of performance improvement of my code kicking around. If you are seriously interested and want to try and take it forward, I could look them out.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Fuzzy String matching with index?
by ysth (Canon) on Sep 05, 2006 at 05:32 UTC
    Both these threads contain ultimately futile, heated technical debate.
    Not ultimately futile, in that they ultimately led demerphq to work on adding trie optimization to perl's regex engine. :)

      When is 5.10 likely to shrug off its mythical status and become manifest?

      Seems I first heard about all the good stuff that was gonna come in 5.10, defined-OR etc. way back in 2002.


        From a short chat I had with Rafael, the "shopping list" for 5.10 is complete, that is, he isn't waiting for any features to be implemented anymore. I guess there will be a phase where things settle down, but I expect 5.10 within the next 6 months.