in reply to Re^6: Fuzzy Searching: Optimizing Algorithm ( A few improvements).
in thread Fuzzy Searching: Optimizing Algorithm Selection

I'd say if you can speed things up by assuming a fixed-width key set, then do so. However, I was intending at some point to convert mine and ysth's to handle variable-width keys, so it might be worthwhile going both ways. *shrug* For now it's safe to assume the search keys are fixed width. :-)

I looked at the optimisation you mentioned, moving certain logic outside of the key loop in your second version. I'm not sure it's a good idea to cache those strings. It will of course speed things up, but I think it may also be problematic, as it dramatically increases the amount of memory your solution needs. For instance, with 100_000 keys searching 100k strings you are going to have serious memory issues. So I guess it's a tradeoff. I may build a memory ceiling into the test suite so that an object may be at most 400MB or so. While this may be somewhat small, it's necessary IMO because it's around there that my machine starts thrashing, which destroys the utility of any benchmark.
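To make the ceiling idea concrete, here is a rough sketch of how a test harness might estimate whether a cache of strings stays under a limit. The names `estimate_bytes`, `within_ceiling` and `SCALAR_OVERHEAD` are made up for illustration, and the per-string overhead is an assumption that varies between perl builds. A real harness would more likely measure the structure directly, for instance with `total_size` from the CPAN module Devel::Size.

```perl
use strict;
use warnings;

use constant CEILING_BYTES   => 400 * 1024 * 1024;  # the ~400MB ceiling discussed above
use constant SCALAR_OVERHEAD => 56;                 # ASSUMED per-string overhead; build dependent

# Rough estimate of the memory held by a list of cached strings:
# payload bytes plus an assumed fixed overhead per scalar.
sub estimate_bytes {
    my ($strings) = @_;
    my $total = 0;
    $total += length($_) + SCALAR_OVERHEAD for @$strings;
    return $total;
}

# True if the estimated size does not exceed the ceiling.
sub within_ceiling {
    my ($strings) = @_;
    return estimate_bytes($strings) <= CEILING_BYTES;
}

# Stand-in data: 1000 cached strings of 25 characters each.
my @cache = map { 'x' x 25 } 1 .. 1000;

printf "estimated %d bytes, %s\n", estimate_bytes( \@cache ),
    within_ceiling( \@cache ) ? 'under the ceiling' : 'over the ceiling';
```

The point of the estimate is only that the cost grows with the number of cached strings per key times the number of keys, so a per-key cache that looks small in a toy run can cross the ceiling quickly at 100_000 keys.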

But yeah, sure, feel free to wait to see the full picture. I just figured you'd prefer to get a contender suited up. I have already converted your original solution and the uncached second solution you posted, and I thought you should have right of reply before I posted them in the new thread.

---
demerphq

Re^8: Fuzzy Searching: Optimizing Algorithm ( A few improvements).
by BrowserUk (Patriarch) on Dec 09, 2004 at 12:41 UTC
    I'm not sure if it's a good idea to cache those strings, although it will of course speed things up I think it also may be problematic as it dramatically mushrooms the amount of memory your solution needs.

    It's not the keys array I would move out, just the calculations and the $minZeros string, all of which would be constants if the keys are fixed length. I have done that locally and it is worth doing.
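For readers following along, a minimal sketch of what hoisting those constants can look like when every key has the same length. This is not the code from the thread: the data, `$FUZZY`, and the way `$min_zeros` is defined here are assumptions chosen for illustration. The idea is that the key length, the last valid offset and the run-of-nulls string are computed once before the key loop rather than once per key.

```perl
use strict;
use warnings;

# Hypothetical inputs; every key is assumed to have the same length.
my $FUZZY = 2;                                   # assumed max mismatches allowed
my @keys  = qw( ACGTACGT ACGTTTGT GGGGACGT );
my $seq   = 'TTACGTACCTAAACGTTTGTCC';

# Loop invariants: with fixed-width keys these are constants,
# so they are computed once here instead of inside the key loop.
my $key_len   = length $keys[0];
my $last_off  = length($seq) - $key_len;
my $min_zeros = "\0" x int( $key_len / ( $FUZZY + 1 ) );   # assumed definition

my @hits;
for my $key (@keys) {
    die "key '$key' is not $key_len chars" unless length($key) == $key_len;
    for my $off ( 0 .. $last_off ) {
        # String XOR yields "\0" wherever the two strings agree.
        my $diff = substr( $seq, $off, $key_len ) ^ $key;

        # Cheap pre-filter: with at most $FUZZY mismatches there must be
        # a run of at least length($min_zeros) matching characters.
        next if index( $diff, $min_zeros ) < 0;

        # Exact check: count the positions that did not match.
        my $mismatches = $key_len - ( $diff =~ tr/\0// );
        push @hits, [ $key, $off, $mismatches ] if $mismatches <= $FUZZY;
    }
}

printf "%s matches at offset %d with %d mismatch(es)\n", @$_ for @hits;
```

None of this caches anything per key, so it costs no extra memory beyond three scalars. It only stops working once keys are allowed to vary in length, at which point the three values would have to move back inside the key loop.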

    My latest variation is better still, but has a bug in the logic that means it finds a few duplicates (again). Still trying to crack that. Basically, it removes the inner ($offset2) loop, which has a dramatic effect on performance--if only I can get the accuracy back.


    Examine what is said, not who speaks.        The end of an era!
    "But you should never overestimate the ingenuity of the sceptics to come up with a counter-argument." -Myles Allen
    "Think for yourself!" - Abigail        "Time is a poor substitute for thought"--theorbtwo         "Efficiency is intelligent laziness." -David Dunham
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

      It's not the keys array I would move out, just the calculations and the $minZeros string, all of which would be constants if the keys are fixed length. I have done that locally and it is worth doing.

      Yep. I did that (I think :-) once you pointed it out to me. :-)

      ---
      demerphq