in reply to Re^6: Fuzzy Searching: Optimizing Algorithm ( A few improvements).
in thread Fuzzy Searching: Optimizing Algorithm Selection

FWIW, my solution will lose efficiency with mixed length words.

Well, AFAIUI, the efficiency will be determined by the ratio MIN_KEY_LEN/FUZZ. The smaller it is, the less efficient the search becomes, with the degenerate case being a slower version of a brute-force XOR.
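For reference, here is a minimal sketch of the brute-force XOR baseline mentioned above. It is not the code from either solution in this thread, and the name `brute_xor_search` and its argument order are hypothetical. It slides each word along the target string, XORs the two equal-length substrings, and counts the non-NUL bytes in the result. That count is the number of mismatched characters (the Hamming distance).

```perl
use strict;
use warnings;

# Hypothetical brute-force fuzzy search (illustration only): for every word
# and every offset in $str, count mismatched characters via string XOR and
# keep hits with at most $fuzz mismatches. Returns an arrayref of flat
# ($ofs, $diff, $word) triplets.
sub brute_xor_search {
    my ($str, $fuzz, @words) = @_;
    my @hits;
    for my $word (@words) {
        my $len = length $word;
        for my $ofs (0 .. length($str) - $len) {
            # Equal characters XOR to "\0"; tr/\0//c counts everything else.
            my $diff = (substr($str, $ofs, $len) ^ $word) =~ tr/\0//c;
            push @hits, $ofs, $diff, $word if $diff <= $fuzz;
        }
    }
    return \@hits;
}

my $hits = brute_xor_search("ACGTACGTTT", 1, "ACGA", "GTTT");
print "@$hits\n";
```

Every word is compared at every offset, so the work grows with words × positions × word length no matter what FUZZ is. That is the cost a smarter scheme falls back towards as MIN_KEY_LEN/FUZZ shrinks.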

Re: packing the return values; I seriously doubt a pack in pure perl is going to be a net win over just returning multiple elements, so I think that would benefit only the XS solution.

I'm happy either way. And yes, the benefit to an XS solution was one of the reasons I didn't do it originally. But I don't entirely agree it's a lousy interface. For large numbers of hits and strings it involves a lot less string copying, and the packed records have the inherent property of being lexicographically sortable and easy to dupe-check and compare.
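The sortability point can be illustrated like this. If the offset and diff are packed as fixed-width big-endian unsigned integers (`pack 'N'`), a plain string `sort` orders the records numerically by offset, then by diff, and a hash on the whole record gives a cheap duplicate check. This is a sketch under assumptions: the `N N a*` record layout and the helper names `pack_hit`/`unpack_hit` are hypothetical, since the thread doesn't specify the packed format.

```perl
use strict;
use warnings;

# Hypothetical record layout: 32-bit big-endian offset, 32-bit big-endian
# diff, then the matched word. Fixed-width big-endian integers compare
# byte-wise in numeric order, so Perl's default string sort works.
sub pack_hit   { my ($ofs, $diff, $word) = @_; return pack 'N N a*', $ofs, $diff, $word }
sub unpack_hit { return unpack 'N N a*', $_[0] }

my @packed = map { pack_hit(@$_) }
    [300, 1, 'ACGA'], [2, 0, 'GTTT'], [300, 1, 'ACGA'], [2, 1, 'GTTA'];

# Dupe-check on the whole packed record, then a plain string sort.
my %seen;
my @unique = grep { !$seen{$_}++ } @packed;
my @sorted = sort @unique;

# Unpack back into ($ofs, $diff, $word) triplets for display.
my @triplets = map { [ unpack_hit($_) ] } @sorted;
print join(' ', @$_), "\n" for @triplets;
```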

As far as returning an index goes, I suppose that's possible, but as is my algorithm has no need to keep the array around.

OK, then we'll leave it as unpacked triplets of ($ofs, $diff, $string) returned via an arrayref.
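On the caller's side, that format might be consumed as sketched below. The contents of `$hits` are made-up sample data for illustration. The loop walks the flat array three elements at a time. A flat list also avoids allocating one anonymous array per hit, which matters when there are many hits.

```perl
use strict;
use warnings;

# Hypothetical sample result in the agreed format: one flat arrayref of
# ($ofs, $diff, $string) triplets stored back to back.
my $hits = [ 0, 1, 'ACGA',  4, 1, 'ACGA',  6, 0, 'GTTT' ];

# Step through the array three elements at a time using an array slice.
my @report;
for (my $i = 0; $i < @$hits; $i += 3) {
    my ($ofs, $diff, $string) = @{$hits}[ $i .. $i + 2 ];
    push @report, "$string at offset $ofs ($diff mismatch"
                . ($diff == 1 ? '' : 'es') . ")";
}
print "$_\n" for @report;
```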

---
demerphq
