Your math is all wrong.

Your 30,000,000 * 1275 = 38,250,000,000 comparisons.

I used an 11 digit key to generate my fuzzy list of 1275 words (i took a phone book and picked a number and generated all the one and two digit variants then searched against the book, the book was just under 35,000,000 numbers). Since the number of comparisons in a lookup into a Trie is the same as number of characters in the matched key the number of comparisons at absolute top used by the Trie based algorithm is thus 11 * N where N is the number of words in the dictionary. Thus the absolute maximum number of comparisons to search a 35 million word dictionary is 385 million character comparisons, in practice it will be signifgiantly less because of early bailouts.

Oh another cool thing about the trie approach is that it could do fuzzy matches on a whole host of strings simultaneously for exactly the same work. So the op said he needs to run a large set of these matches. He could generate his fuzzy list for each word walk the dictionary and have matched the full set in one go.

All of this assumes the thing i got wrong in the first place: that the match is start point anchored. I wasnt paying attention, and when i realized when I did my resubmit preview I added the bit starting "gah". Since i thought the ideas raised were interesting, and the qualifier present I didn't think it worth editing further. Maybe next time ill put in huge blinking letters "OOPS -- I ANSWERED THE WRONG QUESTION" so its more clear.

---
demerphq


In reply to Re^4: Fuzzy Searching: Optimizing Algorithm Selection by demerphq
in thread Fuzzy Searching: Optimizing Algorithm Selection by Itatsumaki

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.