demerphq, you are wrong.

How long were your "words"? Less than 25 characters?

For the two-miss scenario, matching each of 100,000 25-character needles against each of 30,000 x 1000-char strings requires:

326 * 100,000 * 976 * 30,000 comparisons = 954,528,000,000,000 comparisons.
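
For anyone who wants to check that figure, here is a minimal sketch in Python, used purely as a calculator. The function and parameter names are made up for illustration. The multiplier 326 is taken as given from the line above, and 976 is the number of offsets at which a 25-char needle fits inside a 1000-char string (1000 - 25 + 1).

```python
def comparison_count(variants, needles, needle_len, seq_len, n_seqs):
    """Comparisons needed to try every needle variant at every offset of every sequence."""
    offsets = seq_len - needle_len + 1  # 1000 - 25 + 1 = 976
    return variants * needles * offsets * n_seqs


# Two-miss scenario: 326 variants per needle, as stated in the post.
two_miss = comparison_count(326, 100_000, 25, 1000, 30_000)
print(f"{two_miss:,}")  # prints 954,528,000,000,000
```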

Your run made 30,000,000 * 1275 = 38,250,000,000 comparisons.

Your rate of comparisons is 21,250,000/second.

At that rate, your runtime for the full 100,000 x 30,000 x 1000 problem would be:

1 year, 5 months, 4 days, 21 hours, 29 minutes, 24.7 seconds.

For the 3-miss scenario the numbers are:

2626 * 100,000 * 976 * 30,000 = 7,688,928,000,000,000.

At the same rate, your runtime for the full 100,000 x 30,000 x 1000 problem would be:

11 years, 5 months, 20 days, 2 hours, 51 minutes, 45.88 seconds.

For the 4-miss scenario the numbers are:

28,252 * 100,000 * 976 * 30,000 = 82,721,856,000,000,000.

At the same rate, your runtime for the full 100,000 x 30,000 x 1000 problem would be:

123 years, 4 months, 9 days, 17 hours, 27 minutes, 3.46 seconds.
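
The runtimes above are just the comparison counts divided by the rate of 21,250,000 comparisons/second. A rough sketch of that arithmetic in Python follows. The names are hypothetical, the per-scenario multipliers (326, 2626, 28,252) are copied from the figures above rather than derived, and the conversion to years assumes 365.25-day years, so it only approximates the years/months/days breakdowns quoted.

```python
RATE = 21_250_000                      # comparisons per second, from the post
OFFSETS = 1000 - 25 + 1                # 976 offsets per 1000-char sequence
SCENARIOS = {2: 326, 3: 2626, 4: 28_252}  # misses -> multiplier, as quoted in the post
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: average year length


def runtime_seconds(variants, needles=100_000, n_seqs=30_000):
    """Projected runtime in seconds for one scenario at the quoted rate."""
    comparisons = variants * needles * OFFSETS * n_seqs
    return comparisons / RATE


for misses, variants in SCENARIOS.items():
    secs = runtime_seconds(variants)
    years = secs / SECONDS_PER_YEAR
    print(f"{misses}-miss: {secs:,.0f} seconds, roughly {years:.1f} years")
```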

As for your challenge: I have published my code. Where is yours? Try applying your method to real-world data.

If you have somewhere I can post the 1000 x 1000-char randomly generated sequences (994 KB) and the 1000 x 25-char randomly generated search strings (27 KB), I will send them to you. You can then time how long your code takes to produce the required information (sequence no. / offset / fuzziness, where fuzziness is the number of mismatched characters), which my published test code produces in 28.5 minutes.
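
To make the required output concrete, here is a minimal brute-force sketch in Python. It is not the published test code and makes no claim to its speed. It simply slides each needle along each sequence and counts mismatched characters. The function name `fuzzy_matches` and the extra needle number in each result are assumptions added for illustration.

```python
def fuzzy_matches(sequences, needles, max_misses):
    """Yield (sequence_no, needle_no, offset, misses) for every fuzzy hit.

    A hit is any offset where the needle differs from the sequence
    in at most max_misses character positions.
    """
    for seq_no, seq in enumerate(sequences):
        for needle_no, needle in enumerate(needles):
            n = len(needle)
            for offset in range(len(seq) - n + 1):
                misses = 0
                for a, b in zip(needle, seq[offset:offset + n]):
                    if a != b:
                        misses += 1
                        if misses > max_misses:
                            break  # too fuzzy, give up on this offset
                else:
                    # Inner loop finished without exceeding the limit.
                    yield seq_no, needle_no, offset, misses


hits = list(fuzzy_matches(["ACGTACGT"], ["ACGA"], max_misses=1))
print(hits)  # prints [(0, 0, 0, 1), (0, 0, 4, 1)]
```

Timing any faster method against the same data only means something if it reports the same set of hits.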

Then, and only then, when we are comparing eggs with eggs, will there be any point in continuing this discussion.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail        "Time is a poor substitute for thought"--theorbtwo
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

In reply to Re^3: Fuzzy Searching: Optimizing Algorithm Selection by BrowserUk
in thread Fuzzy Searching: Optimizing Algorithm Selection by Itatsumaki
