> Even if you could achieve a 100:1 reduction in the size of the datastructure required to hold your MegaTrie/DFA (a more realistic figure would be 20:1), then you would still be (at 7GB) 3 1/2 times over budget for the 2GB/process RAM addressable by your average 32-bit processor.

So do it in multiple passes, with as many keywords as fit comfortably in memory. As you say, the problem has substantial requirements.
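A minimal sketch of the multi-pass idea, offered only as an illustration. It assumes a crude cost model (one byte per keyword character, via the hypothetical `bytes_per_char` parameter) in place of measuring a real trie or DFA, and it uses plain exact matching per pass so the batching structure stays visible. `passes_needed`, `keyword_batches` and `multi_pass_search` are hypothetical names, not from the thread.

```python
import math
from typing import Iterator


def passes_needed(index_bytes: int, budget_bytes: int) -> int:
    """Minimum number of passes if the index is split evenly across passes."""
    return math.ceil(index_bytes / budget_bytes)


def keyword_batches(keywords: list[str], budget_bytes: int,
                    bytes_per_char: int = 1) -> Iterator[list[str]]:
    """Yield consecutive keyword groups whose estimated index size fits the budget.

    bytes_per_char is a hypothetical cost model; a real trie/DFA would need
    to be measured to get a realistic figure.
    """
    batch: list[str] = []
    used = 0
    for kw in keywords:
        cost = len(kw) * bytes_per_char
        if batch and used + cost > budget_bytes:
            yield batch
            batch, used = [], 0
        batch.append(kw)
        used += cost
    if batch:
        yield batch


def multi_pass_search(text: str, keywords: list[str],
                      budget_bytes: int) -> dict[int, list[str]]:
    """Scan the text once per batch (exact matching) and merge the hits."""
    hits: dict[int, list[str]] = {}
    for batch in keyword_batches(keywords, budget_bytes):
        # Per-pass index: this batch's keywords grouped by length in hash sets.
        # It is discarded before the next batch is loaded.
        by_len: dict[int, set[str]] = {}
        for kw in batch:
            by_len.setdefault(len(kw), set()).add(kw)
        for offset in range(len(text)):
            for length, words in by_len.items():
                window = text[offset:offset + length]
                if len(window) == length and window in words:
                    hits.setdefault(offset, []).append(window)
    return hits
```

With the figures quoted above, `passes_needed(7 * 2**30, 2 * 2**30)` gives 4. Leaving headroom for the rest of the process, as "comfortably" suggests, would push that number higher. The trade is more sequential reads of the text in exchange for a bounded index size.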
And surely the bulk of the problem is to find which offsets match; identifying the particular strings that match at a given offset, if this is indeed necessary, is much easier and can be a separate step.
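A rough sketch of that split, under stated assumptions. It assumes all keywords have the same length and that "fuzzy" means at most `k` substitutions (Hamming distance, no insertions or deletions). Phase 1 uses pigeonhole seed filtering, a technique swapped in here for illustration and not taken from the thread: if a window differs from a keyword in at most `k` positions, then at least one of `k + 1` disjoint pieces must match exactly. Phase 1 reports an offset as soon as one keyword is confirmed. Phase 2 then revisits only those offsets to list every matching keyword. All function names are hypothetical.

```python
def hamming_within(a: str, b: str, k: int) -> bool:
    """True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return False
    return True


def piece_bounds(length: int, k: int) -> list[tuple[int, int]]:
    """Split [0, length) into k+1 contiguous pieces (assumes length >= k+1)."""
    parts = k + 1
    return [(j * length // parts, (j + 1) * length // parts) for j in range(parts)]


def build_seed_index(keywords: list[str], k: int):
    """Map each piece position to {piece string: keywords containing it there}."""
    length = len(keywords[0])
    if any(len(kw) != length for kw in keywords):
        raise ValueError("this sketch assumes equal-length keywords")
    bounds = piece_bounds(length, k)
    index: list[dict[str, list[str]]] = [{} for _ in bounds]
    for kw in keywords:
        for j, (lo, hi) in enumerate(bounds):
            index[j].setdefault(kw[lo:hi], []).append(kw)
    return length, bounds, index


def matching_offsets(text: str, keywords: list[str], k: int) -> list[int]:
    """Phase 1: offsets where at least one keyword matches with <= k mismatches."""
    length, bounds, index = build_seed_index(keywords, k)

    def any_match(window: str) -> bool:
        for j, (lo, hi) in enumerate(bounds):
            # Only keywords sharing an exact seed at this position are verified.
            for kw in index[j].get(window[lo:hi], ()):
                if hamming_within(window, kw, k):
                    return True  # stop at the first confirmed keyword
        return False

    return [off for off in range(len(text) - length + 1)
            if any_match(text[off:off + length])]


def keywords_at(text: str, offset: int, keywords: list[str], k: int) -> list[str]:
    """Phase 2: every keyword matching at one known offset (brute force is fine here)."""
    length = len(keywords[0])
    window = text[offset:offset + length]
    return [kw for kw in keywords
            if len(window) == length and hamming_within(window, kw, k)]
```

For example, with `text = "AAGTTTCCAG"`, `keywords = ["AAGT", "TTCC", "GGGG", "TTCA"]` and `k = 1`, phase 1 returns `[0, 3, 4, 5]`, and phase 2 at offset 4 returns `["TTCC", "TTCA"]`. Because phase 2 runs only at offsets that are already known to match, it can afford a simple linear scan.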
In reply to Re^6: Fuzzy Searching:Alg.Sel. (Why tries won't work for this application.)(Yes. Really, really!) by ysth, in thread Fuzzy Searching: Optimizing Algorithm Selection by Itatsumaki