Anyway 9 seconds are not that much..
See how it works out in the real thing.
especially if I start with your algo...
index() isn't my algo, just a poor, pure-Perl substitute for it.
If you look closely at the inline C version, longCmp() is not your average brute force string search. It uses several short circuits to avoid unnecessary comparisons.
For example, it checks the last byte in the haystack up front.
When longCmp() encounters this scenario:
haystack:....0ACGTACGTACGT0....
needle  :     ACGTACGc
The fact that the last bytes to be matched are different means it doesn't have to compare all the intermediate bytes to discover that, and it can move forward to the next position.
Likewise, when longCmp() tries this comparison:
haystack:....0ACGTACGTACGT0....
needle  :          AACCGGTT
It sees that the last byte that would be compared is a null, and not only skips that comparison, but skips ahead to the start of the next string:
haystack:....0ACGTACGTACGT0....
needle  :                  AACCGGTT
It isn't possible to code this kind of logic efficiently in Perl, hence the Inline C solution. My pure-Perl version was simply a poor substitute until the OP can sort out his Inline::C install.
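For anyone who wants to see the two short circuits in isolation, here is a minimal sketch in C. It is not the longCmp() posted earlier in the thread: the name find_with_skips() and its signature are made up for illustration, and it assumes the haystack is one buffer of null-separated strings and that the needle itself contains no nulls.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only, not the thread's longCmp().
 * hay    : buffer of NUL-separated strings, hlen bytes long
 * needle : nlen bytes, assumed to contain no NUL bytes
 * Returns the offset of the first match, or -1 if there is none. */
long find_with_skips(const char *hay, size_t hlen,
                     const char *needle, size_t nlen)
{
    size_t pos = 0;

    if (nlen == 0 || nlen > hlen)
        return -1;

    while (pos + nlen <= hlen) {
        /* Look at the LAST byte of the current window first. */
        char last = hay[pos + nlen - 1];

        if (last == '\0') {
            /* Every window starting at pos .. pos+nlen-1 would cover
             * this NUL, so none of them can match. Jump straight to
             * the byte after the separator. */
            pos += nlen;
            continue;
        }
        if (last != needle[nlen - 1]) {
            /* Last bytes differ: no need to compare the rest. */
            pos++;
            continue;
        }
        /* Last byte agrees; only now compare the remaining bytes. */
        if (memcmp(hay + pos, needle, nlen - 1) == 0)
            return (long)pos;
        pos++;
    }
    return -1;
}
```

With the haystack from the diagrams above (nulls where the 0s are), a needle of ACGTACGc is rejected at each position on the strength of one byte, and AACCGGTT hops over the separator as soon as its last byte lands on it.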
In reply to Re^15: list of unique strings, also eliminating matching substrings
by BrowserUk
in thread list of unique strings, also eliminating matching substrings
by lindsay_grey