Update: Indeed. The two optimisations reduce the time for the 500,000 comparisons from ~3.5 seconds to just over half a second. That reduces the projected overall runtime of my test scenario from 3+ years to under half a year. Worth doing :)
Yes. I performed the same calculation based upon my decision to use 500,000 25-ers, which is why I asked for clarification.
There are several ways to speed up the processing. Using

    ( substr( $seq, $offset, 25 ) ^ $25er ) =~ tr[\0][\0];

to do the counting, rather than

    grep $_, unpack 'C*', ...

is (probably) another.
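For illustration only, an untested sketch with made-up 25-char strings, showing the two counting approaches side by side:

    use strict;
    use warnings;

    # Hypothetical 25-char strings, purely for illustration.
    my $window = 'ACGTACGTACGTACGTACGTACGTA';
    my $k25er  = 'ACGTACGTACGAACGTACGTACGTT';

    # XOR leaves "\0" at every position where the characters match,
    # so counting the "\0"s with tr gives the number of matches in one pass.
    my $matches = ( $window ^ $k25er ) =~ tr[\0][\0];

    # The unpack/grep form builds a list of 25 byte values and counts the
    # non-zero ones, i.e. the mismatching positions.
    my $mismatches = grep $_, unpack 'C*', ( $window ^ $k25er );

    print "$matches matches, $mismatches mismatches\n";   # 23 matches, 2 mismatches

The tr version never leaves string-land, so it avoids building and walking a 25-element list for every comparison.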
I'm just waiting for a limited benchmark I have running to complete before trying several things.
I had thought that by avoiding actually copying each 25-char string out of the sequence I might save some time/memory, but now that you've pointed it out, I realise that I can create an LVALUE ref to the substring and avoid 500,000 calls to substr for each inner loop. Thanks.
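Roughly what I have in mind -- an untested sketch; the data, names and mismatch threshold are invented:

    use strict;
    use warnings;

    # Stand-in data and threshold; the real run uses the full sequence
    # and the 500,000 25-ers.
    my $seq     = 'ACGT' x 10_000;
    my @k25ers  = ( 'ACGTACGTACGAACGTACGTACGTT' ) x 10;
    my $MAX_MIS = 2;

    for my $offset ( 0 .. length( $seq ) - 25 ) {

        # One LVALUE ref per window: $$window always reads the 25 chars at
        # this offset, so the inner loop never copies the substring and
        # never calls substr at all.
        my $window = \substr( $seq, $offset, 25 );

        for my $k25er ( @k25ers ) {
            my $matches = ( $$window ^ $k25er ) =~ tr[\0][\0];
            if ( 25 - $matches <= $MAX_MIS ) {
                # record a fuzzy hit for this 25-er at $offset ...
            }
        }
    }

The ref does have to be re-taken for each offset (the lvalue is pinned to the position it was created at), but that is one substr per window rather than one per comparison.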
In reply to Re^3: Fuzzy Searching: Optimizing Algorithm Selection by BrowserUk
in thread Fuzzy Searching: Optimizing Algorithm Selection by Itatsumaki