Update: Indeed. The two optimisations reduce the time for the 500,000 comparisons from ~3.5 seconds to roughly half a second. That cuts the projected overall runtime of my test scenario from 3+ years to under half a year. Worth doing :)
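A quick check of that projection, assuming the total runtime scales linearly with the time taken per batch of 500,000 comparisons:

$$\frac{0.5\ \text{s}}{3.5\ \text{s}} = \frac{1}{7}, \qquad 3\ \text{years} \times \frac{1}{7} \approx 0.43\ \text{years}$$

So a 3-year projection falls below half a year, and stays there for any original projection under 3.5 years.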
Yes. I performed the same calculation based on my decision to use 500,000 25-ers (25-character search strings). Hence my request for clarification.
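There are several ways to speed up the processing. Using
( substr( $seq, $offset, 25 ) ^ $str25 ) =~ tr[\0][\0];
to do the counting, rather than
grep $_, unpack 'C*', ...
is (probably) another. The string XOR leaves a \0 byte at every position where the two strings agree, so the tr counts the matching positions, whereas the grep over the unpacked bytes counts the non-zero ones.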
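A minimal sketch comparing the two counting approaches, assuming the elided `...` in the grep version is the same XOR expression. The sub names `matches_xor` and `mismatches_unpack`, and the sample strings, are hypothetical; `$probe` stands in for one 25-er.

```perl
use strict;
use warnings;

# Hypothetical helper: number of positions where the 25-char window of
# $seq at $offset agrees with $probe. String XOR yields "\0" wherever the
# characters are equal; tr[\0][\0] returns how many "\0" bytes it found.
sub matches_xor {
    my ( $seq, $offset, $probe ) = @_;
    return ( substr( $seq, $offset, 25 ) ^ $probe ) =~ tr[\0][\0];
}

# Hypothetical helper: number of positions where they differ, assuming the
# elided "..." is the same XOR. unpack 'C*' turns the result into byte
# values, grep keeps the non-zero ones, and scalar grep returns the count.
sub mismatches_unpack {
    my ( $seq, $offset, $probe ) = @_;
    return scalar grep $_, unpack 'C*', substr( $seq, $offset, 25 ) ^ $probe;
}

my $seq   = 'ACGT' x 10;              # 40-char sample sequence
my $probe = ( 'ACGT' x 6 ) . 'T';     # 25 chars; differs from $seq at index 24

print 'matches=',     matches_xor( $seq, 0, $probe ),
      ' mismatches=', mismatches_unpack( $seq, 0, $probe ), "\n";
# prints: matches=24 mismatches=1
```

The two counts are complementary (they always sum to 25). If the mismatch count is what the fuzzy test needs, `25 - matches` gives it, or `tr[\0]//c` counts the non-\0 bytes directly in the same single pass.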
I'm waiting for a limited benchmark that is currently running to finish before I try several of these.
I had thought that by not actually copying each 25-char string out of the sequence I might save some time/memory, but now that you've pointed it out, I realise that I can create an LVALUE ref to the substring and avoid 500,000 calls to substr per inner loop. Thanks.
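A sketch of hoisting the substr out of the inner loop with an LVALUE reference, assuming the outer loop walks the offsets in the sequence and the inner loop walks the 25-ers. The `scan` sub, its return layout and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical driver: returns $counts->[$offset][$i], the number of
# positions where the 25-char window at $offset matches $probes[$i].
sub scan {
    my ( $seq, @probes ) = @_;
    my @counts;
    for my $offset ( 0 .. length($seq) - 25 ) {
        # One substr call per offset: \substr(...) yields an LVALUE reference
        # to the window inside $seq, so no substr call is needed per probe.
        my $window = \substr( $seq, $offset, 25 );
        for my $i ( 0 .. $#probes ) {
            # Inner loop: dereference the window and count matching bytes.
            $counts[$offset][$i] = ( $$window ^ $probes[$i] ) =~ tr[\0][\0];
        }
    }
    return \@counts;
}

my $seq    = 'ACGT' x 10;                          # 40 chars => offsets 0..15
my @probes = ( ( 'ACGT' x 6 ) . 'A', 'T' x 25 );   # two sample 25-ers

my $counts = scan( $seq, @probes );
print ref( \substr( $seq, 0, 25 ) ), "\n";         # prints: LVALUE
print "offset 0: @{ $counts->[0] }\n";             # prints: offset 0: 25 6
```

Copying the window once per offset with a plain `substr` outside the inner loop would also remove the per-probe call; which of the two is faster is something the benchmark would have to show.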