in reply to Re^2: Fuzzy Searching: Optimizing Algorithm Selection
in thread Fuzzy Searching: Optimizing Algorithm Selection

Update: Indeed. The two optimisations reduce the 500,000 comparisons time from ~3.5 seconds to + .5 of a second. That reduces the projected overall runtime of my test scenario from 3+ years to under half a year. Worth doing :)

Yes. I performed the same calculation based upon my decision to use 500,000 25-ers. Hence my decision to ask for clarification.

There are several ways to speed up the processing. Using

( substr( $seq, $offset, 25 ) ^ $25er ) =~ tr[\0][\0];

to do the counting, rather than

grep $_, unpack 'C*, ...

is (probably) another.

I'm just waiting for a limited benchmark I have running to complete before trying several things.

I had thought that by avoiding actually copying each 25 char string out of the sequence I might save some time/memory, but now you've pointed it out, I realise that I can create an LVALUE ref to the substring and avoid 500,000 calls to substr for each inner loop. Thanks.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon