in reply to Re: Comparing strings (exact matches) in LARGE numbers FAST
in thread Comparing strings (exact matches) in LARGE numbers FAST


For a mere mortal like myself, the code isn't that simple :-)

Sure, the lookup is very fast, but how about the indexer function encode()?
vec( $bits, $_, 2 ) = $acgt{ substr $string . 'a' x 16, $_, 1 } for 0 .. 15;

Is it worth the effort looping concatenations through substr()'s and performing hash lookup before vec()torizing into $bits, followed by unpack()ing it? This overhead is less noticeable, right? Just curious.

Note: I made the correction for $bits.
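
For anyone following along, here is a minimal sketch of how I read encode(). The contents of %acgt and the per-call reset of $bits are my assumptions, as is lowercase-only input; only the vec() line comes from the thread.

```perl
use strict;
use warnings;

# Assumed mapping: the thread does not show %acgt, so these values are a guess.
my %acgt = ( a => 0, c => 1, g => 2, t => 3 );

# Pack up to 16 bases (2 bits each) into one 32-bit integer.
sub encode {
    my ($string) = @_;
    my $bits   = '';                    # fresh buffer on every call (assumed fix)
    my $padded = $string . 'a' x 16;    # pad so positions 0 .. 15 always exist
    # Expects lowercase a/c/g/t only; anything else yields undef and a warning.
    vec( $bits, $_, 2 ) = $acgt{ substr $padded, $_, 1 } for 0 .. 15;
    return unpack 'N', $bits;           # 4 bytes -> one unsigned 32-bit number
}

printf "%u\n", encode('acgtacgtacgtacgt');
```

So the per-key cost is 16 substr() calls, 16 hash lookups, 16 vec() stores and one unpack(). Because of the padding, a short string such as 'acgt' encodes the same as 'acgt' followed by twelve 'a's.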

Re^3: Comparing strings (exact matches) in LARGE numbers FAST
by BrowserUk (Patriarch) on Aug 29, 2008 at 14:34 UTC
    For a mere mortal like myself, the code isn't that simple :-)

    ... looping concatenations through substr()'s and performing hash lookup before vec()torizing into $bits, followed by unpack()ing it?

    Your description seems spot on, so it's not that complicated ;)

    This overhead is less noticeable, right?

    Less noticeable than what? See Re^3: Comparing strings (exact matches) in LARGE numbers FAST for the full skinny, but in summary:

    The memory requirement is fixed (512MB, i.e. one bit for each of the 2**32 possible 16-base keys) for any number of keys, and the lookup is at least as fast as a native hash, which would exhaust the memory of a 32-bit machine at around 50e6 keys.

    If the bug in vec is fixed, it will be significantly faster than a native hash.
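
    The fixed-memory point can be illustrated with a scaled-down sketch. This is not the code from the linked node: the %acgt values, the 8-base key length and the add_key()/has_key() helpers are all assumptions made for illustration, chosen so the bit vector is 8KB rather than 512MB.

```perl
use strict;
use warnings;

my %acgt = ( a => 0, c => 1, g => 2, t => 3 );    # assumed mapping
my $LEN  = 8;    # demo key length; the thread uses 16 bases

# Same idea as encode() in the thread, scaled down to 8 bases (16 bits).
sub encode {
    my ($string) = @_;
    my $bits   = '';
    my $padded = $string . 'a' x $LEN;
    vec( $bits, $_, 2 ) = $acgt{ substr $padded, $_, 1 } for 0 .. $LEN - 1;
    return unpack 'n', $bits;    # 2 bytes for 8 bases; 'N' would suit 16 bases
}

# One bit per possible key: 4**$LEN bits, however many keys are stored.
my $seen = "\0" x ( 4**$LEN / 8 );

# Hypothetical helpers: set or test the single bit belonging to a key.
sub add_key { vec( $seen, encode( $_[0] ), 1 ) = 1 }
sub has_key { vec( $seen, encode( $_[0] ), 1 ) }

add_key($_) for qw( acgtacgt ttttaaaa );
print has_key('acgtacgt') ? "found\n" : "missing\n";    # prints "found"
print has_key('gggggggg') ? "found\n" : "missing\n";    # prints "missing"
```

    With 16 bases the same layout needs 4**16 = 2**32 bits, which is where the 512MB figure comes from, and it stays that size whether 1 key or 50e6 keys are stored.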


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.