in reply to Re^2: Rapid text searches ( O(1) space and time)
in thread Rapid text searches

it's a straight key lookup, not a real search like yours; but that's what I took the OP to want/need.

I think the OP just needs to find the appropriate records in a timely fashion; whether that's done by indexing the file, or searching the file doesn't really matter. Though the hour to build the index could be critical if the file changes with any frequency. If say, it's the output from some other process or system.

A rather more important difference (assuming I'm reading your code and console log correctly), is that your 0.016 - 0.019 seconds is the time taken to lookup one value.

My timings:

c:\test>768941 768941.search >nul Found 1000 records in 0.032 seconds c:\test>768941 768941.search >nul Found 1000 records in 0.020 seconds c:\test>768941 768941.search >nul Found 1000 records in 0.036 seconds c:\test>768941 768941.search >nul Found 1000 records in 0.020 seconds c:\test>768941 768941.search >nul Found 1000 records in 0.023 seconds

are the time to find 1000 values!

given Y they solve for X where X is a more interesting problem than Y.

Given 1000 keys, he wanted to find the corresponding values in a file containing ~38e6 pairings as quickly as possible. That's all my code does. I don't understand why you think I'm doing something different?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Replies are listed 'Best First'.
Re^4: Rapid text searches ( O(1) space and time)
by Your Mother (Archbishop) on Jun 06, 2009 at 15:59 UTC
    I don't understand why you think I'm doing something different?

    Oh, I don't really. The OP said "hash" and "single entry" so I felt like a Berkeley was the right answer (if the up front cost of building it is acceptable). Your solution is more flexible and really cool. I just thought my suggestion wasn't a "sledgehammer" but possibly a perfect fit. :)

    Update: I'll see if I can post a 1,000 searches match too; I have to rebuild the thing so it might be later.

    Update to the update: it was about 9 times slower to do 1,000 look-ups, clocking in at .15 for the most part (1.42 GHz G4 Mac).