Re^2: Moving from hashing to tie-ing.

Re^3: Moving from hashing to tie-ing. by BrowserUk (Patriarch) on Jul 31, 2006 at 16:50 UTC
Is the content of the file static, or does the processing involve updates?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
Re^4: Moving from hashing to tie-ing. by eff_i_g (Curate) on Jul 31, 2006 at 17:06 UTC
Static. Nothing is changed at any point during the processing; the only output is a newly created file.
Re^5: Moving from hashing to tie-ing. by BrowserUk (Patriarch) on Aug 03, 2006 at 07:28 UTC
Since the data is static, no locking or updating is required. Move the existing "build a hash but don't split the fields" code into a separate script that builds the hash, opens a port and listens. On my system this takes around 3 minutes for a 2.5 GB file containing ~8 million records.

This server script needn't be complicated, as every request has the same form:

1. Listen.
2. Read a key.
3. Reply with the record from memory.
4. Loop.

In each script from which you removed the hash-building code, replace that code with a call to tie the hash instead of building it. Create a Tie::Hash-based module that implements only the `TIEHASH` and `FETCH` methods:

- `TIEHASH` connects to the listening port (or, if the port is unavailable, starts the new script in the background and then connects).
- `FETCH` checks its local cache for the requested key. If the key is not found, it posts the key to the background script, reads back the record, splits it into fields, and caches the result locally in a hash as an array ref.

Now the huge file is loaded only once. Each record is split only once, on first request, and is thereafter supplied, already split, from the local cache.

Your modifications to the existing scripts are confined to removing the hash-loading code and replacing it with a very simple tied hash. The rest of the code remains unchanged and runs much faster.

If you ever get around to loading the data into a real DB, the tied hash can be modified under the covers to retrieve the information from there, and again the rest of the existing code needs no further modification.
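The server side of this scheme (load once, then listen, read key, reply, loop) might look something like the following. This is a minimal sketch under hypothetical assumptions not stated in the thread: one record per line, the key is the first tab-separated field, the protocol is one newline-terminated key per request and one newline-terminated record per reply, and an unknown key is answered with an empty line. The sub names `load_records`, `reply_for` and `serve` are illustrative only.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Load the whole file once into memory: key => unsplit record line.
# Assumption (hypothetical): one record per line, key is the first tab-separated field.
sub load_records {
    my ($path) = @_;
    my %records;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($key) = split /\t/, $line, 2;
        $records{$key} = $line if defined $key;    # skips blank lines
    }
    close $fh;
    return \%records;
}

# Build the reply for one request: the raw record, or an empty line if unknown.
sub reply_for {
    my ($records, $key) = @_;
    my $rec = $records->{$key};
    return (defined $rec ? $rec : '') . "\n";
}

# Listen; read key; reply with record from memory; loop.
sub serve {
    my ($path, $port) = @_;
    my $records = load_records($path);
    my $server  = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',    # local clients only
        LocalPort => $port,
        Proto     => 'tcp',
        Listen    => 5,
        ReuseAddr => 1,
    ) or die "Cannot listen on port $port: $@";

    while (my $client = $server->accept) {
        while (my $key = <$client>) {
            chomp $key;
            print {$client} reply_for($records, $key);
        }
        close $client;
    }
}

# Usage (hypothetical): perl record_server.pl records.txt 12345
serve(@ARGV) if @ARGV == 2;
```

Note that this sketch serves one client connection at a time, so several scripts running concurrently would queue behind each other; forking per connection or multiplexing with `IO::Select` would be needed to serve them in parallel.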
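The client side, a tied hash implementing only `TIEHASH` and `FETCH` with a local cache of already-split records, could be sketched as below. The class name `RecordClient`, the `transport` argument and `socket_transport` are hypothetical, and the sketch assumes tab-separated fields and the same one-line-per-request protocol as a server that replies with an empty line for unknown keys. Passing the lookup in as a code ref is a design choice made here so that the backend (socket, or later a DB query) can be swapped without touching `FETCH`. Auto-starting the server when the port is unavailable is left out.

```perl
use strict;
use warnings;
use IO::Socket::INET;

{
    # Hypothetical read-only tied hash: fetches records on demand and
    # caches them locally, already split into fields.
    package RecordClient;
    use parent 'Tie::Hash';
    use Carp qw(croak);

    # tie my %h, 'RecordClient', transport => $coderef;
    # The transport takes a key and returns the raw record line, or undef.
    sub TIEHASH {
        my ($class, %args) = @_;
        croak "transport code ref required" unless ref $args{transport} eq 'CODE';
        return bless { transport => $args{transport}, cache => {} }, $class;
    }

    sub FETCH {
        my ($self, $key) = @_;
        # Repeat lookups (including misses) are answered from the local cache.
        return $self->{cache}{$key} if exists $self->{cache}{$key};
        my $raw = $self->{transport}->($key);
        # Split once, on first request; -1 keeps trailing empty fields.
        my $fields = defined $raw ? [ split /\t/, $raw, -1 ] : undef;
        return $self->{cache}{$key} = $fields;
    }

    # Transport that asks the record server over TCP, one key per line.
    sub socket_transport {
        my ($host, $port) = @_;
        my $sock = IO::Socket::INET->new(
            PeerAddr => $host,
            PeerPort => $port,
            Proto    => 'tcp',
        ) or croak "Cannot connect to $host:$port: $@";
        return sub {
            my ($key) = @_;
            print {$sock} "$key\n";
            my $reply = <$sock>;
            croak "Record server closed the connection" unless defined $reply;
            chomp $reply;
            return length $reply ? $reply : undef;    # empty line = unknown key
        };
    }
}
```

In an existing script, the hash-loading code would then be replaced by something like `tie my %records, 'RecordClient', transport => RecordClient::socket_transport('localhost', 12345);` (host and port hypothetical), after which `$records{$key}` returns an array ref of fields. Moving to a real DB would mean supplying a different transport sub, for example one that runs a DBI query, while the rest of the script stays as it is.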