Since the data is static, no locking or updating is required. Move the existing "build a hash but don't split the fields" code into a separate script that builds the hash, then opens a port and listens. On my system, loading a 2.5 GB file containing ~8 million records this way takes around 3 minutes.

This server script needn't be complicated, as every request follows the same cycle:

Listen
Read key
Reply with record from memory.
Loop.
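The loop above can be sketched roughly as follows. This is a minimal sketch, not a finished server: the port number, the pipe delimiter, the key being the first field, and the names load_records, reply_for and serve are all assumptions made for illustration. The protocol assumed here is one key per line in, one raw record per line out, with an empty line meaning "not found". It also handles one client at a time, so a second script would wait until the first disconnects.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Build the in-memory index: key => raw, unsplit record line.
# Assumption: fields are pipe-delimited and the key is the first field.
sub load_records {
    my ($path) = @_;
    my %record_for;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;            # skip blank lines
        my ($key) = split /\|/, $line, 2;    # only peel off the key
        $record_for{$key} = $line;           # store the record unsplit
    }
    close $fh;
    return \%record_for;
}

# Look up one key; an empty string signals "not found".
sub reply_for {
    my ( $records, $key ) = @_;
    return $records->{$key} // '';
}

# Listen, read key, reply with record from memory, loop.
sub serve {
    my ( $records, $port ) = @_;
    my $listener = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => $port,
        Proto     => 'tcp',
        Listen    => 5,
        ReuseAddr => 1,
    ) or die "Cannot listen on port $port: $!";

    while ( my $client = $listener->accept ) {
        while ( defined( my $key = <$client> ) ) {
            $key =~ s/\r?\n\z//;             # strip the line ending
            print {$client} reply_for( $records, $key ), "\n";
        }
        close $client;
    }
}
```

The script would end with something like serve( load_records('records.dat'), 9000 ); where both the file name and the port are placeholders.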

In each script from which you removed the hash-building code, replace it with a call to tie the hash.

Create a Tie::Hash module that only implements the TIEHASH and FETCH methods.

The TIEHASH method connects to the listening port (or starts the new script in the background if the port is unavailable and then connects).

The FETCH method checks its local cache for the requested key. If the key is not found, it posts the key to the background script, reads back the record, splits it into fields, and caches the result locally in a hash as an array (ref).
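A tied-hash class along those lines might look like the sketch below. The package name RecordClient, the default port, the pipe delimiter and the optional socket argument are assumptions for illustration, and the wire format (one key per line out, one record per line back, empty line for a miss) is assumed as well. Starting the server in the background when the port is unavailable is left out here.

```perl
{
    package RecordClient;    # hypothetical module name

    use strict;
    use warnings;
    use IO::Socket::INET;

    # Called by: tie my %rec, 'RecordClient', port => 9000;
    # An already-open handle may be passed as "socket" instead.
    sub TIEHASH {
        my ( $class, %opt ) = @_;
        my $sock = $opt{socket} || IO::Socket::INET->new(
            PeerAddr => $opt{host} || '127.0.0.1',
            PeerPort => $opt{port} || 9000,
            Proto    => 'tcp',
        ) or die "Cannot connect to record server: $!";
        $sock->autoflush(1);    # each key must go out immediately
        return bless { sock => $sock, cache => {} }, $class;
    }

    # Called for every $rec{$key} read.
    sub FETCH {
        my ( $self, $key ) = @_;

        # Serve from the local cache when the key was already requested.
        return $self->{cache}{$key} if exists $self->{cache}{$key};

        # Otherwise ask the background script for the raw record.
        my $sock = $self->{sock};
        print {$sock} "$key\n";
        my $line = <$sock>;
        die "Record server closed the connection\n" unless defined $line;
        $line =~ s/\r?\n\z//;

        # Split once, cache the array ref (or undef for a miss).
        my $fields = length $line ? [ split /\|/, $line, -1 ] : undef;
        return $self->{cache}{$key} = $fields;
    }
}
```

In the existing scripts, the hash-loading code would then shrink to something like tie my %rec, 'RecordClient', port => 9000; after which $rec{$key} returns an array ref of fields. Because only TIEHASH and FETCH exist, operations such as exists, keys or assignment on the tied hash will die, which is acceptable as long as the scripts only read by key.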

Now,

The huge file is loaded only once.
The records only get split once upon request, and are thereafter supplied, already split, from local cache.
Your modifications to the existing scripts are confined to the removal of the hash loading code and replacing it with a very simple tied hash. The rest of the code remains unchanged and runs much faster.
If you ever get around to loading the data into a real DB, the tied hash interface can be modified under the covers to retrieve the information from there, and again the rest of the existing code requires no further modification.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^5: Moving from hashing to tie-ing. by BrowserUk
in thread Moving from hashing to tie-ing. by eff_i_g
