Your post is a little confusing. In the sample data, field_1 is numeric and apparently increments by one, but it starts from neither zero nor one. In the sample code, you use this numeric field as the index into an array ($index[$temp[0]]=$temp[1];), but immediately afterwards you write $index(field 1 value) = field 2 value, with parens rather than square brackets.

Is field_1 always numeric?
Are its values sequential?
Is the file (or can it be) sorted by this first field in ascending order?
Does field_1 start from 0 or 1?

If the answer to all these questions is yes, then possibly the easiest solution would be to use Tie::File. Read the excellent documentation for this module for the full nitty-gritty, but simply stated, it lets you treat a file as an array with a single statement, where each array element is one line of the file. Once you have tied the array to the file, you can use the array as if it were entirely in memory, and the module takes care of caching, flushing, and opening and closing the file. You can specify how much memory to allocate to the cache (the memory option) and so make your own trade-off between memory use and performance.
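A minimal sketch of the idea, not a drop-in solution. It builds a small temporary sample file so that it runs on its own; the whitespace separator between the two fields, the sample values, and the 2 MB cache size are all assumptions made for illustration, since the real file format is not shown here.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# Assumed sample data: "field_1 field_2" per line, field_1 counting from 0.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "$_ value_$_\n" for 0 .. 4;
close $fh or die "Cannot close $file: $!";

# Tie the file to an array; 'memory' caps the read cache, in bytes.
tie my @records, 'Tie::File', $file, memory => 2_000_000
    or die "Cannot tie $file: $!";

# Element N of the array is line N of the file, counting from 0.
my $line = $records[3];

# Each element still holds both fields, so split off field_1.
my (undef, $field_2) = split ' ', $line, 2;
print "$field_2\n";

untie @records;
```

With field_1 starting at 0 and no gaps, the value of field_1 is the array index, so no separate index needs to be built.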

The only downside, given your file format, is that each array element would contain both fields. It would be a fairly trivial job to modify the module for your own purposes so that it removes field_1 on FETCH and puts it back on STORE.
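As a less invasive alternative to modifying the module, the same effect can be sketched with two small wrapper functions around the tied array. Note that this swaps in a different technique from the one suggested above. The names fetch_value and store_value are hypothetical, and the whitespace separator is again an assumption.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# Assumed sample data: "field_1 field_2" per line, field_1 counting from 0.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "$_ value_$_\n" for 0 .. 4;
close $fh or die "Cannot close $file: $!";

tie my @records, 'Tie::File', $file
    or die "Cannot tie $file: $!";

# Hypothetical helper: return only field_2 for the given index.
sub fetch_value {
    my ($records, $i) = @_;
    my $line = $records->[$i];
    return unless defined $line;
    my (undef, $value) = split ' ', $line, 2;
    return $value;
}

# Hypothetical helper: put field_1 back in front before storing.
sub store_value {
    my ($records, $i, $value) = @_;
    $records->[$i] = "$i $value";
    return;
}

store_value(\@records, 2, 'updated');
my $value = fetch_value(\@records, 2);   # field_2 only
my $raw   = $records[2];                 # full line as stored in the file

untie @records;
```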

If the answer to any of the four questions above is no, for example if the sequence numbers do not start from 0 or 1, or if the sequence has large gaps, then you would need to make more substantial changes to the module to map sequence numbers to record numbers. That may be more work than you want to do, but it's worth considering if there is an algorithmic relationship between the two.
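The simplest such relationship is a constant offset: the sequence numbers have no gaps but start somewhere other than 0. A rough sketch under that assumption, done outside the module rather than by changing it (record_for is a hypothetical helper, and the sample values starting at 1000 are invented):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# Assumed sample data: field_1 starts at 1000 and increments by one.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "$_ value_$_\n" for 1000 .. 1004;
close $fh or die "Cannot close $file: $!";

tie my @records, 'Tie::File', $file
    or die "Cannot tie $file: $!";

# Take the first sequence number from the first record in the file.
my ($first_seq) = split ' ', $records[0];

# Hypothetical helper: map a sequence number to a record number by offset.
sub record_for {
    my ($records, $first, $seq) = @_;
    my $recno = $seq - $first;
    return if $recno < 0 || $recno > $#{$records};
    return $records->[$recno];
}

my $hit  = record_for(\@records, $first_seq, 1003);
my $miss = record_for(\@records, $first_seq, 999);   # out of range

untie @records;
```

This does not help with gaps in the sequence; in that case the offset no longer identifies the right line and some other mapping would be needed.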

Examine what is said, not who speaks.

1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
3) Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke.

In reply to Re: Very large text file - simple indexing by BrowserUk
in thread Very large text file - simple indexing by Anonymous Monk
