comment on

If you can spend the time (and the I/O) for a complete read before you start to use the file, you do not actually need to create a complete index. You could have an index that indexes every 10 (or 100) lines, so that you approximately know where you are reading, so you only have to read a small number of useless lines before the one you are actually looking for. This might be a viable solution.

If your lines have a line number that's univocally understandable, you could use a binary partition method to find your line in about log2(n) passes without using an index. You should:

Get the total size of the file
Read the first complete line that's about the half of the file
If it's over the line you are looking for, repeat the process seeking the middle point between the middle of the file and the end of the file;
if it's below, repeat seeking the middle point between 0 and the middle of the file
if it's it, you return it and exit

You have to take care that if the line you are looking for does not exist in the file, you cannot loop indefinitely (maybe set a maximum number of steps before bailing out).

If the line length of the lines within the files is expected to be more or less even, you could also think about a prediction method to get the expected point. It goes like this:

We read a line in the middle of the file
We now know it's line number (say, 10038) and that it's located at offset 120046
We expect the average line to be 11.9 chars long
If we are looking for line 9500, we start seeking at offset 9500*11.9 = 113050 and the we adjust our estimate based on what we actually find

Of course all those things have to be thought of seeing the actual case.

In reply to Re: Re: Re: Fast way to read from file by l3nz
in thread Fast way to read from file by Hena

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.