If you can spend the time (and the I/O) for a complete read before you start to use the file, you do not actually need to create a complete index. You could have an index that indexes every 10 (or 100) lines, so that you
approximately know where you are reading, so you only have to read a small number of useless lines before the one you are actually looking for. This might be a viable solution.
If your lines have a line number that's univocally understandable, you could use a binary partition method to find your line in about log2(n) passes without using an index. You should:
- Get the total size of the file
- Read the first complete line that's about the half of the file
- If it's over the line you are looking for, repeat the process seeking the middle point between the middle of the file and the end of the file;
- if it's below, repeat seeking the middle point between 0 and the middle of the file
- if it's it, you return it and exit
You have to take care that if the line you are looking for does not exist in the file, you cannot loop indefinitely (maybe set a maximum number of steps before bailing out).
If the line length of the lines within the files is expected to be more or less even, you could also think about a prediction method to get the expected point. It goes like this:
- We read a line in the middle of the file
- We now know it's line number (say, 10038) and that it's located at offset 120046
- We expect the average line to be 11.9 chars long
- If we are looking for line 9500, we start seeking at offset 9500*11.9 = 113050 and the we adjust our estimate based on what we actually find
Of course all those things have to be thought of seeing the actual case.
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.