The problem is that unless you read the file and count record terminators (usually "\n"), you can't possibly know how many bytes into the file a particular line starts. The one exception is a file where every line has the same length, in which case simple arithmetic and seek are all it takes. But the fact is that, by default, there is no master index of a file telling Perl or any other program the offset at which each line starts.
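As a rough sketch of the uniform-length case: if every record is exactly the same number of bytes, record N starts at (N - 1) times the record length, so a single seek gets you there. The file name, the 10-byte record length, and the fetch_record helper below are made up for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical demo file: every record is 9 bytes of text plus "\n".
my $file    = 'fixed_width_demo.txt';
my $rec_len = 10;

open my $out, '>:raw', $file or die "Cannot write $file: $!";
printf {$out} "%-9s\n", "record $_" for 1 .. 5;
close $out or die "Cannot close $file: $!";

# Return record $n (1-based) by computing its byte offset directly.
# Returns nothing if the file does not hold a full record at that offset.
sub fetch_record {
    my ($path, $n, $len) = @_;
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    seek($in, ($n - 1) * $len, SEEK_SET) or die "Cannot seek: $!";
    my $got = read($in, my $buf, $len);
    close $in;
    return unless defined $got && $got == $len;
    return $buf;
}

my $fourth = fetch_record($file, 4, $rec_len);
print $fourth;
```

This only works while the uniform-length assumption holds. One record that is a byte longer or shorter throws off every offset after it.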

Imagine someone telling you "Take the third right turn after the stop light." Then you get in your car and decide, "I'll bet the third right past the stop light is in exactly 2.3 miles," having never looked at a map and never driven the road before. If you blindly turn right at 2.3 miles, you're going to end up running into a house or something, because you cannot possibly know the exact mileage to that third right turn until you've driven to it at least once and, having done so, taken note of the mileage.

So there's the rub. If you want to find a particular point in a file, but you don't know exactly where that point is going to be, you're going to have to skim through the file until you find it. If you're lucky enough to have a situation where the file's future modifications are within your control, you should be able to at least document where that third right turn is found, and keep your "index" up to date if the position ever changes.
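Here is one way the "document where the turn is" idea might look in code: make a single pass over the file, note with tell where each line begins, and keep that list as your index. Later lookups can seek straight to a line. The file name and the helpers build_line_index and fetch_line are hypothetical, and the index is only valid as long as the file doesn't change underneath it.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical demo file with lines of differing lengths.
my $file = 'offset_index_demo.txt';
open my $out, '>:raw', $file or die "Cannot write $file: $!";
print {$out} "alpha\n", "be\n", "gamma delta\n", "e\n";
close $out or die "Cannot close $file: $!";

# One full pass: record the byte offset at which each line starts.
sub build_line_index {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    my @offsets;
    while (1) {
        my $pos = tell($in);               # offset before reading the line
        defined(my $line = <$in>) or last; # stop at end of file
        push @offsets, $pos;
    }
    close $in;
    return \@offsets;
}

# Later lookups seek straight to the recorded offset; no rescan needed.
sub fetch_line {
    my ($path, $index, $n) = @_;           # $n is 1-based
    return if $n < 1 || $n > @$index;
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    seek($in, $index->[$n - 1], SEEK_SET) or die "Cannot seek: $!";
    my $line = <$in>;
    close $in;
    return $line;
}

my $index = build_line_index($file);
print fetch_line($file, $index, 3);
```

If the file only ever grows by appending, the index can be extended from the last recorded offset rather than rebuilt. Any edit in the middle of the file means the later offsets have to be recomputed.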

This isn't a problem unique to Perl. It's not even a problem unique to computers. Right now, without looking at any table of contents, find me the first page of chapter three in the book To Kill a Mockingbird. You can't find it without physically skimming through the book.

To answer the second part of your question: if you use File::ReadBackwards to find the position in the file where the regexp matches, you can use tell to ascertain where, within the file, you actually are. tell returns an absolute byte offset that has nothing to do with newlines or delimiters. That offset can later be passed to seek to set your next read/write position within the file.
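To illustrate the tell/seek round trip without depending on a CPAN module, the sketch below scans forward with core Perl only and remembers the offset of the last matching line. That forward scan is a stand-in for the backward reader, not the same thing. With the backward-reading module you would ask its object for the position instead, and you should check its documentation for exactly which byte it reports. The log file name, its contents, and the last_match_offset helper are invented for the example.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical demo log.
my $file = 'tell_seek_demo.log';
open my $out, '>:raw', $file or die "Cannot write $file: $!";
print {$out} "boot ok\n", "ERROR disk full\n", "retrying\n",
             "ERROR timeout\n", "shutdown\n";
close $out or die "Cannot close $file: $!";

# Return the byte offset of the start of the last line matching $re,
# or undef if nothing matches.
sub last_match_offset {
    my ($path, $re) = @_;
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    my $found;
    while (1) {
        my $pos = tell($in);               # where the next line begins
        defined(my $line = <$in>) or last;
        $found = $pos if $line =~ $re;
    }
    close $in;
    return $found;
}

my $offset = last_match_offset($file, qr/^ERROR/);
die "No match found" unless defined $offset;

# Later, a fresh handle on the same unchanged file can jump straight back.
open my $in, '<:raw', $file or die "Cannot read $file: $!";
seek($in, $offset, SEEK_SET) or die "Cannot seek: $!";
my $line = <$in>;
close $in;
print $line;
```

The saved offset is just a number, so it can be written to disk and reused in a later run, provided nothing before that point in the file has been modified in the meantime.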


Dave


In reply to Re: Handling large files by davido
in thread Handling large files by tsk1979
