I agree with Roger that creating an index to the lines is the fastest way to randomly access the lines in a file.

Unfortunately, unless the lines are fairly long, say more than 30 chars, an array of line positions takes just as much memory as storing the whole file in an array. And if the file is really large, then you're almost as likely to run out of memory just storing the line offsets as you are storing the lines themselves.

An alternative is to store the offsets as binary values in a single scalar using pack. This requires only 4MB to store the offsets for a million-line file, whereas storing the same information as an array would require 60MB.
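To make the 4-bytes-per-line arithmetic concrete, here is a minimal sketch that builds a packed index over a small in-memory file. The sample data and variable names are invented for illustration, and unlike the snippet in this post it records exactly one offset per line (the start of each line). Note that the 'V' format is an unsigned 32-bit value, so this assumes the file is smaller than 4GB.

```perl
use strict;
use warnings;

# Hypothetical sample data: an in-memory "file" of 1000 short lines.
my $data = join '', map { "line $_\n" } 1 .. 1000;
open my $fh, '<', \$data or die $!;

# Append one 32-bit little-endian offset ('V') per line start.
my $offsets = '';
my $pos     = 0;
while ( defined( my $line = <$fh> ) ) {
    $offsets .= pack 'V', $pos;   # where the line just read began
    $pos = tell $fh;              # where the next line begins
}
close $fh;

# The index costs 4 bytes per line, whatever the line length.
my $count = length( $offsets ) / 4;
printf "%d lines, %d bytes of index\n", $count, length $offsets;
```

The same loop scales linearly: a million lines would give a 4,000,000-byte scalar, which is where the 4MB figure comes from.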

A simple sub using unpack, substr and seek can then be used to read lines by line number quickly and efficiently.

    open IN, '<test1000000.dat' or die $!;

    $offsets = pack 'V', 0;
    $offsets .= pack 'V', tell IN while <IN>;
    print length $offsets;
    4000008

    sub readline_n{
        my( $fh, $line) = @_;
        seek $fh, unpack( 'V', substr( $offsets, --$line*4, 4 )), 0;
        scalar <$fh>
    }

    print readline_n( \*IN, 500000 );
    500000
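For anyone who wants to try this under strict and warnings, the following is a hedged variant of the same idea, not a replacement for the code above. The names build_index and read_line_n are hypothetical, the index is passed in rather than kept in a global, and an out-of-range line number returns undef instead of seeking to a bogus offset. It uses an in-memory file so it can be run as-is.

```perl
use strict;
use warnings;

# Build a packed index of line-start offsets, 4 bytes each (format 'V').
sub build_index {
    my ( $fh ) = @_;
    seek $fh, 0, 0 or die "seek failed: $!";
    my ( $index, $pos ) = ( '', 0 );
    while ( defined( my $line = <$fh> ) ) {
        $index .= pack 'V', $pos;
        $pos = tell $fh;
    }
    return $index;
}

# Return line $n (1-based), or undef if $n is outside the index.
sub read_line_n {
    my ( $fh, $index, $n ) = @_;
    return if $n < 1 or $n > length( $index ) / 4;
    my $offset = unpack 'V', substr( $index, ( $n - 1 ) * 4, 4 );
    seek $fh, $offset, 0 or die "seek failed: $!";
    return scalar <$fh>;
}

# Hypothetical test data: ten lines holding the numbers 1 to 10.
my $data = join '', map { "$_\n" } 1 .. 10;
open my $fh, '<', \$data or die $!;
my $index = build_index( $fh );
print read_line_n( $fh, $index, 7 );
```

Passing the index as an argument keeps the sub usable with more than one file at a time, at the cost of one extra parameter per call.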

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail


In reply to Re: Fast way to read from file by BrowserUk
in thread Fast way to read from file by Hena
