in reply to Parsing Large Text Files For Performance

One thing I'm curious about: do you guys think there would be a more efficient way to read in the file, instead of doing it line by line? Perhaps a way to read in one packet at a time? The files are too large for slurping, but is line-by-line the most efficient way?

Re^2: Parsing Large Text Files For Performance
by BrowserUk (Patriarch) on Feb 01, 2011 at 00:35 UTC

    In general, reading a file in fixed-size blocks is somewhat faster than reading it line by line, especially if the block size is chosen to coincide with the 'natural' read size of the filesystem or device holding the file.

    This is easily explained: when reading line by line, the runtime still reads a block first, but then has to scan that block for the end-of-line character and copy the appropriate number of bytes to another buffer before returning them to the calling program.

    But for your application, where you want the individual lines that contain your matching terms, if you read the file block-wise you'd then have to break the block up yourself, searching for the newlines either side of each match, so the net result would be much the same amount of work. And searching for line ends in Perl will usually be slower than letting the system do it in C.
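
    To make the extra bookkeeping concrete, here is a minimal sketch of block-wise reading. The file name 'capture.log', the search term 'ERROR', and the 64 KiB block size are all assumptions for illustration, not anything from the original post.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $block_size = 64 * 1024;      # arbitrary; ideally matches the filesystem's natural read size
    my $file       = 'capture.log';  # hypothetical input file

    open my $fh, '<', $file or die "Cannot open '$file': $!";

    my $leftover = '';    # partial line carried over from the previous block

    while ( read( $fh, my $block, $block_size ) ) {
        $block = $leftover . $block;

        # Keep any trailing partial line for the next iteration.
        my $last_nl = rindex( $block, "\n" );
        if ( $last_nl == -1 ) {
            $leftover = $block;
            next;
        }
        $leftover = substr( $block, $last_nl + 1 );
        $block    = substr( $block, 0, $last_nl + 1 );

        # We still end up splitting the block into lines in Perl,
        # which is the extra work mentioned above.
        for my $line ( split /\n/, $block ) {
            print "$line\n" if index( $line, 'ERROR' ) >= 0;
        }
    }

    # Handle a final line with no trailing newline.
    print "$leftover\n" if length $leftover and index( $leftover, 'ERROR' ) >= 0;

    close $fh;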


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: Parsing Large Text Files For Performance
by GrandFather (Saint) on Feb 01, 2011 at 00:10 UTC

    Write a benchmark and test it! There are too many variables that may come into play for us to give a really useful answer without actually trying it on hardware equivalent to the system you are using.
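
    As a rough starting point, something like the following cmpthese sketch would compare the two approaches. The file name 'capture.log' and the term 'ERROR' are placeholders; substitute your real data and matching logic.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my $file = 'capture.log';    # hypothetical test file

    cmpthese( 10, {
        line_by_line => sub {
            open my $fh, '<', $file or die "Cannot open '$file': $!";
            my $count = 0;
            while ( my $line = <$fh> ) {
                $count++ if index( $line, 'ERROR' ) >= 0;
            }
            close $fh;
        },
        block_wise => sub {
            open my $fh, '<', $file or die "Cannot open '$file': $!";
            my $count = 0;
            while ( read( $fh, my $block, 64 * 1024 ) ) {
                $count++ while $block =~ /ERROR/g;   # ignores matches split across block boundaries
            }
            close $fh;
        },
    } );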

    True laziness is hard work
      I will do that, thanks!