Re^2: Parsing Large Text Files For Performance

In general, reading a file in fixed-size blocks is somewhat faster than reading it line by line, especially if the block size is chosen to match the 'natural' read size of the filesystem on the device holding the file.
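As a rough illustration of fixed-size block reading, here is a minimal sketch. The discussion is about Perl, and Python is used here only to show the idea. The function name `read_blocks` and the 64 KiB fallback are assumptions, not anything from this thread. When no size is given, the sketch uses `st_blksize` (the filesystem's preferred I/O size, reported on POSIX systems) as one way of picking the "natural" read size.

```python
import os
from typing import Iterator, Optional

# Assumed fallback for platforms that do not report a preferred I/O size.
FALLBACK_BLOCK_SIZE = 64 * 1024


def read_blocks(path: str, block_size: Optional[int] = None) -> Iterator[bytes]:
    """Yield the contents of the file at path as a series of fixed-size blocks."""
    if block_size is None:
        # st_blksize is the filesystem's preferred I/O size on POSIX systems;
        # it is not available everywhere, hence the fallback.
        block_size = getattr(os.stat(path), "st_blksize", 0) or FALLBACK_BLOCK_SIZE
    # buffering=0 opens a raw (unbuffered) file, so each read() asks the OS
    # for up to block_size bytes with no extra buffering layer in between.
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(block_size)
            if not block:  # empty bytes means end of file
                break
            yield block
```

Nothing here looks for line ends. The bytes are handed back exactly as the OS delivered them.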

This is easily explained: when reading line by line, the runtime first reads a block, then has to scan that block for the end-of-line character before copying the appropriate number of bytes into another buffer to return to the calling program.
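Here is a simplified model of that extra work, again in Python purely for illustration. It is not how Perl's runtime is actually written. The model reads a block, scans it for newline bytes, and copies each complete line out into its own object. Any partial line at the end of the block is carried over to the next one. The name `lines_from_blocks` and its default block size are hypothetical.

```python
from typing import BinaryIO, Iterator


def lines_from_blocks(f: BinaryIO, block_size: int = 8192) -> Iterator[bytes]:
    """Yield newline-terminated lines, mimicking what a buffered line reader does."""
    pending = b""  # partial line left over from the previous block
    while True:
        block = f.read(block_size)
        if not block:
            break
        buf = pending + block
        start = 0
        # Scan the buffer for each end-of-line byte...
        while True:
            nl = buf.find(b"\n", start)
            if nl == -1:
                break
            # ...and copy the line it terminates into a new bytes object.
            yield buf[start:nl + 1]
            start = nl + 1
        pending = buf[start:]
    if pending:
        yield pending  # final line with no trailing newline
```

The scan (`find`) and the copy (slicing) are the per-line overheads that a plain block read does not pay.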

But in your application you want the individual lines that contain your matching terms. If you read the file block-wise, you would still have to break up each block by searching for the newlines on either side of every match, so the net amount of work would be much the same. And searching for line ends in Perl will usually be slower than letting the system do it in C.
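To show where that work ends up, here is a Python sketch of the block-wise alternative (Python chosen only for illustration). It runs the regex over each block. For every match it searches backwards and forwards for the surrounding newlines to recover the whole line. The name `matching_lines_blockwise`, the default block size, and the assumption that the pattern never matches across a newline are all hypothetical choices made for this sketch.

```python
import re
from typing import BinaryIO, Iterator


def matching_lines_blockwise(f: BinaryIO, pattern: bytes,
                             block_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield each line (without its newline) that contains a match for pattern.

    Assumes the pattern cannot match across (or match) a newline.
    """
    regex = re.compile(pattern)
    carry = b""  # trailing partial line carried into the next block
    while True:
        block = f.read(block_size)
        at_eof = not block
        buf = carry + block
        if at_eof:
            end = len(buf)  # whatever is left is the final line
            carry = b""
        else:
            # Only search complete lines; keep the partial tail for later.
            end = buf.rfind(b"\n") + 1
            carry = buf[end:]
        pos = 0
        while pos < end:
            m = regex.search(buf, pos, end)
            if not m:
                break
            # The newline searches either side of the match: the same work
            # a line-by-line reader would have done anyway.
            line_start = buf.rfind(b"\n", 0, m.start()) + 1
            line_end = buf.find(b"\n", m.end(), end)
            if line_end == -1:
                line_end = end
            yield buf[line_start:line_end]
            pos = line_end + 1  # skip the rest of this line: report it once
        if at_eof:
            break
```

The `rfind`/`find` calls around each match are the newline scanning that line-by-line reading would have done for you. Reading in blocks only moves that work around. It does not remove it.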

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.
