in reply to Reading from file in threaded code is slow

I have been trying to speed up a cpu-bound log file analysis program using threads.... I have isolated the problem down to the following simple example.

You claim your processing is CPU-bound, but your example does nothing except split the lines, and that will never take longer than reading them from disk.

I'm going to assume that $PARALLEL = 4; means you have a 4-core system. Well done for limiting yourself to 4 threads.

But I'm also going to assume that you have only one disk. Reading four separate files on the same disk concurrently will always be slower than reading those same four files sequentially. That's true whether you use 4 threads, 4 processes, or even 4 entirely different computers.

Here's why. A disk has only one head assembly (its read heads all move together on a single arm), and most of the cost of disk I/O is moving that arm to the right place on the disk to read the next chunk of data.

When you read the four files sequentially, many consecutive chunks of each file will be stored contiguously on the same track, so they can be read without moving the disk head, often in a single rotation of the disk.

But when you read four files concurrently, the disk head has to dance back and forth across the disk to read each new chunk. Those continual, repetitive head movements absolutely kill throughput.
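To put rough numbers on that, here is a back-of-the-envelope model, sketched in Python. The 10 ms seek cost, 100 MiB/s transfer rate and 64 KiB chunk size are assumed, illustrative figures, not measurements of any particular drive. The model also pessimistically assumes that every chunk read by interleaved readers costs one full seek, and that sequential reads cost none.

```python
# Back-of-the-envelope model of disk throughput.
# ASSUMED, illustrative figures only (not measurements of a real drive):
SEEK_SECONDS = 0.010                          # average seek + rotational latency
TRANSFER_BYTES_PER_SECOND = 100 * 1024 * 1024  # sustained sequential rate, 100 MiB/s
CHUNK = 64 * 1024                              # hypothetical read size, 64 KiB


def effective_throughput(chunk_bytes: int, seeks_per_chunk: float) -> float:
    """Bytes per second actually delivered when each chunk read costs
    `seeks_per_chunk` head movements plus the time to transfer it."""
    transfer_time = chunk_bytes / TRANSFER_BYTES_PER_SECOND
    seek_time = seeks_per_chunk * SEEK_SECONDS
    return chunk_bytes / (transfer_time + seek_time)


# Sequential: chunks are contiguous, so (ideally) no seeks between them.
sequential = effective_throughput(CHUNK, seeks_per_chunk=0.0)
# Four concurrent readers: assume every chunk needs a fresh seek.
interleaved = effective_throughput(CHUNK, seeks_per_chunk=1.0)

print(f"sequential : {sequential / 2**20:6.1f} MiB/s")
print(f"interleaved: {interleaved / 2**20:6.1f} MiB/s")
```

With these assumed figures the interleaved case delivers under 6 MiB/s against roughly 100 MiB/s sequentially: almost all of the time goes on moving the head, not on reading data. Read-ahead and OS caching soften this in practice, but the direction of the result is the point.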

If your processing were truly CPU-bound (i.e. if processing a record took significantly longer than reading it, which simply splitting it certainly does not), then there is some scope for improving throughput through threading.
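One common shape for that threading is sketched below in Python, for illustration only. A single reader reads the files one after another, so the disk only ever services one sequential stream, and it hands lines to a pool of worker threads through a bounded queue. The names (`analyse`, `PARALLEL`, `QUEUE_DEPTH`) and the demo data are hypothetical. Note that in CPython the GIL prevents threads from running Python code in parallel, so for genuinely CPU-bound work you would move to `multiprocessing` (returning results through a second queue rather than a shared list). The reader/worker split stays the same.

```python
import os
import queue
import tempfile
import threading

PARALLEL = 4        # hypothetical: number of worker threads (one per core)
QUEUE_DEPTH = 1000  # hypothetical: bound on queued lines so the reader can't run far ahead
_STOP = object()    # sentinel telling a worker there is no more input


def reader(paths, work, n_workers):
    """Read the files one after another (never concurrently) and queue each line."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                work.put(line)  # blocks while the queue is full
    for _ in range(n_workers):
        work.put(_STOP)  # one sentinel per worker, queued after all real lines


def worker(work, results, lock, process):
    """Pull lines until the sentinel arrives, then publish results in one go."""
    local = []
    while True:
        item = work.get()
        if item is _STOP:
            break
        local.append(process(item))
    with lock:
        results.extend(local)


def analyse(paths, process, n_workers=PARALLEL):
    work = queue.Queue(maxsize=QUEUE_DEPTH)
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(work, results, lock, process))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    reader(paths, work, n_workers)  # the calling thread acts as the single reader
    for t in threads:
        t.join()
    return results  # order is not preserved across workers


# Demo with four small, made-up log files of 250 three-field lines each.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(4):
        p = os.path.join(tmp, f"log{i}.txt")
        with open(p, "w", encoding="utf-8") as fh:
            fh.write("a b c\n" * 250)
        paths.append(p)
    field_counts = analyse(paths, lambda line: len(line.split()))

print(len(field_counts), sum(field_counts))
```

The bounded queue is the design choice that matters: when the workers are slower than the reader, which is exactly the case where threading helps, it stops the reader from racing ahead and filling memory with unprocessed lines.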

(Better, more performant) alternatives include:

The assessment gets more complicated if you have NAS or SAN storage using RAIDed SSDs.

