For a simple search like this, the processing for each line is a single, simple 10-char string comparison, which takes ~2.5e-7 seconds on my 2GHz processor using perl, whereas reading that same 10-char line from a file takes ~2.5e-6 seconds. The process is therefore going to be waiting on IO for ~90% of its time, so the runtime is controlled entirely by the time taken by the kernel to do IO.
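If you want to see the two costs side by side on your own hardware, something along these lines will do it. This is only a rough sketch, not the benchmark I ran: the file name, iteration count and strings are illustrative, and reading back a small file mostly measures the buffer cache rather than the disk.

    #!/usr/bin/perl
    # Rough sketch: compare a 10-char string comparison against
    # reading a 10-char line back from a file.
    use strict;
    use warnings;
    use Benchmark qw( cmpthese );

    my $needle = 'abcdefghij';     # 10-char target
    my $line   = 'abcdefghik';     # 10-char candidate

    # Build a small test file of 10-char lines to read back.
    open my $out, '>', 'lines.tmp' or die $!;
    print {$out} "$line\n" for 1 .. 1_000_000;
    close $out;

    open my $in, '<', 'lines.tmp' or die $!;

    cmpthese( 1_000_000, {
        compare => sub { my $hit = ( $needle eq $line ) },   # CPU-only comparison
        readln  => sub {                                     # read one line back
            my $l = <$in>;
            unless ( defined $l ) { seek $in, 0, 0; $l = <$in>; }
        },
    } );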
Whether it is 1 thread reading from 1 file, or 4 threads reading from 4 files, all the disk access is going to be serialised through the device driver anyway.
The only way I've succeeded in deriving any performance benefit from multi-threading a perl process doing IO is when the processing of the lines can be overlapped with the IO waits. And that was only possible where the processing time was greater than the IO time--complex parsing, for example.
In that case, it is possible to have a single thread reading into shared (*but non-cloned*) buffers, and then have 2 or 3 threads reading from those buffers and doing the parsing (see the sketch below). Even then, ensuring that the buffers don't get overwritten before they've been processed, without hamstringing the reading thread with locking, is a finely tuned balancing act.
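A much-simplified sketch of that reader/parser split, using Thread::Queue in place of hand-rolled shared buffers (which sidesteps the locking balancing act at the cost of some copying). The file name, worker count and parse_line() body are illustrative only:

    #!/usr/bin/perl
    # One reader thread feeds lines to a queue; several parser
    # threads drain it and do the (expensive) per-line work.
    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $Q       = Thread::Queue->new;
    my $WORKERS = 3;

    sub parse_line {
        my ($line) = @_;
        # placeholder for the "complex parsing" that makes threading pay off
        my @fields = split /,/, $line;
        return scalar @fields;
    }

    # Parser threads: dequeue until the undef sentinel arrives.
    my @workers = map {
        threads->create( sub {
            while ( defined( my $line = $Q->dequeue ) ) {
                parse_line( $line );
            }
        } );
    } 1 .. $WORKERS;

    # Reader thread (here, the main thread) is the only one touching the file.
    open my $fh, '<', 'big_input.txt' or die $!;
    $Q->enqueue( $_ ) while <$fh>;
    close $fh;

    $Q->enqueue( (undef) x $WORKERS );   # one sentinel per worker
    $_->join for @workers;

This only pays off when parse_line() is genuinely expensive; for a cheap comparison the queueing overhead will swamp any overlap with the IO waits.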