I have been trying to speed up a cpu-bound log file analysis program using threads.... I have isolated the problem down to the following simple example.

You claim your processing is cpu-bound, but your example does nothing except split the lines, and splitting a line will never take longer than reading it from disk.
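One way to check that claim on your own data is to time a read-only pass against a read-and-split pass. This is only a rough sketch: `time_pass` is a name I made up for illustration, and the file name is whatever you pass on the command line. Bear in mind that the second pass will usually be served from the OS file cache, so run it more than once before drawing conclusions.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Time one pass over a file, calling $per_line for every line.
# Returns (elapsed wall-clock seconds, number of lines read).
sub time_pass {
    my ($path, $per_line) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $start = time();
    my $lines = 0;
    while (my $line = <$fh>) {
        $per_line->($line);
        $lines++;
    }
    close $fh;
    return (time() - $start, $lines);
}

# Hypothetical usage: perl timing.pl some.log
if (@ARGV) {
    my $path = shift @ARGV;
    my ($t_read)  = time_pass($path, sub { });                           # read only
    my ($t_split) = time_pass($path, sub { my @f = split ' ', $_[0] });  # read + split
    printf "read only: %.3fs, read + split: %.3fs\n", $t_read, $t_split;
}
```

If the two timings are close, the program is IO-bound and adding threads will not help.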

I'm going to assume that $PARALLEL = 4; means you have a 4-core system. Well done for limiting yourself to 4 threads.

But I'm also going to assume that you have only one disk. Trying to read four separate files (on the same disk) concurrently will always be slower than reading those same four files sequentially. That's true regardless of whether you use 4 threads or 4 processes or even 4 entirely different computers.

Here's why. Disks have only one read head, and the majority of the cost of disk IO is moving that head to the right place on the disk in order to read the next chunk of data.

When you read the four files sequentially, many chunks of each file will be located contiguously on the same track of the disk, which means that many chunks of the file can be read without moving the disk head, often in a single rotation of the disk.

But when you are reading four files concurrently, the disk head will need to be dancing back and forth across the disk in order to read each new chunk. Those continuous and repetitive head movements absolutely kill throughput.

If your processing were truly cpu-bound -- i.e. if processing a record took significantly longer than reading it, and simply splitting it certainly does not -- then there would be some scope for improving throughput through threading.
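For the truly cpu-bound case, one arrangement that keeps the disk reads sequential is a single reader feeding a pool of worker threads through a queue. What follows is a minimal sketch under my own assumptions, not a tuned implementation: `process_files` and the `$work` callback are hypothetical names, and it needs a perl built with ithreads.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Read @files one after another in the calling thread and hand each line
# to one of $n_workers threads, which run $work->($line) on it.
# Returns the sum of the values returned by $work.
sub process_files {
    my ($n_workers, $work, @files) = @_;
    my $queue = Thread::Queue->new;

    my @workers;
    for (1 .. $n_workers) {
        push @workers, threads->create(sub {
            my $total = 0;
            # An undef on the queue tells this worker to stop.
            while (defined(my $line = $queue->dequeue)) {
                $total += $work->($line);
            }
            return $total;
        });
    }

    # Only this thread touches the disk, so the reads stay sequential.
    for my $path (@files) {
        open my $fh, '<', $path or die "Cannot open $path: $!";
        while (my $line = <$fh>) {
            $queue->enqueue($line);
        }
        close $fh;
    }
    $queue->enqueue(undef) for @workers;   # one stop marker per worker

    my $sum = 0;
    for my $thr (@workers) {
        my ($total) = $thr->join;
        $sum += $total;
    }
    return $sum;
}
```

The queue here is unbounded, so a fast reader can get well ahead of slow workers and use a lot of memory. Queueing one line at a time also has a noticeable per-item cost, so batching several lines per queue entry may be worth trying if the per-record work is small.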

(Better; more performant) Alternatives include:

The assessment gets more complicated if you have NAS or SAN drives using RAIDed SSDs.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re: Reading from file in threaded code is slow by BrowserUk
in thread Reading from file in threaded code is slow by amcglinchy
