in reply to Threads Doubt

If your machine does not have multiple processors, then all your threads will be sharing time on the same processor, so there is no advantage. Worse, by using multiple threads you incur the overhead of the threading support itself, so the result is a net loss in throughput.

However, even if you do have multiple processors, if your files all reside on the same drive then using multiple threads forces the read heads to jump around all over the disk trying to supply the separate threads with data, and you again incur overhead not present in the single-threaded process.

The only way you will see a benefit from threading this kind of IO-bound processing is if you have multiple processors, and can arrange for the files being read and/or written to reside on different, local disks. And note: different physical drives, not different logical partitions of the same drive. Even then, the splitting of the system file cache between the concurrent files is likely to hurt throughput more than any gains you might achieve.
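To make the point concrete, here is a minimal sketch (in Python for brevity; the OP's code is presumably Perl, but the structure is the same with Perl's threads). It reads a set of files sequentially and then with one thread per file. On a single spindle the threaded version cannot add bandwidth; it can only add scheduling and seek overhead. The file paths and helper names are hypothetical, for illustration only:

```python
# Illustrative only: sequential vs. one-thread-per-file reading.
# On a single disk, the threaded version gains nothing and pays
# thread-creation, scheduling, and seek overhead.
import threading

def read_whole(path, results, idx):
    # Worker: read one file completely and record how many bytes we got.
    with open(path, "rb") as fh:
        results[idx] = len(fh.read())

def sequential(paths):
    # Read each file in turn on the one thread we already have.
    total = 0
    for p in paths:
        with open(p, "rb") as fh:
            total += len(fh.read())
    return total

def threaded(paths):
    # Spawn one thread per file; they all contend for the same read heads.
    results = [0] * len(paths)
    threads = [threading.Thread(target=read_whole, args=(p, results, i))
               for i, p in enumerate(paths)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

Both functions return the same byte count; only the wall-clock cost differs, and on one drive the threaded path is the slower of the two.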


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: Threads Doubt
by Illuminatus (Curate) on Oct 17, 2008 at 16:04 UTC
    For large files like this, the read heads will still be "jumping around" anyway, because unless you just de-fragged your disk, the file is likely to be physically spread out. Even putting files on separate disks may not improve performance all that much, because your actual IO rate depends on many things:
    1. Individual disk performance. RPMs, local buffer, etc
    2. RAID? Mirroring can typically support twice the read performance
    3. Controller type. SCSI is better for multi-tasking than EIDE. However, SCSI also supports many more devices, so if it has 7 devices all being accessed simultaneously, you're no better off.
    4. Bus speed and contention.
    5. (Most important) Application contention. What else on your system is trying to use the same disks? Are they shared?
    In general, your best bet for IO performance is to make sure that when you can read, read as much as you can.
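That last point can be sketched in a few lines (Python here purely for illustration; the chunk size is an arbitrary example figure, not a recommendation from the thread). The idea is simply to issue few, large read requests so each seek is amortised over as much data as possible:

```python
# Sketch: prefer few large sequential reads over many small ones.
CHUNK = 10 * 1024 * 1024  # 10MB per request -- hypothetical; tune to your memory budget

def process_in_chunks(path, handle_chunk, chunk_size=CHUNK):
    # Read the file in large sequential chunks, handing each chunk
    # to the caller's processing function as it arrives.
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            handle_chunk(chunk)
```

Compared with reading line by line, this keeps the drive streaming instead of seeking, which is where spinning disks deliver their best throughput.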
      For large files like this, the read heads will still be "jumping around" anyway, because unless you just de-fragged your disk, the file is likely to be physically spread out.

      Yes, I know. But if you are trying to read from 5 files concurrently, your read heads are going to jump around far more than if you are reading from only one file. (All else being equal.)

      And depending upon your OS and file system, 5 concurrent readers means far less system cache devoted to each file, which will further decrease throughput. Then there are factors such as on-disk caching and myriad other hardware- and software-related factors.

      But as a cogent, if simplified, explanation of why multithreading can have a negative effect on the throughput of the OP's application, I think my post stands on its own.


        No offense intended, BrowserUK. My (rather roundabout) point was that even putting files on different disks might not make a difference.

        You are right to raise warning flags concerning the urge to multi-thread, for many reasons. I humbly submit that, given large reads and non-trivial data processing, it is possible to design a 2-thread solution that is likely to improve performance (one thread reading the next 10MB chunk while the other processes its 10MB chunk from memory).
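That 2-thread idea can be sketched as a small read-ahead pipeline (again in Python for illustration; Perl's threads and Thread::Queue would give the same shape). A bounded queue of depth 2 provides simple double-buffering: the reader thread fetches the next chunk while the caller processes the previous one. The chunk size and function names are hypothetical:

```python
# Sketch of a 2-thread read-ahead pipeline: one thread reads the next
# chunk from disk while the main thread processes the previous one.
import queue
import threading

def read_then_process(path, process, chunk_size=10 * 1024 * 1024):
    q = queue.Queue(maxsize=2)  # double-buffering: at most one chunk read ahead

    def reader():
        # Producer: stream the file in large chunks onto the queue.
        with open(path, "rb") as fh:
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:
                    break
                q.put(chunk)   # blocks if the processor falls behind
        q.put(None)            # sentinel: end of file

    t = threading.Thread(target=reader)
    t.start()

    results = []
    while True:               # Consumer: process chunks as they arrive.
        chunk = q.get()
        if chunk is None:
            break
        results.append(process(chunk))
    t.join()
    return results
```

Note this only pays off when the processing is genuinely non-trivial; if the work per chunk is small, the single-threaded loop wins, which is the whole point of the discussion above.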