in reply to Re: Threads Doubt
in thread Threads Doubt

For large files like this, the read heads will still be "jumping around" anyway, because unless you just de-fragged your disk, the file is likely to be physically spread out. Even putting files on separate disks may not improve performance all that much, because your actual IO rate depends on many things:
  1. Individual disk performance: RPMs, local buffer, etc.
  2. RAID? Mirroring can typically support twice the read performance.
  3. Controller type. SCSI handles multi-tasking better than EIDE. However, SCSI also supports many more devices, so if seven of them are all being accessed simultaneously, you're no better off.
  4. Bus speed and contention.
  5. (Most important) Application contention. What else on your system is trying to use the same disks? Are they shared?
In general, your best bet for IO performance is to make sure that when you can read, read as much as you can.
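
To make that concrete, here is a minimal sketch (mine, not part of the original post) of the "read as much as you can" approach: pull the file in big sequential chunks with sysread rather than line by line. The file name and the 8MB buffer size are placeholder values to tune for your own setup.

use strict;
use warnings;

# Minimal sketch: read 'big.dat' (a placeholder name) in 8MB sequential
# chunks so each disk request is as large as practical.
my $file       = 'big.dat';
my $chunk_size = 8 * 1024 * 1024;     # 8MB per request; tune for your setup

open my $fh, '<:raw', $file or die "open '$file': $!";

while ( sysread $fh, my $buf, $chunk_size ) {
    # process the chunk in $buf here; sysread issues one large, unbuffered
    # read per iteration instead of many small ones
}

close $fh;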

Re^3: Threads Doubt
by BrowserUk (Patriarch) on Oct 17, 2008 at 17:25 UTC
    For large files like this, the read heads will still be "jumping around" anyway, because unless you just de-fragged your disk, the file is likely to be physically spread out.

    Yes, I know. But if you are trying to read from 5 files concurrently, your read heads are going to be jumping around far more than if you are only reading from one file. (All else being equal.)

    And depending upon your OS and file system, 5 concurrent readers mean far less system cache devoted to each file, which will further decrease throughput. Then there is on-disk caching, plus myriad other hardware- and software-related factors.

    But as a cogent, if simplified, explanation of why multithreading can have a negative effect on the throughput of the OP's application, I think my post stands on its own.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      No offense intended, BrowserUK. My (rather roundabout) point was that even putting files on different disks might not make a difference.

      You are right to raise warning flags concerning the urge to multi-thread, for many reasons. I humbly submit that, given large reads and non-trivial data processing, it is possible to design a 2-thread solution that is likely to improve performance (1 thread reading a 10MB chunk while the other is processing its 10MB chunk from memory).
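
      For what it's worth, here is a rough sketch of that reader/processor split using threads and Thread::Queue. It is only an illustration: the file name, chunk size, and process_chunk() are placeholders, and enqueueing each chunk copies it into shared storage, which is exactly the overhead discussed in the reply below.

      use strict;
      use warnings;
      use threads;
      use Thread::Queue;

      my $file  = 'big.dat';            # hypothetical input file
      my $chunk = 10 * 1024 * 1024;     # 10MB, matching the example above
      my $queue = Thread::Queue->new;   # unbounded here; a real version
                                        # would limit how far the reader
                                        # can run ahead

      # Reader thread: pull chunks off disk and hand them to the processor.
      my $reader = threads->create( sub {
          open my $fh, '<:raw', $file or die "open '$file': $!";
          while ( sysread $fh, my $buf, $chunk ) {
              $queue->enqueue( $buf );
          }
          $queue->enqueue( undef );     # sentinel: no more data
      } );

      # Main thread acts as the processor.
      while ( defined( my $buf = $queue->dequeue ) ) {
          process_chunk( $buf );        # placeholder for the real work
      }

      $reader->join;

      sub process_chunk { my ( $data ) = @_; return length $data }   # stub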

        No offense intended,

        None taken :)

        given large reads and non-trivial data processing, it is possible to design a 2-thread solution that is likely to improve performance

        I've had several attempts at this using Perl and found that unless the processing is really quite involved, the cost of iThreads scalar sharing negates any advantage. I also found that it actually works best with smaller buffers: 64KB to 1MB.

        I've had far greater success using C and threads, where memory is shared directly without the threads::shared overheads; but once you drop into C, using memory-mapped files or overlapped IO is far more productive.
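
        For anyone who wants to stay in Perl, the memory-mapped-file part of that can at least be approximated with the CPAN File::Map module. A minimal sketch (mine, not BrowserUk's), with the file name and pattern as placeholders:

        use strict;
        use warnings;
        use File::Map 'map_file';

        my $file = 'big.dat';                 # hypothetical input file

        # Map the file read-only; $map then behaves like one huge string,
        # and the OS pages data in as it is touched rather than the program
        # issuing explicit reads.
        map_file my $map, $file, '<';

        my $count = () = $map =~ /some_pattern/g;   # e.g. scan the file in place
        print "matches: $count\n";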


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.