Sorry to labour the point, but having cast around for references, I can't see how parallelizing an IO-bound process could ever reap performance benefits on "commodity hardware", defined for the purposes of discussion as a single-CPU system with a single hard drive.

If one process is IO-bound, then by definition it spends most of its time waiting for the OS kernel to complete its IO requests. That is, the route through the kernel IO routines, device driver and disk drive hardware is the limiting factor.

If you start a second copy of the process, it too will spend its time waiting for its IO requests to complete, but it will also have to wait for the IO chain to complete the first process's IO requests before it gets around to servicing those from the second process.

The bottleneck, wherever in the IO chain it falls, will not suddenly widen because a second process starts making requests. The only way that could happen is if the kernel held some percentage of its potential throughput 'in reserve' for second and subsequent processes.
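To make the argument concrete, here is a toy simulation of the model just described. It assumes a single device that serves one request at a time in arrival order, a fixed (hypothetical) service time per request, purely IO-bound processes with negligible CPU work between requests, and no device-side gains from a deeper queue (no seek reordering, no caching). `simulate` and `SERVICE_TIME_MS` are names invented for this sketch.

```python
from collections import deque

SERVICE_TIME_MS = 10  # hypothetical cost of one IO request on the single disk


def simulate(num_processes: int, total_requests: int) -> dict[int, int]:
    """Return {pid: finish_time_ms} when total_requests are split across processes."""
    # Share the work out as evenly as possible between the processes.
    counts = [total_requests // num_processes] * num_processes
    for i in range(total_requests % num_processes):
        counts[i] += 1

    # Each process is purely IO-bound: it always has one request outstanding,
    # so the single device queue is fed round-robin by the processes.
    ready = deque(pid for pid in range(num_processes) if counts[pid])
    clock = 0
    finish: dict[int, int] = {}
    while ready:
        pid = ready.popleft()
        clock += SERVICE_TIME_MS  # the device serves one request at a time
        counts[pid] -= 1
        if counts[pid]:
            ready.append(pid)  # the process immediately issues its next request
        else:
            finish[pid] = clock
    return finish


if __name__ == "__main__":
    for n in (1, 2, 4):
        times = simulate(n, 1000)
        print(f"{n} process(es): all work done at {max(times.values())} ms")
```

Run as a script, it reports the same 10000 ms completion time for 1, 2 and 4 processes: under these assumptions, extra processes only change how the same device time is interleaved.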

That's not a completely ridiculous idea. Certainly some network protocols do something akin to this. SNA, for example, would resist allocating all the bandwidth of any given point-to-point link to a single end-to-end connection, and would attempt to always hold some bandwidth in reserve for low-volume, high-priority traffic. Years ago, I remember breaking 1.4MB diskette images into smaller chunks for transmission across SNA networks, as smaller transfers were always given priority over larger ones. But I've never heard of, nor found reference to, any device driver or bus management system that does anything similar.
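The effect of chunking under a size-based priority policy can be sketched with a toy single-link scheduler. This is not SNA's actual allocation algorithm; it is a hypothetical non-preemptive "smallest transfer first" policy, with all transfers queued at time zero and sizes in made-up units that each take one tick to send. `completion_times` and the transfer names are invented for illustration.

```python
def completion_times(transfers: list[tuple[str, int]],
                     smallest_first: bool) -> dict[str, int]:
    """Send transfers over one link, one at a time, all queued at tick 0.

    transfers: (name, size) pairs in arrival order; each size unit is
    assumed to take one tick to send. Returns {name: completion tick}.
    """
    order = list(transfers)
    if smallest_first:
        # Hypothetical policy: the smallest transfer goes first. sort() is
        # stable, so equal sizes keep their arrival order.
        order.sort(key=lambda t: t[1])

    clock = 0
    done: dict[str, int] = {}
    for name, size in order:
        clock += size  # the link carries only one transfer at a time
        done[name] = clock
    return done


if __name__ == "__main__":
    other = ("other_user_job", 1000)  # hypothetical competing transfer
    whole = [other, ("image", 1440)]
    chunked = [other] + [(f"image_part{i}", 360) for i in range(4)]

    print("whole image:", completion_times(whole, smallest_first=True))
    print("chunked:    ", completion_times(chunked, smallest_first=True))
```

With a competing 1000-unit transfer, the whole 1440-unit image completes at tick 2440, while the same image split into four 360-unit chunks completes at tick 1440. Either way the link is busy for 2440 ticks in total; prioritisation only changes who waits, which fits the point that the bottleneck itself does not widen.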


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^8: Algorithm advice sought for seaching through GB's of text (email) files by BrowserUk
in thread Algorithm advice sought for seaching through GB's of text (email) files by chargrill
