
This is because a thread should consist of little more than a scheduler object containing a set of registers, a stack segment and some scheduler administration state. Unlike a process, there should be no need to copy large amounts of memory, as threads should be able to re-use the existing process' copy of memory.

You could change a few words and the same would apply to forking a process on Linux and, I believe, *BSD. (I also believe this holds for most modern Unix variants these days, but I've only read the source for Linux.)

Re: Re: Re: Why use threads over processes, or why use processes over threads?
by BrowserUk (Patriarch) on Nov 11, 2003 at 20:48 UTC

    I'm not familiar with Linux or most other Unixes, but there would have to be at least the replication of filesystem objects -- duping of open file handles, sockets, etc., and their associated state.

    I would imagine that it also requires the creation of handles to the existing memory objects in order to handle COW etc.

    In addition, every write, which on the evidence of Abigail above can frequently mean a perl-level read, will result in a memory copy operation (though I'm not sure what the granularity is). There will also be some overhead associated with detecting writes to shared memory segments. Whether this is done via a software or hardware interrupt, the effect upon the L1 and L2 caches etc. can be expensive too.

    It's unclear to me how forking handles other shared handles like DB connections, hardware connections to tape drives, serial ports and the like, but I think that it is probably down to the user to handle this rather than fork.
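    As a rough illustration of the points above, here is a minimal Python sketch, assuming a Unix-like system where os.fork is available. The function name fork_demo is made up for this example. It shows only the visible semantics: a descriptor opened before the fork is usable in the child, while a write to ordinary memory in the child is not seen by the parent. It does not measure the cost of copy-on-write or of duplicating the handles.

```python
import os

def fork_demo():
    """Fork once; return (value the parent still sees, bytes the child sent)."""
    data = ["parent"]                # ordinary process memory
    read_fd, write_fd = os.pipe()    # descriptors opened before the fork

    pid = os.fork()
    if pid == 0:
        # Child: fork duplicated the descriptors, so write_fd is usable here.
        os.close(read_fd)
        data[0] = "child"            # changes only the child's copy of memory
        os.write(write_fd, data[0].encode())
        os._exit(0)                  # exit without running the parent's cleanup

    # Parent: close the end it does not use, then read what the child wrote.
    os.close(write_fd)
    received = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data[0], received
```

    Calling fork_demo() in the parent should return the parent's unchanged value together with the bytes written by the child. Anything with state outside the kernel's descriptor table (a DB session, for instance) is not covered by this sketch.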

    None of these things is individually expensive, but the convenience of spawning a thread without requiring any of this is considerable. The greatest use, and the greatest benefit from threads is for performing asynchronous reads (from whatever). This use is simply not possible with forks. The select model just doesn't compare for usability, and event-driven models require you to throw away even standard structured programming techniques, never mind object-oriented models, and revert to relying upon global state.

    Finally, the benefits of co-routines are totally absent from the forking model, but are almost trivial to implement using threads.
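    To make the co-routine claim concrete, here is one possible sketch in Python rather than perl. The class name ThreadCoroutine and the yield_ callback are hypothetical, invented for this illustration. The body runs in its own thread and hands control back and forth with a queue and a semaphore, so only one side runs at a time.

```python
import queue
import threading

_END = object()  # sentinel: the coroutine body has returned

class ThreadCoroutine:
    """Run func(yield_) in its own thread, one step per call to next()."""

    def __init__(self, func):
        self._out = queue.Queue(maxsize=1)        # values handed to the caller
        self._resume = threading.Semaphore(0)     # caller's "carry on" signal
        self._thread = threading.Thread(
            target=self._run, args=(func,), daemon=True)
        self._started = False
        self._finished = False

    def _run(self, func):
        self._resume.acquire()       # wait for the first next()
        func(self._yield)
        self._out.put(_END)

    def _yield(self, value):
        self._out.put(value)         # hand the value to the caller
        self._resume.acquire()       # sleep until the caller resumes us

    def next(self):
        if self._finished:
            raise StopIteration
        if not self._started:
            self._started = True
            self._thread.start()
        self._resume.release()       # let the body run to its next yield
        value = self._out.get()
        if value is _END:
            self._finished = True
            raise StopIteration
        return value
```

    A body such as def count(yield_): yield_(0); yield_(1) can then be stepped with ThreadCoroutine(count).next(). An exception raised inside the body is not propagated to the caller in this sketch.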

    I don't see threads and processes as an either/or proposition. In an ideal world, the programmer would have both spanners in his toolkit, and would be free to choose whichever is appropriate for the task at hand. For some tasks one is appropriate, for others, the other. In some cases a mixture of the two makes perfect sense, if the underlying system supports both efficiently. The best choice will sometimes be dictated by the underlying system.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail

      The greatest use, and the greatest benefit from threads is for performing asynchronous reads (from whatever). This use is simply not possible with forks.

      I'm sure I must just be misunderstanding your point. Could you please expand on what you meant by this (with an example)?

        Giving an example in perl would be counter-productive, as it would invite comparisons that would only serve to highlight the shortcomings of the current implementation of perl's threading.

        In essence, in filter-type applications (reading from a file, performing some processing, and then writing to another file), much of the time is spent waiting on the kernel to complete IO. Throughput can be vastly improved by having a read thread, a processing thread and a write thread. Written correctly, this allows the processing thread to run at full speed, overlapping the processing with the IO.

        This processing model only works if the three threads can share the buffer space for input and output. Forking doesn't work for this as you then need to use IPC to communicate the data between the 3 processes, and instead of the processing thread having to wait on the reads and writes, it has to wait on the IPC. You've just moved the goalposts, not removed the waiting.

        Unfortunately, with the current implementation, even using pre-spawned threads, the underlying duplication/replication in perl's shared memory model, combined with the coarse granularity of the semaphoring, doesn't allow this model to be coded efficiently; hence I won't provide sample code.
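        Purely to show the shape of the read/process/write model described above, here is a sketch in Python instead of perl. The names run_pipeline, transform, chunk_size and depth are assumptions made up for this illustration. The three threads share two bounded in-memory queues, so no IPC is involved. Python threads have their own limits (CPU-bound processing does not run in parallel under the global interpreter lock, although blocking IO can overlap with it), so treat this as a structural sketch, not a benchmark.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream

def run_pipeline(src, dst, transform, chunk_size=65536, depth=4):
    """Copy src to dst through transform using reader/processor/writer threads."""
    raw = queue.Queue(maxsize=depth)      # reader -> processor
    cooked = queue.Queue(maxsize=depth)   # processor -> writer

    def reader():
        while True:
            chunk = src.read(chunk_size)
            if not chunk:                 # empty read means end of input
                break
            raw.put(chunk)                # blocks when the processor falls behind
        raw.put(_DONE)

    def processor():
        while True:
            chunk = raw.get()
            if chunk is _DONE:
                break
            cooked.put(transform(chunk))
        cooked.put(_DONE)

    def writer():
        while True:
            chunk = cooked.get()
            if chunk is _DONE:
                break
            dst.write(chunk)

    threads = [threading.Thread(target=f) for f in (reader, processor, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

        For example, run_pipeline(open("in.dat", "rb"), open("out.dat", "wb"), bytes.upper) would upper-case a file chunk by chunk (the file names are placeholders). An exception in transform is not handled here and would stall the other threads.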



Re^3: Why use threads over processes, or why use processes over threads?
by Aristotle (Chancellor) on Nov 11, 2003 at 19:39 UTC

    Of course, the big difference between threads and processes remains the fact that the latter need their own page tables. There is probably still quite a bit of room left to optimize this at the MMU design level, though. (Why recreate the entire page table set? COW could be applied there as well.)

    This is not going to happen overnight, but I'm certain that at some point, the effective overhead of processes over threads will be zero. It is very close already, but not there yet.

    Makeshifts last the longest.