Some monks seem to believe that thread support is somehow important for perl. I could never see a need for it, especially when the argument goes "Threads are faster than processes", which is true only on Win32. And supposedly it's true because of the braindead implementation of processes that Win32 uses.

The point is mostly moot, because perl has very weak support for threaded programs, and there were multiple problems with perl's implementation in the past (so one might prefer to avoid those features until things stabilise). Additionally, a few distributions have wisely avoided shipping a threaded perl recently, while a few tried living on the edge and ended up shipping edgy software...

However, does perl really need threads? Would you use them if they were available in stable form? Is the current implementation solid enough?

I think a few recent developments like hyperthreading, SMT and Sun's Niagara, which puts ~8 threads on a single CPU, make threading support somewhat necessary. Other than that, the traditional process model seems more efficient, and definitely easier to debug.

Woody: This is perl, v5.6.1 built for i386-linux
Sarge: This is perl, v5.8.4 built for i386-linux-thread-multi

Update:

Most replies assume that the only way to use multiple CPUs is by using threads. AFAIK threads make sense only in the 2~4 CPU range. When you add more processors you start running into trouble: what about keeping caches in sync across dozens of CPUs? Since everything is shared, you need to keep everything in sync, which is very costly.

The solution (to the "let's use threads exclusively" problem) is to build a CPU that runs multiple threads with a single shared cache.

IBM's SMT and the P4's HT do that, and Niagara does it in a very radical way. You don't want threaded programs on your multi-CPU machine (like an 8-core AMD64?).

If you like multi-CPU machines, then you would want to write your programs with as little sharing as possible: you would explicitly mark the areas you want shared and try to keep them as small as possible. That's the state of the art as far as I know.
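For what it's worth, the "share as little as possible, and mark it explicitly" approach is roughly what perl's own ithreads do. The following is a minimal sketch, assuming a perl built with ithreads (check with `perl -V:useithreads`); the variable names are made up for illustration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $private = 0;           # ordinary variable: each thread gets its own copy
my $marked :shared = 0;    # explicitly marked: one value visible to all threads

threads->create(sub {
    $private++;            # changes only this thread's copy
    lock($marked);         # serialise access to the shared value
    $marked++;
})->join;

# The parent still sees its own untouched $private, but the updated $marked.
print "private=$private marked=$marked\n";
```

Only `$marked` has to be kept in sync between threads, so the shared area stays as small as the programmer chooses to make it.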

Has the state of the art changed?

Re: Threading vs perl
by spurperl (Priest) on Jun 20, 2005 at 10:25 UTC
    Are you (*ahem*) serious, brother monk ?

    Why does any language need threads ? You are either against threads altogether or not, because if threads are needed in C/C++/Java/Python/Lisp/whatever, they're needed in Perl as well.

    Ever had to write a GUI that does background processing in a sane way ? Ever had to interface to synchronous IO, and in the meantime stay responsive / do other things ? Ever had to speed up computations by splitting disk/IO access and the core algorithms ?

    The current implementation of threads in Perl 5.8.1+ is stable enough, I think, and I personally used it in an application where a solution w/o threads or with processes would be slower, and much more cumbersome. I hope for even more improvements in the threading modules of Perl, of course.

      Once upon a time I was working at a different company, and we had a real-time process that generated statistics.
      We had 2 classes of hardware, one windows based, and one Dec Unix based.
      The Windows based systems (NT) had support for threads, so we used a back end program to watch the socket for messages, collect the messages, and periodically push the messages to the Informix db, in 2 threads.
      The Dec Unix system didn't support threads, so we used multi-processes with IPC. I think it was 3.

      They both did the same thing, but we worked within the limitations of what we had at the time.

      I guess the point is that both approaches worked, and were relatively efficient. But for my part, the multi process approach was much easier to work with and figure out what was happening when something went wrong. The multi thread approach took me longer to get working well. But that might have been because it was on Windows. I haven't done any multithreading on any other platform.

      For example, Java needs threads because it can't (couldn't) do asynchronous IO. Thus, by your logic, perl needs no threads? Yet some monks can't live without 'em. Why?

      Background processing should work on a different set of data than the GUI itself, thus, I would think, forking off the process to do the job would be the best.

      Languages are not created equal. You seem to be suggesting keeping the feature count high. I don't know why; to look OK on comparison charts?

      Does perl need continuations? I would sure like them, even instead of threads.

        Perl can do asynchronous IO, but it still needs threads. Nothing "by my logic" claims the opposite.

        Asynchronous IO is great, but in some applications it's not enough, and threads are better. This becomes especially useful in applications with GUIs. You can use Tk's "after" utility to simulate multitasking, but it ain't pretty. Using threads is much, much better, and far more comfortable.

        Threads make it very easy and quick to share data between units of execution, unlike processes. When you have a manager-workers model that requires a lot of data to be passed around quickly, threads come in very useful.

        Continuations won't come instead of threads. While they make "cooperative execution" possible, it's not what they really are for, as far as I understand. You still have to decide when to give up control, and with many paths of execution it's important that this is done quickly and efficiently.

        forking off the process to do the job would be the best.

        Do you realize that forking is very different from threading? Fork actually copies the memory from one process to another, causing more memory use. A forked process can only communicate with the parent process through IPC or shared memory. With threads the memory *is* shared, and you don't need to handle that.

        The point is that there are uses for both fork and threads, and using threads when forking is better is as bad as using fork when threading is better. Do you want examples?

        First: Implementing a networking application that needs to continuously read from and continuously write to sockets. It can be done with fork, but it would be much better (from an OS view) to do it with threads.

        Second: JBoss uses too many threads and too few forks. As a result I can easily see it eating 900 MB of RAM because of a few jobs that use more memory. If it forked, those memory-hungry jobs would run in a forked process and the memory would be given back to the operating system.

        I think there is no way to say "Why do I need threads, since I have fork?"... It's the same as saying "Why do I need a car, since I have a stomach?"...

        daniel
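The fork side of that comparison can be sketched in a few lines. This is only an illustration under assumptions (a Unix-like system with a working `fork`; the data and variable names are invented): the child changes its own copy of `@data`, and the only way the parent learns anything is through the pipe.

```perl
use strict;
use warnings;

my @data = (1 .. 10);

pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: works on its own copy of @data.
    close $reader;
    push @data, 1000;                 # invisible to the parent
    my $sum = 0;
    $sum += $_ for @data;
    print {$writer} "$sum\n";         # explicit IPC is the only way back
    close $writer;
    exit 0;
}

# Parent: reads the result, then reaps the child.
close $writer;
chomp(my $child_sum = <$reader>);
close $reader;
waitpid($pid, 0);

print "child computed $child_sum; parent still has ", scalar(@data), " items\n";
```

When the child exits, whatever memory it used goes back to the operating system, which is the behaviour the JBoss remark is wishing for.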
Re: Threading vs perl
by zentara (Cardinal) on Jun 20, 2005 at 11:01 UTC
    My rule of thumb is :

    If I need to share data between processes, use threads; if not, fork-and-exec.

    Threads make it very easy to share data, while fork-execing can be a real hassle when you need communication between the separate processes. Sure, you can use named pipes or sockets, but threads are easier to set up.
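As a rough illustration of that rule of thumb, here is a small sketch in which several workers update one shared hash directly, with no pipes or sockets involved. It assumes a perl built with ithreads; `count_words`, `%tally` and the word lists are hypothetical names chosen for the example.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %tally :shared;                    # one hash visible to every thread

# Each worker counts its own chunk of words straight into the shared hash.
sub count_words {
    my (@words) = @_;
    for my $word (@words) {
        lock(%tally);                 # held until the end of this loop body
        $tally{$word}++;
    }
    return;
}

my @chunks  = ([qw(a b a)], [qw(b c)], [qw(a c c)]);
my @workers = map { threads->create(\&count_words, @$_) } @chunks;
$_->join for @workers;

print "$_ => $tally{$_}\n" for sort keys %tally;
```

The same job done with fork would need each child to send its partial counts back to the parent and the parent to merge them.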


    I'm not really a human, but I play one on earth. flash japh

      That makes lots of sense. However, when I use perl, I very rarely encounter problems that require multiple workers working on the same data (and I have usually solved such problems with IPC::SharedCache).

      Maybe there are some large areas of computing as-of-yet unknown to me, where such problems are common?

        Ok, just because you rarely encounter problems that require multiple workers working on the same data, doesn't mean the rest of us rarely encounter those problems. My main project at work would be greatly helped by threads that worked across Windows, Unix, and Linux. But I don't have them, so I've left it in non-threaded mode.

        IPC::SharedCache works great when there's only a little bit of caching. I'm not sure it's great for handling multiple huge trees of objects, from a cursory glance at the CPAN doc.

        Any large structure(s) of data where different subsections can be worked on independently, but the results must be fed back to the main process, would benefit from threads. One common area is XML processing - working on each tree in a different worker thread is often possible and even desirable, and then you'll want to amalgamate the results in a final structure so that you can save a new XML file, for example, which cannot easily be done by multiple forked processes due to the close-tags that need to line up properly.

        Yeah, it's very rare; that is why you seldom see threaded code. In GUI programming, it becomes more common. The case that pops up the most is displaying a progress bar for some long-running activity, like a download. In a single-threaded app, the long activity may block the GUI from updating, and it is very convenient to put the activity in a thread. Of course, we all use tools which are most intuitive to us, so use what works for you. :-)
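The progress-bar case can be sketched without any GUI toolkit. In this illustration (assuming a perl built with ithreads), a plain loop stands in for the GUI event loop and `long_job` is a hypothetical stand-in for the download. The worker reports its progress through a Thread::Queue and the main loop polls it without ever blocking.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $progress = Thread::Queue->new;

# Stand-in for a long-running activity such as a download (hypothetical).
sub long_job {
    my ($steps) = @_;
    for my $i (1 .. $steps) {
        select(undef, undef, undef, 0.05);            # pretend to work for 50 ms
        $progress->enqueue(int(100 * $i / $steps));   # report percent done
    }
    return;
}

my $worker = threads->create(\&long_job, 5);

my @seen;
my $idle_ticks = 0;
until (@seen && $seen[-1] == 100) {
    my $pct = $progress->dequeue_nb;    # returns undef at once if nothing is queued
    if (defined $pct) {
        push @seen, $pct;
        print "progress: $pct%\n";      # a GUI would redraw its progress bar here
    }
    else {
        $idle_ticks++;                  # a GUI would process other events here
        select(undef, undef, undef, 0.01);
    }
}
$worker->join;
```

Because `dequeue_nb` never waits, the main loop stays free to do other things between progress updates, which is what keeps a real GUI responsive.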

Re: Threading vs perl
by hardburn (Abbot) on Jun 20, 2005 at 12:31 UTC

    And supposedly it's true because of the braindead implementation of processes that Win32 uses.

    True, Windows needs a good thread model because its process model sucks. In contrast, Linux needs a good process model because its thread model sucks. So take your pick.

    Good support for any multi-processing technique is needed because CPUs are moving towards a model with a slower core, but with many cores on a single die.

    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

      As far as win32/un*x comparisons go, I've seen benchmarks according to which Linux processes were faster than Win32 threads.

      With threads==lighter processes on un*x you get maybe a 0.5% improvement, while programming/debugging becomes way harder (maybe not for Java adepts, who are raised with a 'threads are god/good' mentality).

      Thus, I think, efficiency is not a good reason for such a strange feature; locality of data, maybe; wider availability of Niagara-like processors, maybe.

      And with perl threading stabilising (for me, one such sign is the new Debian stable shipping a threaded perl by default), we should be ready for technologies like SMT.

Re: Threading vs perl
by Old_Gray_Bear (Bishop) on Jun 20, 2005 at 16:43 UTC
    The Unix-world got by for years without a serious threads implementation because the underlying hardware was a uni-processor, and 'threaded' was merely another synonym for 'time-sliced'. That world-view changed with the advent of the graphics-support processor, to offload the image rendering computations from the CPU. The world is changing again with the appearance of (relatively) cheap multi-processors. Once you have real hardware support for parallel execution streams, you _have_ to have some way of handling the synchronization/communications problem. If your problem is partitionable, with no inter-process interaction, you can throw multiple uni-processors at it. If it isn't partitionable, intra-process communications based on threads is mandatory.

    Consider your average large main-frame based organization, say My Bank, with over 5 Tbytes spinning and a 1% per month growth rate. Each second they have to support between five and fifteen thousand simultaneous transactions; each transaction can generate between 10 and 500 DB calls (mostly disk reads, but not only). Asynchronous I/O is the only way to go here, otherwise you get an end-to-end response time that is measured in fortnights.

    Once upon a long time ago: ASP and JES3.... (Bring Back The Splat!)

    ----
    I Go Back to Sleep, Now.

    OGB

      I'm sorry, but the Un*x-world I've known has always been multi-processor, and in the last decade it started turning massively multi-processor, so I don't quite follow you there.

      Desktop PCs used to be typically uni-processor, but you would be hard pressed to find a un*x-based workstation without at least two CPUs (or at least an additional socket for one).

      When is intra-process communications based on threads mandatory? I was taught that with multi-CPU machines, threads are disadvantageous because of all the synchronisation necessary: when your process is moved around, for example to another CPU, you need to keep its memory (cache) synced, instead of keeping in sync only the data that the worker actually IS working on.

      When did IPC::SysV, IPC or IPC::Msg drop out of favor?

        You were taught correctly, which is why good schedulers go out of their way to keep threads in a process on a single CPU.

        In fact SMP has a fundamental major flaw, which is that it does not scale. As you add more CPUs, each CPU spends a larger portion of its time waiting for all CPUs to come to attention so that one can do something which needs to avoid a race with any other. Eventually the useful work done by the next CPU is less than the amount of time it causes others to waste. This is the computer version of an organization where everyone spends all of their time in meetings. By breaking things into many smaller locks, and making sure that each CPU checks in more frequently, you can improve this. However this adds overhead, and you get diminishing returns. The best that I know of offhand is 128, and the last 64 CPUs don't add much.

        To really scale you need to move to NUMA (Non-Uniform Memory Access). In this architecture CPUs organize into groups, so when one needs others to synchronize it only needs some other CPUs to synchronize, reducing this locking considerably. This is the computer version of moving from a perfectly flat to a hierarchical organization.

Re: Threading vs perl
by Akhasha (Scribe) on Jun 21, 2005 at 08:42 UTC
    However, does perl really need threads?
    Yes.

    Would you use them if they were available in stable form?
    Certainly.

    Is the current implementation solid enough?
    No. Well OK just maybe, but as I understand it spawning a new thread means copying all memory structures (kinda like a fork), which means growth in memory use. (Even with copy on write, because you have to spend a page to toggle a bit) The advice I've read is to spawn all your threads at startup time when the memory footprint is low, and that gives me an icky feeling about the whole thing. I hope Perl6 will include Java-like threads (only good).
      The advice I've read is to spawn all your threads at startup time when the memory footprint is low ...

      You can also spawn a thread factory early in your program before you load any modules. You can then load (require) the modules needed by your main thread, and when you need to spawn a thread, have the thread factory do it for you.

      Each new thread created by the thread factory will be a duplicate of the first thread created, with its low(ish) memory footprint, rather than a duplicate of the main thread as would be the case if you spawned it from there.

      With this technique, you can more cheaply create new threads that only load what they individually need, as you need them, rather than having to start them all at the beginning.

      It is a shame the lengths you have to go to defeat the ridiculous default behaviour :(
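A rough sketch of the thread-factory idea, under some assumptions: a perl built with ithreads, Data::Dumper standing in for a "heavy" module that only the main thread needs, and invented names (`factory`, `worker`, `$requests`, `$results`). The factory thread is started before the module is loaded, so every worker it spawns is cloned from the lean factory and not from the main thread.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $requests = Thread::Queue->new;    # main thread    -> factory
my $results  = Thread::Queue->new;    # worker threads -> main thread

# Worker: reports whether the heavy module exists in its interpreter.
sub worker {
    my ($job) = @_;
    my $has_dumper = exists $INC{'Data/Dumper.pm'} ? 1 : 0;
    $results->enqueue("$job:$has_dumper");
    return;
}

# Factory: spawns a worker for each requested job until it sees undef.
sub factory {
    my @kids;
    while (defined(my $job = $requests->dequeue)) {
        push @kids, threads->create(\&worker, $job);
    }
    $_->join for @kids;
    return;
}

# Start the factory early, while the interpreter is still small.
my $factory = threads->create(\&factory);

# Load the heavy module at run time, in the main thread only.
require Data::Dumper;

$requests->enqueue('job1', 'job2');
$requests->enqueue(undef);            # tell the factory to finish
$factory->join;

my @reports = sort map { $results->dequeue } 1 .. 2;
print "$_\n" for @reports;
```

Each report ends in `:0` because the workers were cloned from the factory, which never loaded Data::Dumper. Spawning the same workers from the main thread after the `require` would copy the module into every one of them.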

      I hope Perl6 will include Java-like threads.

      I hope that Perl6 will have both User threads (Java-like) and Kernel threads (Perl5-like iThreads--'cept without the enforced duplication).

      Spawning an iThread should just start the function I pass it in a new interpreter running in a new system thread, and leave me to decide what to load there.

      The User thread scheduler would be loaded as a module into one or more iThreads and be scheduled cooperatively.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

      I wholeheartedly agree with you.

      Although, the previous replies seemed to galvanize around the whole windows vs unix/linux debate with varying degrees of knowledge.

      Jason L. Froebe

      Team Sybase member

      No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil, Stargate SG-1

Re: Threading vs perl
by DrHyde (Prior) on Jun 21, 2005 at 08:30 UTC
    If threading were stable I'd probably use it, if only because it would save me from having to fight the POE lack-of-documentation, or decrypting the docs for select().
Re: Threading vs perl
by ph713 (Pilgrim) on Jun 24, 2005 at 21:06 UTC
    I have recently been through the process of coding two rather large private perl projects where I really, really, *wanted* to use threads from perl, because a threaded model mapped very well to my problem space. So let me tell you what I found out during those projects, and what I wish for:

    1) Perl ithreads are only just starting to stabilize. If you expect your installed base to include Bob's Old Redhat Box with perl 5.8.0 (or worse, 5.6.x) on it, just forget about threads right now and save yourself a lot of grief.

    2) perldoc perlthrtut has a section that concludes with a very important statement: "Perl Threads Are Different.". All I really wanted was POSIX-style threads. I was prepared to properly deal with locking and mutexes and the whole nine yards. My Operating System supports POSIX-style threads in C, but perl's threading is a completely different beast. I'm sure it was done this way in some hope of having nearly-identical threading behavior across a wide array of underlying threads implementations, but I have to conclude that it sucks.

    At many times during the 2nd project, I seriously considered just starting over in C and building up some libraries and macros to make the code nearly as compact as perl, and using some standard regex library to make up for that part. In the end the compromise I ended up making (in both cases) was to use a multi-process model and "share" data via freeze/thaw over the top of various IPC mechanisms (sysv shm, semaphores, disk files w/ locks - at one point I even invented my own perl message queueing module to handle inter-process message-passing with disk-persistence).
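The freeze/thaw compromise might look something like this in its simplest form. It is only a sketch under assumptions: a Unix-like system with `fork`, a plain pipe in place of the shm and message-queue machinery mentioned above, and made-up data. Storable's `freeze` turns the structure into a byte string in the child and `thaw` rebuilds it in the parent.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

pipe(my $reader, my $writer) or die "pipe failed: $!";
binmode $_ for $reader, $writer;      # frozen data is binary

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: build a nested structure and send it as one frozen blob.
    close $reader;
    my %stats = (
        host   => 'example',
        counts => [3, 1, 4],
        nested => { ok => 1 },
    );
    print {$writer} freeze(\%stats);
    close $writer;
    exit 0;
}

# Parent: read everything up to end-of-file, then rebuild the structure.
close $writer;
my $blob = do { local $/; <$reader> };
close $reader;
waitpid($pid, 0);

my $stats = thaw($blob);
print "host=$stats->{host} counts=@{ $stats->{counts} }\n";
```

The parent ends up with a private copy of the structure, so every further change has to be frozen and shipped again. That is the overhead this kind of "sharing" carries compared with memory that really is shared.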

    The whole thing left a bad taste in my mouth. My projects are considerably less efficient at what they do than they would have been if ithreads had turned out to be what I hoped it was - a mirror of the underlying posix threads layers.

    What do I want out of perl threading?

    I want all data to be shared by default when a thread spawns from another.

    I want to be forced to explicitly declare thread-local storage.

    If you're going to force me to explicitly share data, I want shared data to be transparent and completely functional. Doing share($x), and finding that $x->{foo}->{bar}->[3]->{zip} is not in fact shared, sucks.

    I want POSIX threading, exposed through perl, to operate basically just like it does in C.

    Is it so much to ask for?
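The complaint about nested data can be illustrated with a short sketch, assuming a perl built with ithreads and a threads::shared recent enough to provide `shared_clone` (as far as I know a later addition that did not exist when this thread was written). Storing an ordinary nested reference in a shared hash is refused, while `shared_clone` makes a copy that is shared at every level.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %config :shared;

# An unshared nested structure cannot be stored in a shared hash.
my $stored   = eval { $config{foo} = { bar => [1, 2, 3] }; 1 };
my $rejected = $stored ? 0 : 1;

# shared_clone copies the whole structure and shares every level of it.
my $deep = shared_clone({ foo => { bar => [1, 2, 3] } });

threads->create(sub {
    $deep->{foo}{bar}[3] = 'zip';     # written from another thread
})->join;

print "plain nested ref rejected: $rejected\n";
print "deep element seen by parent: $deep->{foo}{bar}[3]\n";
```

This does not give the share-everything-by-default model asked for above, but it does remove the need to share each level of a structure by hand.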

      All I really wanted was POSIX-style threads. I was prepared to properly deal with locking and mutexes and the whole nine yards. My Operating System supports POSIX-style threads in C, but perl's threading is a completely different beast.

      You mention using 5.6.1. The threading model exposed in 5.6.1 is almost exactly what you say you want. A thin veneer over the POSIX threads API. Everything shared by default. Full access to all of the POSIX locking and mutex functions.

      My suggestion to you is to try writing one or two mildly complex applications using 5.6.1. This is the only way that you will see and understand the difficulties involved in trying to translate the techniques that you would employ using this API at the C level into programs that also have to contend with the realities of Perl's "fat" internal data structures and inherently non-reentrant core APIs.

      Only then will you understand that you don't want what you think you want.


      Precisely. So well stated, that I don't think I could add a damn thing. One of those instances when I wish that there were a mechanism to convert 10 votes into a +2 or 40 votes into a +3.
      ------------ :Wq Not an editor command: Wq