Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

Re^3: Parrot, threads & fears for the future.

by Anonymous Monk
on Oct 23, 2006 at 12:48 UTC ( [id://580014]=note: print w/replies, xml ) Need Help??


in reply to Re^2: Parrot, threads & fears for the future.
in thread Parrot, threads & fears for the future.

In wonder, how large must the arrays be for this to be faster in threads than in sequence? Say you split this into four - OS level - threads (because if it's just threads within the same process, you gain nothing at all), hoping it gets evaluated on four CPUs (or cores) in parallel. But that means three out of the four threads will be queued by the OS, and will have to wait their turn to get a slice of CPU time.

Now, if the process is only halfway its current time slice, and has to give up its timeslice because it needs to wait for the other three thread to join, now, that would be a shame.

I'm not saying that build in support for threading is a bad idea. But I do think it's a bad idea to do things in parallel without the programmers explicit permission. And it ain't going to be easy, as you need to cooperate with the OS, and Perl has to run on many OSses.

Replies are listed 'Best First'.
Re^4: Parrot, threads & fears for the future.
by Corion (Patriarch) on Oct 23, 2006 at 12:51 UTC

    I look at thread management much like I now look at memory management. Of course, Perl sacrifices a lot of performance to the reference counting, but it buys a lot of flexibility, convenience and security by using the automatic memory management instead of leaving the programmer to handle the memory, even if that could be more efficient.

      Of course, Perl sacrifices a lot of performance to the reference counting

      I've seen this claim made by the people who refused to reimplement reference counting in Parrot. But the only reference I saw from them that they claimed supported this only said something more like garbage collection really isn't too much terribly slower than reference counting, as far as they noticed.

      Just because you've heard something repeated many times, doesn't mean that there necessarily is any evidence to support it.

      Incrementing and decrementing are two of the fastest things computers know how to do. Adding an increment to a call to malloc() is "in the noise" performance-wise. Certainly for Perl 5, the cost of reference counting to performance is insignificant. And I won't believe that ref counting has much of a performance cost even in a highly-efficient system, until I see some real evidence. Removing an increment assembly op is certainly micro-optimization and it is extremely hard to get micro-optimizations to have much impact on the performance of real code. Your code would have to be doing a ton of taking and removing references and little other real work for the inc/dec to add up to even a non-trivial fraction of the run-time.

      - tye        

        Certainly for Perl 5, the cost of reference counting to performance is insignificant.

        Twelve years after the initial release of Perl 5, there are still reference counting bugs -- and I don't just mean the circular reference problem.

        Incrementing and decrementing are two of the fastest things computers know how to do. Adding an increment to a call to malloc() is "in the noise" performance-wise.

        True, but that's not the only place that reference counting can lose performance compared with a decent GC. You'll usually get much better cache localisation, reduced risk of pipeline hazards, and avoid much of the messy issue of synchronisation of counts across threads.

        I gained about 4% speed with some Lisp-ish code a few years back when we switched from a (admittedly fairly naive) reference count to a decent ephemeral GC. Not order-of-magnitude land - but a respectable speed up. I put most of that down to the GC keeping all the live objects in cache.

        Also, it's not just in the call to malloc you're adding the increment. You're doing an increment every time you add a reference to something, and a decrement every time you remove a reference to something. That happens a lot more often than malloc/frees.

        That said - there isn't a "right" answer. Whether ref counting or some other GC variant is going to get you more speed depends on the code's allocation patterns, the hardware you're running on, etc. too. I just want a decent GC in perl so I can have self-referental structures without having to jump through hoops ;-)

Re^4: Parrot, threads & fears for the future.
by BrowserUk (Patriarch) on Oct 23, 2006 at 20:22 UTC

    Let me speculate a little.

    Imagine a VM-level 'thread' was not a whole interpeter running on a single kernel thread, but instead was a set of VM registers, a VM 'stack' pointer and a VM program counter. And capable of running on any interpreter in the process.

    Now imagine that the VM queried the OS at startup for the number of hyperthreads/cores/CPU (N), and started 1 kernel thread running a separate interpreter in each. And these interpreters all 'listened' to a single, central queue of (pointers to) threads.

    When a threadable opcode is reached, if the runtime size of the data to be manipulated is large enough to warrent it, the opcode splits up the dataset into N chunks and queues N copies of itself plus the partition dataset onto the thread queue ('scatter'); prefixed by a 'gather' opcode. Each available processor pulls a copy off, executes on it's partition of the dataset and stores the results.

    The first interpreter finished, will pull the 'gather' opcode. It decrements an embedded count and waits upon an embedded semaphore. As each of the interpreters finishes and goes back to get the next piece of work, it does the same until the last finishing interpreter decrements th count to zero, freeing the semaphore and clearing the 'gather' opcode. That frees up all the interpreters to pull the next opcode off the queue and continue. Each waiting interpreter is in a wait state allow the other interpreters to continue, consuming no timeslices, preemptively multitasked by the OS.

    The array has been processed, in parallel, by as many threads are available. No user space locking required. No threads needed to be started, nor torn down. Additional opcodes processed by the threaded code pushes are pushed onto the queue (stackwise).

    The same VM thread (register set) forms the basis of user space threading for cooperative multi-tasking like in Java, Ruby and many more. The same thread structure is used for both forms of threading.

    Just a thought (or five:).


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

        Thanks. I didn't know that, but it doesn't surprise me. Whilst I came up with the speculation above, without direct reference to other material, it's all fairly obvious. It certainly mirrors aspects of what already happens in hyperthreaded hardware and some aspects of smp hardware and software systems. That GHC allows Pugs to do something similar is unsurprising and good to hear.

        But I'll again remind you that this thread is about Parrot, and ask a couple of (rhetorical) questions:

        • Since this is a pretty obvious idea, that has been shown to work at both the hardware and the software level, then why is it not being used in Parrot?
        • And why, when in very early 2004 the idea came up in the Parrot fora, was it so roundly jumped on and rejected?

        *Rhetorical because I realise that you are not involved in Parrot.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://580014]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others rifling through the Monastery: (5)
As of 2024-03-29 11:12 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found