in reply to Re^9: Your main event may be another's side-show. (Coro)
in thread Your main event may be another's side-show.

Perhaps you have little experience with programming with real threads ...

You know that is not the case, so why suggest it might be?

... so much of my exposition is foreign to you?

Not at all. I've seen people get themselves into horrible tangles using threads and semaphores and mutexes in C, assembler, Java, et al. Indeed, I've done it myself, way back when.

I've also watched the industry trying to 'fix' the problems with ever more complicated locking & exclusion mechanisms: spin locks, recursive mutexes, ever more elaborate deadlock-avoidance schemes--with varied and patchy success. But, as I've said many times, the way to avoid these problems is to not write code that creates the possibility for them in the first place. And that isn't hard to do, even in C or assembler.

But all this ancient history about threading in other languages has no bearing on threading as it exists today in Perl. You might just as well bring up the Pentium bug and cite it as a reason for avoiding floating-point math in Perl today.

... the motivation is reducing process overhead ...

Then why use Perl? C carries far less overhead. In assembler you could probably squeeze everything you've ever wanted to do into less than the startup cost of perl(.exe). But it obviously would come at a cost.

We use Perl because, for all but the most demanding of application environments--embedded systems, maybe trading systems and the like--the trade-off of memory versus programmer productivity is utterly worth it. That is no different to accepting that Perl's hashes are more memory-hungry than C++'s hash_map. Or that Perl's arrays are far more memory-hungry than C arrays. The trade is worth it.

And so it is with ithreads. I have a four-core system. There is rarely ever any point in running more than four concurrent threads. And when there is, it is at most 8 or maybe 16.

Checking the memory consumed by 4 threads: c:\>perl -Mthreads -E"@t= map async(sub{ 1 }),1..4; sleep 100", it comes in at 6.4MB on my 64-bit system.

For 16 threads: perl -Mthreads -E"@t= map async(sub{ 1 }),1..16; sleep 100", it's 12.6MB.

Checking a bare perl startup c:\>perl, it is 3.2MB.

So that's between 0.6 & 0.8 MB per thread "overhead". That is a piddlingly small amount to be worrying about on systems with at least 2GB of memory. And when another 2GB will cost less than half your hourly rate.

Sure, I can make them use more: perl -Mthreads -E"$x='x'x1e6; @t= map async(sub{ 1 }),1..16; sleep 100" comes in at 2.6MB per thread (the 1MB string exists before the threads spawn, so each thread gets its own copy).

But if I do it this way: perl -Mthreads -E"@t= map async(sub{ 1 }),1..16; $x='x'x1e6; sleep 100", it comes back to just 0.7MB per thread.

Fiddling around trying to intersperse your linear algorithms with sufficient cede points to ensure that they all get a fair shout at the cpu--like trying to manage traffic by adjusting the speeds of all the vehicles so that they miss each other at junctions--for the sake of half an hour's pay? Only to find that 3/4 of your processor cycles are being wasted. That's ... just not the Perlish trade-off.
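
To be concrete about what that fiddling entails, here is a minimal sketch--workload and yield interval invented purely for illustration--of a cpu-bound loop hand-salted with cede points so that anything else gets a look in:

    use strict;
    use warnings;
    use Coro;

    # A cpu-bound worker: without the hand-placed cede, nothing else
    # in the process ever runs until it finishes.
    my $worker = async {
        my $total = 0;
        for my $i ( 1 .. 1_000_000 ) {
            $total += $i * $i;          # the actual work
            cede unless $i % 10_000;    # hand-placed yield point
        }
        print "worker done: $total\n";
    };

    my $ticker = async {
        for ( 1 .. 5 ) {
            print "tick\n";             # only runs when the worker cedes
            cede;
        }
    };

    $_->join for $worker, $ticker;

Get the interval wrong in either direction and you either starve the other coros or burn your cycles on scheduler overhead.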

... you missed almost completely the point and have no interest in discussing any of the other "issues".

If by "other issues" you mean Hammer House of Threading Horror stories in other languages, in bygone eras, written with dubious, antiquated coding practices. You're right! I have no interest.

Likewise, if you want to argue that you can do something using Coro that only uses 5 MB of memory compared to an equivalent threaded solution (written by me!) that used 20MB or even 50MB, then I again have no interest.

If you mean trumped-up, hopelessly stacked, utterly meaningless benchmarks of Coro versus ithreads claiming (and utterly failing) to perform a cpu-intensive mat-mult algorithm--one that, BTW, runs 1000 times more quickly than either when run as straightforward linear code--then I have no interest.

If you have a real, current case where a Perl program using ithreads has deadlocking issues, or exhibits priority inversion, then please post it. I'd be more than pleased to show you the error of your ways.

Or even if you have a piece of working Coro code (GUIs and comms. servers excluded) that you think either cannot easily be done better using threads, or will create deadlocking issues if coded using ithreads, then post it and we can compare solutions.

In fact, pretty much anything that is Perl and relates to issues with ithreads, I am more than willing--even anxious--to explore, with the proviso(*) that it starts and ends with a comparison of working code that we can all run.

(*)This proviso simply because anything else is just your word versus mine. Your analogy versus mine. Your insults, innuendos and claims versus mine. And we both know where that leads. Exactly nowhere useful.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

Re^11: Your main event may be another's side-show. (Coro)
by tye (Sage) on Oct 22, 2010 at 01:01 UTC
    Fiddling around trying to intersperse your linear algorithms with sufficient cede points to ensure that they all get a fair shout at the cpu

    Well, there you misunderstand Coro. No such fiddling is required. You are thinking of your prior experience with cooperative multi-tasking operating systems, not of using cooperative multi-tasking within a process that is part of a modern operating system. And Coro provides a way to do asynchronous handling of blocking operations (mainly I/O) that is even less disruptive to the code than using things like select.
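
    A minimal sketch of what I mean (host, request, and module availability assumed--this presumes Coro::Handle and a working AnyEvent backend):

        use strict;
        use warnings;
        use Coro;
        use Coro::Handle qw( unblock );
        use IO::Socket::INET;

        my $fetch = async {
            my $raw = IO::Socket::INET->new( PeerAddr => 'example.com:80' )
                or die "connect: $!";
            my $sock = unblock $raw;   # one wrapper call; the code below is unchanged

            print $sock "GET / HTTP/1.0\r\nHost: example.com\r\n\r\n";
            my $bytes = 0;
            while ( my $line = <$sock> ) { $bytes += length $line }  # blocks only this coro
            print "fetched $bytes bytes\n";
        };

        my $other = async {
            print "other coros keep running while the fetch waits on the network\n";
        };

        $_->join for $fetch, $other;

    The reads look like ordinary blocking I/O; no select loop, no callbacks, no restructuring of the linear code.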

    The systems I'm talking about are large. The code for them starts out in the tens of thousands of lines, and I don't have permission to post that code.

    The problems with process overhead were not with "running 4 or maybe 8" instances of the Perl interpreter. They were with having banks of dozens of computers, each dedicated to running many dozens of instances of the Perl interpreter, and then having too many processes idle for too long, such that these large computers either ran out of memory or requests backed up (because the systems were properly configured to prevent running out of memory, but that just meant that we ran out of available Perl instances).

    Yes, we could probably reduce the number of servers required per unit of work by writing in assembler. But we don't want to do serious development of server software in assembler, and we don't want to have to hire a much larger number of assembly programmers to replace a much smaller number of Perl programmers, much less wait much longer for each feature to be ready to deploy (and, worse, put up with the much higher bug density such low-level coding would likely lead to).

    The point of worrying about the process overhead is that it became a significant portion of the resources being used. It turned into something that could increase dramatically based on quite tiny changes in response time from external services (such as a database), and it was causing the CPUs of the servers to be mostly idle, because memory for process overhead would sometimes swamp all other resource requirements several-fold.

    If you can't understand that without a piece of code for you to run and so choose to assume that it is overblown raving, I don't really care.

    The process memory overhead scaled as "number of requests * duration of request". Since, during typical operation, most things happened in small fractions of a second, the "duration" multiplier was not much of a problem.

    When the simplest of temporary problems leads to requests to external services averaging 0.8 seconds instead, that shouldn't be a big deal. But it is a big deal when it means that each process--with its cache of lots of information that makes it so efficient--has to go from spending a small percentage of its life waiting to spending most of its life waiting.

    Suddenly the number of required processes is multiplied by 10x or 30x, and yet nothing serious is wrong. If the per-process overhead weren't killing things, customer requests would simply be handled in 1.8 seconds instead of 1.0 second.
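
    To put invented but representative numbers on that: resident handlers scale as requests per second * seconds per request. At 100 requests/second and 0.05s per request, you need roughly 5 busy Perl instances; let the external service slip to 0.8s and the same 100 requests/second suddenly needs roughly 80 instances--a 16x jump in per-process memory overhead with no increase in useful work done.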

    The data specific to a given request is tiny compared to the per-process overhead of the interpreter and the data cached hither and yon at multiple layers in the code.

    iThreads manages to share the memory for executable code (including in shared libraries) and for compiled Perl code (the op-node tree, or whatever people want to call it). iThreads doesn't share the CPU cycles used to build that cached data. fork() shares everything that iThreads does. fork() also (to some extent) shares read-only data instantiated in the parent (though not well enough), and also shares the CPU used to populate it.

    Coro shares all of the above, but also shares the cached data perfectly, and also shares the Perl interpreter, needing only a relatively small per-stack set of data that isn't shared. The delta memory requirement per request is a tiny fraction of that required by a large server app using fork() or emulated fork() (iThreads). Creating a new request handler is also a tiny fraction of the amount of work with Coro.
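
    A minimal sketch of the shape I mean (handler count and job format invented for illustration; assumes Coro and Coro::Channel):

        use strict;
        use warnings;
        use Coro;
        use Coro::Channel;

        my $jobs = Coro::Channel->new;

        # Each extra handler costs a small per-coro stack,
        # not another cloned interpreter.
        my @handlers = map {
            async {
                while ( my $job = $jobs->get ) {   # blocks only this coro
                    last if $job eq 'QUIT';
                    print "handled $job\n";
                }
            }
        } 1 .. 1000;                               # a thousand handlers in one process

        $jobs->put("request $_") for 1 .. 5;
        $jobs->put('QUIT') for @handlers;
        $_->join for @handlers;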

    So Coro means that nearly all of the resources scale relative to the number of requests per unit time, instead of having to scale the lion's share of memory required by "number of requests per unit time" * "length of time a request takes to finish" (plus avoiding the overhead of spinning up more handlers to join the pool).

    The point is not relative consumption of different approaches but how the resource requirements scale.

    comms. servers excluded

    I'm not sure what qualifies under that rubric to you. I would think a SIP router would, but it isn't a case where Coro is a big win (because nobody implements a SIP router using a single thread of execution per call). But the vast majority of the server code I deal with, covering a fairly wide variety of functions, might, because most of it can significantly benefit from a coroutines approach.

    If that escape clause means that you are only interested in purely computation-bound operations, then, yes, I don't find scaling such things to be close to as interesting a problem, so have fun with that. It is also almost never the problem I'm facing at work.

    - tye        

      Well, there you misunderstand Coro. No such fiddling is required. You are thinking of your prior experience with cooperative multi-tasking operating systems, not of using cooperative multi-tasking within a process that is part of a modern operating system.

      And there it is. Magic bullet claims, and excuses for why you can't demonstrate it.

      A coroutine program that never yields (Coro code that doesn't cede) is not cooperative-anything nor multi-anything. It's a single-tasking process, and you (well, I at least) don't need coroutines to write those!

      The moment you put two coroutines into the same program (and the clue that this is the norm is the "Co" part--you can't have co-operation within a single routine), they have to yield periodically, otherwise only one of them ever does anything.
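
      The point is trivially demonstrated (minimal example, invented for illustration):

          use strict;
          use warnings;
          use Coro;

          my @coros = map {
              my $name = $_;
              async {
                  for my $i ( 1 .. 3 ) {
                      print "$name$i\n";
                      cede;   # comment this out and coro A runs to completion before B starts
                  }
              }
          } 'A', 'B';

          $_->join for @coros;

      With the cede, the output interleaves (A1 B1 A2 B2 ...); without it, all of A then all of B. No yield, no co-operation.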

      I didn't ask you to post megabytes of your work code. But if you had the time to write the above post, you certainly have time to code a simple demonstration of the basic control and data flows. At least, you would if you used threads; but maybe Coro code is so complicated that it really would take lots of effort?

      But that is the history of this debate. Always ready with the words, but never the code.

      If that escape clause means that you are only interested in purely computation-bound operations,

      No, it doesn't. It means that I recognise that there are some applications for which threads are not the best option. And large fan-out, autonomous communications servers are one such application. But I also recognise that only a small percentage of applications fit that scenario.

      And that the large majority of applications that come up here involve a mix of IO-bound and cpu-bound tasks. And that threading accommodates this easily, where event-driven frameworks don't. So, for your average punter here, seeking to hive off a little cpu-bound processing whilst remaining responsive to other things, or seeking to cut his runtime by utilising his multiple cores to perform cpu-intensive algorithms on a large dataset, threads are the far simpler option than the often-suggested (but never demo'd) event-driven framework behemoths. A sketch of the threaded pattern I mean follows.
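
      The threaded pattern for that average punter is a couple of dozen lines (workload and numbers invented for illustration):

          use strict;
          use warnings;
          use threads;
          use Thread::Queue;

          my $in  = Thread::Queue->new;
          my $out = Thread::Queue->new;

          # The worker chews on the cpu-bound tasks...
          my $worker = threads->create( sub {
              while ( defined( my $n = $in->dequeue ) ) {
                  my $total = 0;
                  $total += $_ * $_ for 1 .. $n;   # the cpu-intensive bit
                  $out->enqueue($total);
              }
          } );

          $in->enqueue($_) for 1e5, 2e5, 3e5;
          $in->enqueue(undef);                     # tell the worker to finish

          # ...while the main thread remains free to stay responsive.
          print "result: ", $out->dequeue, "\n" for 1 .. 3;
          $worker->join;

      No cede points, no event loop, and the cpu-bound work runs on another core.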

      Why are you, and many like you, so scared of comparing like with like?


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        And there it is. Magic bullet claims

        *sigh* You misunderstand again.

        Why are you, and many like you, so scared of comparing like with like?

        Yes, I'm quite terrified. *plonk*

        Have fun.

        - tye