
Yes. That would be a perfect description of Coro's coroutines. Though "coroutines" is better, if for no other reason than it is a term that isn't associated with MS, as fibres (wrongly) are. It's also an older and very well understood term in CS circles.

But I think that is to ignore the political aspect of the Coro pod. Anyone who includes vitriol like this in a module's documentation, is obviously too far gone for rationality.

A great many people seem to be confused about ithreads (for example, Chip Salzenberg called me unintelligent, incapable, stupid and gullible, while in the same mail making rather confused statements about perl ithreads (for example, that memory or files would be shared), showing his lack of understanding of this area - if it is hard to understand for Chip, it is probably not obvious to everybody).

Especially as what Chip Salzenberg said is correct. At least as far as memory and file handles are concerned; they are shared within a single process space. Access is controlled and limited only at the language level; not the kernel or processor level. As for his remarks about the author, I don't know him so I couldn't make comment; but I suspect that anyone with even a cursory understanding of ithreads has long since drawn their own conclusions.

The only lack of understanding regarding ithreads is clearly demonstrated by the author. Though I suspect his "confusion" is, at least in part, political rather than genuine.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

Re^4: Why Coro?
by roubi (Hermit) on Jul 29, 2010 at 00:17 UTC
    Anyone who includes vitriol like this in a module's documentation, is obviously too far gone for rationality
    For what it's worth, I found the author to be very much available and helpful for newbie Coro questions on IRC.
Re^4: Why Coro?
by binary (Novice) on Oct 17, 2010 at 23:36 UTC

    Marc Lehmann is in fact spot on and Chip is incorrect. There's no "politics" involved here. At least not on Marc's side.

    Perl's "threads" are not threads in the generally accepted sense. Yes, Wikipedia does say "In computer science, a thread of execution is the smallest unit of processing that can be scheduled by an operating system."

    But it also goes on to say

    ", but in most cases, a thread is contained inside a process."

    Perl's implementation of "threads" is actually done using forking and there is *no* shared memory space. You're getting confused as what actually happens is *ALL* the data from the main process is copied across to the memory space of the "threads". They are processes and have their own memory space. Real threads execute inside the 1 process and share memory.

    For interpreted languages there is what's called a GIL (Global Interpreter Lock), which is there to prevent non-thread-safe code being shared with other kernel threads. Ruby and Python have a GIL and their threads are known as 'green threads', which are not kernel level but sit behind the GIL and still share memory. They of course will not take advantage of multiple cores. On a side note, you can get real kernel-level threads with both Ruby and Python through JRuby and Jython, but nothing like that exists for Perl.

    Perl's pseudo-threads will take advantage of multiple cores because they are processes and they do NOT share memory. Marc knows his stuff. He wouldn't have been able to write EV, AnyEvent and Coro to be as stable and fast as they are if he were lacking such basic knowledge that even *I* know, and I consider myself intermediate at best.

    But, please don't take my word for it. Check perlthrtut. First paragraph.

    "This tutorial describes the use of Perl interpreter threads (sometimes referred to as ithreads) that was first introduced in Perl 5.6.0. In this model, each thread runs in its own Perl interpreter, and any data sharing between threads must be explicit. The user-level interface for ithreads uses the threads class."

    Notice the word 'explicit'. That means you have to share any data between 'threads' yourself. This is the very purpose of threads::shared. Why would that module exist if Perl 'threads' shared memory, as Chip Salzenberg and you claim?
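
    The explicit-sharing rule quoted from perlthrtut can be seen in a few lines. This is a minimal sketch, assuming a perl built with ithreads support; the names $private, $counter and run_worker are made up for illustration. It shows only the language-level behaviour and says nothing about how the interpreter lays out memory underneath.

```perl
use strict;
use warnings;
use threads;            # assumes a perl built with ithreads support
use threads::shared;

my $private = 0;          # ordinary variable: cloned into each new thread
my $counter :shared = 0;  # explicitly shared across threads

# Hypothetical helper: starts one thread, waits for it, returns its result.
sub run_worker {
    my $thr = threads->create(sub {
        $private = 42;                   # changes only this thread's copy
        { lock($counter); $counter++; }  # changes the one shared value
        return $private;
    });
    return $thr->join;
}

my $seen_in_thread = run_worker();
print "thread saw private = $seen_in_thread\n";
print "parent sees private = $private, counter = $counter\n";
```

    The parent's $private is untouched by the assignment made inside the thread, while the increment of $counter is visible after join.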

      Oh dear. If misreading perlthrtut is the extent of your knowledge, you really shouldn't even reach mental conclusions on the subject, never mind offer them up in writing as proof of your own ignorance.

      How could what you are saying possibly be true, when I and hundreds of others run ithreads on Windows every day? Windows doesn't have a fork API.

      Here, check for yourself. You just might learn something useful. (Like the fact that on Windows Perls, ithreads are used to emulate fork--not the other way around.)

      (Hint: that's not a question, the answer is obviously that it isn't, and couldn't possibly be, true.)

      You (and Marc Lehmann) need to understand the difference between a) a program model; b) the actual implementation that underlies that model.

      Coro isn't "threading". It's good, old-fashioned, cooperative coroutines--just like Windows 3.0; with all the same problems--and nothing more.



      Threading is an interesting subject. This is what I have learned from peeking at the source code (I think it was perl version 5.12). I also think BrowserUk's treatment is a little harsh, but he most likely has good reason to be if he had to program Windows before it was pre-emptive :).

      Perl does use kernel threads (i.e. pthreads, Win32 threads). The code is located at dist/threads/threads.xs inside the perl source tarball. The confusing part is that it uses threads to model processes. This is less confusing when you consider that the new Perl threads originated as fork() emulation on Windows. Interestingly, Python does use actual lightweight kernel threads for its green threads. Yet only one thread runs at a time, as you say, because of the GIL.

      So Marc Lehmann says his Coro module provides the "only real threads" and that Perl's threads aren't real threads. Well... defining what a real thread is kind of confusing. Perl threads are "real" kernel threads. In my limited experience they perform like processes and give about the same performance as Perl's fork() or Python's "multiprocessing" module (which uses Python's internal fork()).

      The Wikipedia entry mentions the user-level model of threading ("N:1" under "Models"), so does this mean that Coro's coroutines are indeed threads? They just happen to be user-level threads. Coro's "threads" perform similarly to Python's or Ruby's "threads" (which are also coroutines, user-level threads).

      I think Coro is really neat and think it's mainly useful when you need to model your program asynchronously with many little workers who share a large amount of data. Perl threads are also really neat when you don't have to share a great deal of data. I think most of the frustration is in poor use of terminology. They are both threads... just different types.

        Well... defining what a real thread is kind of confusing.

        Actually, it's not. A "thread" is a schedulable unit of execution context. Thereby making kernel threads like Windows threads and pthreads--as used by ithreads--real threads. (The 'real' is redundant.)

        It also makes some user-space implementations--such as found in Java 1.1, Erlang, and others--that implement their own internal scheduler, also threads.

        But coroutines are not threads. They are coroutines.
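
        The cooperative hand-off that sets coroutines apart from scheduled threads can be sketched in plain Perl. This is a closure-based stand-in, not Coro's API and not a true coroutine implementation (there are no separate stacks); make_task and run_all are hypothetical names. The point it shows is that control changes hands only where a task chooses to return, and nothing preempts it.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a task as a closure that does one step of
# work per call and returns false once it has nothing left to do.
sub make_task {
    my ($name, $steps, $log) = @_;
    my $done = 0;
    return sub {
        return 0 if $done >= $steps;
        $done++;
        push @$log, "$name:$done";
        return 1;   # returning here is the voluntary hand-off
    };
}

# Round-robin driver: each task runs until it returns of its own accord.
sub run_all {
    my @tasks = @_;
    while (@tasks) {
        my $task = shift @tasks;
        push @tasks, $task if $task->();
    }
}

my @log;
run_all(make_task('A', 2, \@log), make_task('B', 3, \@log));
print join(' ', @log), "\n";   # prints "A:1 B:1 A:2 B:2 B:3"
```

        A task that never returned would starve every other task, which is the classic weakness of cooperative designs.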

        I think Coro is really neat

        I also think Coro is extremely clever code. And its author, an extremely clever coder. There have even been a few occasions when I have sorely wished that Coro ran on my platform. There is no reason it shouldn't. The basic, underlying longjmp mechanism works natively just fine--it is used for exception handling. It's just the implementation that prevents it.

        And my recognition of the author's skills and knowledge is what makes me think that his diatribe in the Coro POD is neither ignorance nor confusion, but simple politicking of the worst kind: done with malice aforethought, in the full knowledge that it is both factually incorrect and likely to lead some--like binary perhaps--into confusion.

        I think that if there is any real confusion, it comes because Linux treats threads and processes very similarly. To the extent that some versions of top actually list the threads of a single process as if they were separate processes.

        To quote

        Threads of execution, often shortened to threads, are the objects of activity within the process. Each thread includes a unique program counter, process stack, and set of processor registers. The kernel schedules individual threads, not processes. In traditional Unix systems, each process consists of one thread. In modern systems, however, multithreaded programs—those that consist of more than one thread—are common. As you will see later, Linux has a unique implementation of threads: It does not differentiate between threads and processes. To Linux, a thread is just a special kind of process.

        The thing that makes them "special", is that they share address space. Perl's threads also share address space at the C level.

        It is the programming model that ithreads layers on top of those underlying kernel threads, that restricts the access of individual threads within the process, to subsets of the full memory allocated to that process.

        It does this by segregating memory allocations made by different threads, to different segments ("arenas") of the memory allocated to the process. But it is only Perl and the threading model chosen, that enforces this segregation; not the OS. Indeed, the segregation is quite easily defeated.

        The choice of an 'explicitly-shared only' model was a) a conscious choice; b) done with very good reason.

        And IMO c) will in the longer term be seen as both inspired, and "the way to go".

        The current implementation lets it down somewhat because of its memory-hungriness and (lack of) speed. But this could be (and hopefully soon will be) addressed. The main problem with the current implementation is that it uses a 'double-tying' mechanism for the scalars held in shared aggregate structures.

        That is to say, both the AV or HV of a shared structure and the individual scalars it contains have attached magic. This means not only that every aggregate-held scalar is inflated in size by the attached magic, but also that each thread with visibility of the shared structure requires a (relatively lightweight, but still significant) place-holder or alias object for every scalar held in the shared structure. This is both quite costly and unnecessary.

        The scalars that live within a tied aggregate don't need to have individually attached magic. (Nor even any physical storage allocation, but that's a twist that we can skip for now.) When a FETCH or STORE is invoked upon a tied array or hash, the magic attached to the AV or HV has enough information to read or write the actual element without requiring further magic be attached to each individual scalar.

        Not only would avoiding the attachment of magic to the individual scalars considerably lessen the size of the shared aggregates, it would also remove the need for per-thread place-holders for them. So each thread would retain a single, lightweight reference to the shared AV or HV, and access its contents via its attached magic, with the result that the memory cost of the shared aggregate is further reduced.

        The final icing on the cake is that indirecting through only one level of magic instead of two would considerably speed up accesses.

        In a nutshell, you can wrap a class around an aggregate without having to make the individual elements of that aggregate objects in their own right. And the memory and performance savings of that are legion. And this could (and will, if I ever master the intricacies of XS) be implemented now.
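
        The single level of indirection described above has a user-level analogue in an ordinary tied array. This is only an analogy, assuming the core Tie::Array module; it is not how threads::shared is implemented, and CountingArray is a made-up class. It shows that FETCH and STORE on the aggregate are enough to reach an element, while the elements themselves stay plain scalars with nothing tied to them.

```perl
use strict;
use warnings;

package CountingArray;
use Tie::Array;
our @ISA = ('Tie::StdArray');   # backing storage is a plain array reference

our %calls = (FETCH => 0, STORE => 0);

# Element reads go through the aggregate's FETCH only.
sub FETCH {
    my ($self, $index) = @_;
    $calls{FETCH}++;
    return $self->[$index];
}

# Element writes go through the aggregate's STORE only.
sub STORE {
    my ($self, $index, $value) = @_;
    $calls{STORE}++;
    $self->[$index] = $value;
}

package main;

tie my @array, 'CountingArray';
$array[0] = 'hello';            # dispatched to STORE
my $value = $array[0];          # dispatched to FETCH

my $backing = tied @array;      # the object behind the tied array
my $element_tied = defined(tied($backing->[0])) ? 'yes' : 'no';

print "value = $value\n";
print "element individually tied: $element_tied\n";
```

        Only the array carries the tie; the stored elements need no per-element wrapper for reads and writes to be intercepted.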

        But none of this detracts from either the desirability of preventing the unintentional, accidental sharing of thread-specific data; nor the usability of the current implementation. Just as with regexes (and every other aspect of Perl, and other languages), implementations can be improved, incrementally over time. Provided that the basic programming model is right.

        And (IMO) the ithreads model is.

