Re^4: Multi-thread database script (threads)

by jplindstrom (Monsignor)
on Apr 25, 2007 at 13:47 UTC


in reply to Re^3: Multi-thread database script (threads)
in thread Multi-thread database script

See also: forks

/J

Re^5: Multi-thread database script (threads)
by BrowserUk (Patriarch) on Apr 25, 2007 at 13:54 UTC

      So you've used the referenced module, forks ? Your use of the plural and my expectations both lead me to suspect that you misunderstood the node you replied to and didn't follow the offered link.

      On a platform with real fork support, I'd expect the technique outlined in forks to be more memory efficient and, at least in some respects, faster than Perl threads (and not terribly slower in any respect), and perhaps less buggy. Note that this has everything to do with how poorly Perl's particular implementation of threading is, not a problem with threading in general.

      Now, trying to use this module on Win32 would just be silly (unless using cygwin Perl). It'd be using fork which Perl turns into a bad emulation of fork(2) using a bad implementation of threads and then the module uses some IPC (which Perl might also be emulating) on top for sharing variables. So emulating Perl's bad threads by using a badly emulated fork that still uses Perl's bad threads. Yeah, I'd expect that to be awful. So that's another possible explanation for your reaction.

      - tye        

        So you've used the referenced module, forks ?

        I'll come back to that at the end, but in the interim: Have you?

        Your use of the plural and my expectations both lead me to suspect that you misunderstood the node you replied to and didn't follow the offered link.

        My "use of the plural"?

        I pluralised:

        1. "They" & "threads"; both relating to the same thing.
        2. "advantages" & "disadvantages"; again two sides of the same coin.

        To put your mind at rest. I did follow the link provided and I was specifically suggesting against the use of the forks module for use in solving the OP's problem.

        I can understand your suspicion regarding my use of "They" rather than 'It' in reference to a single module, but the 's' on the end of the name 'forks' engenders one (me) to think of them (it) as multiple entities, "forks", hence 'they', rather than a single entity, 'the forks module'. A grammatical error? Perhaps, but if so; what's new :)

        Overall, I think that your suspicion is based more upon your "expectations" than it is upon my grammar.

        Now, trying to use this module on Win32 would just be silly (unless using cygwin Perl). It'd be using fork which Perl turns into a bad emulation of fork(2) using a bad implementation of threads and then the module uses some IPC (which Perl might also be emulating) on top for sharing variables. So emulating Perl's bad threads by using a badly emulated fork that still uses Perl's bad threads.

        First up. The OP clearly identified his platform as WinNT. There is no ambiguity there.

        • So, no native fork.
        • No COW memory.
        • fork emulated with threads, with everything that entails.
        • Shared memory emulated, despite the presence of real shared memory, by
          1. freezing data structures into Storable format,
          2. exchanging the serialised data between threads of the same process via network sockets (no unix sockets here!)
          3. de-serialising that data and expanding it back into Perl data structures.

          And remember, threads::shared memory is shared by all threads created after the declaration of the shared variable. To emulate this process via forks, including emulated forks, means that the serialised data has to be transmitted between any 'thread' that modifies it, and every other 'thread' that has visibility of it.
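A minimal sketch of the freeze, exchange and thaw steps listed above, to show why every access to a "shared" structure costs a full copy. This is not the forks module's actual code: the helper names `send_frozen` and `recv_frozen` and the length-prefixed framing are assumptions made for illustration, and it uses `socketpair` within a single process on a *nix platform for brevity.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);
use Socket;
use IO::Handle;

# Hypothetical helper: serialise a structure and write it to the socket
# as a 4-byte length prefix followed by the frozen bytes.
sub send_frozen {
    my ($sock, $data) = @_;
    my $frozen = freeze($data);
    print {$sock} pack('N', length($frozen)), $frozen;
    $sock->flush;
    return length($frozen);
}

# Hypothetical helper: read one frozen structure and rebuild it as a
# brand new Perl data structure on the receiving side.
sub recv_frozen {
    my ($sock) = @_;
    my ($header, $frozen);
    read($sock, $header, 4) == 4 or die "short read on length header";
    my $len = unpack('N', $header);
    read($sock, $frozen, $len) == $len or die "short read on payload";
    return thaw($frozen);
}

socketpair(my $writer, my $reader, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair failed: $!";
binmode $_ for $writer, $reader;

my %hoa   = (fruit => [qw(apple pear)], veg => [qw(leek)]);
my $bytes = send_frozen($writer, \%hoa);
my $copy  = recv_frozen($reader);

# The receiver holds a copy, not the original: a change made here is
# invisible to the sender until it is frozen and sent back again.
push @{ $copy->{veg} }, 'kale';
printf "sent %d bytes; original veg has %d item(s), copy has %d\n",
    $bytes, scalar @{ $hoa{veg} }, scalar @{ $copy->{veg} };
```

Each modification needs another round trip of this kind to every party that can see the variable, which is the overhead described above.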

        So, I don't think that my "None of the advantages, all of the disadvantages and more." was in any way an overstatement of the situation vis-a-vis, using forks to solve the OP's problem.

        With respect to cygwin. Whilst cygwin does provide a remarkably accurate emulation of the POSIX forking call, if you've ever looked into the cygwin source code to see how it achieves it, you'll know that it is far, far away from being the "quick and cheap" kernel-provided process that fork is on *nix platforms.

        In a nutshell, it involves:

        1. Discovering the name of the executable being forked.
        2. Using CreateProcess() to start a new copy of that process, with its initial thread suspended.
        3. The forking process then uses OpenProcess() to get a handle on the new suspended process.
        4. It then destroys the default stack, heap and data segments allocated to that process at startup.
        5. It then iterates its own process memory segments, VirtualAlloc()s new memory segments within the forked process, and copies its own data/stack/heap segments wholesale into the new process.
        6. It then has to scan through those newly copied data segments and 'fix up' various bits of memory.

          For example; there is the Perl global $$, which obviously has to be given a new value in the forked process. There are many other similar bits of memory that need to be modified in the new process.

        7. Finally, the new process is closed and the suspended state is lifted; and the pid of the forked process is returned to the calling (forking) process.
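For contrast, here is a small sketch of what the emulation has to reproduce, run on a platform with a native fork. It shows the two observable facts mentioned in the steps above: the child sees a new value in `$$`, and the parent gets the child's pid back from `fork`. The helper name `fork_and_report` is hypothetical.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: fork a child that reports its own $$ back to
# the parent through a pipe. Assumes a native fork(2).
sub fork_and_report {
    pipe(my $from_child, my $to_parent) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: $$ already holds the new process id here.
        close $from_child;
        print {$to_parent} "$$\n";
        close $to_parent;
        POSIX::_exit(0);    # leave without flushing inherited buffers
    }

    # Parent: read what the child saw, then reap it.
    close $to_parent;
    my $child_view = <$from_child>;
    close $from_child;
    waitpid($pid, 0);
    chomp $child_view;
    return ($pid, $child_view);
}

my ($returned_pid, $child_view) = fork_and_report();
print "parent \$\$ = $$, fork() returned $returned_pid, ",
      "child saw \$\$ = $child_view\n";
```

On a native-fork platform the kernel does all of this in one call; under the emulation each of those values has to be patched into copied memory by hand.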

        All in all, it is a remarkable achievement by the cygwin developers to make it work. But it is a far cry from the "quick and cheap" mechanism it is emulating. And it does not benefit from COW either. Remember that in terms of the C code of the perl executable, Perl code is just data. It's not just Perl data segments that must be copied. Perl code lives in data segments, and cannot even benefit from the sharing of code segments that Win32 does as a matter of course, either.

        So, even under cygwin, forks is a very cumbersome process when compared to the same thing done on platforms with a native kernel fork.

        Do not take the above as a verbatim description of the cygwin fork emulation process, nor take me to task for the inevitable inaccuracies. The above was typed from memory of my explorations of the code some 3 or 4 years ago when I was working with liz attempting to make forks work on Win32. It was at that time I expended a considerable amount of time looking into the cygwin forking emulation, before reaching the conclusion that making forks work on Win32 was a waste of time and abandoning further effort in that direction.

        The only way that fork could ever be made efficient on Win32, is if MS chose to do it in the kernel. The only thing preventing this is the will to do it. The show stopper from the point of view of anyone outside of MS development teams, is the ability to create the internal data structures that represent a 'process' and link them into the kernel tables used by the scheduler. If these data structures were available, and it was possible to create a new, empty 'process', without having the image loaded from disk, then it would be possible to 'fill in' the code and data segments by cloning an existing process. And readonly segments could be COW shared from the existing process using the attributes available on the VirtualAlloc() call.

        So, to answer your first question: So you've used the referenced module, forks ?

        No. I have never run forks on a platform that provides a native fork. However, I did do enough investigations a few years ago to convince me that that module could never be an effective solution to multi-tasking on Win32. Hence my post and (gentle) admonishment of the suggestion that it could be a solution for the OP's problem.

        Finally, I'd like to tempt you to reconsider this paragraph:

        On a platform with real fork support, I'd expect the technique outlined in forks to be more memory efficient and, at least in some respects, faster than Perl threads (and not terribly slower in any respect), and perhaps less buggy. Note that this has everything to do with how poorly Perl's particular implementation of threading is, not a problem with threading in general.

        I'd love to see you run a benchmark against a small, threaded Perl program (that I'll willingly provide), run on a platform that provides a native fork, that compares the performance of that program running using Perl's ithreads implementation with the same program, but using forks in its headline billing role as a "drop in replacement" for use threads;.

        If you're up for this, I'll knock up a small threaded app that uses threads and threads::shared to run half a dozen threads that manipulate a shared HoAs from concurrent threads. All you would need to do is run the program once on a *nix platform as is. And a second time having replaced use threads;use threads::shared; with use forks; use forks::shared; and publish the results.
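A rough sketch of what such a test program might look like. This is not the program offered above, only an illustration of its shape: the constants `$THREADS` and `$ITERATIONS`, the `worker` routine and the workload itself are all assumptions. It needs a perl built with ithreads.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Time::HiRes qw(time);

my $THREADS    = 6;        # "half a dozen threads"
my $ITERATIONS = 1_000;    # hypothetical workload size

# A shared hash of arrays: each value must itself be a shared array.
my %hoa :shared;
$hoa{$_} = &share([]) for 1 .. $THREADS;

# Hypothetical workload: every thread pushes onto every array in turn.
sub worker {
    my ($id) = @_;
    for my $i (1 .. $ITERATIONS) {
        my $key = 1 + ($id + $i) % $THREADS;
        lock(%hoa);                       # serialise access to the HoA
        push @{ $hoa{$key} }, $i;
    }
    return;
}

my $start = time;
$_->join for map { threads->create(\&worker, $_) } 1 .. $THREADS;
my $elapsed = time - $start;

my $total = 0;
$total += @{ $hoa{$_} } for keys %hoa;
printf "%d pushes from %d threads in %.3f seconds\n",
    $total, $THREADS, $elapsed;
```

For the second run, the two `use` lines for threads and threads::shared would be swapped for `use forks; use forks::shared;` as described above, with everything else left untouched. Timings from a toy workload like this would only be indicative.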

        I would be very much interested to see the outcome. I'd even place a (very) small wager that threads would out-perform forks on speed, though you'd probably be right that the forks version would use less memory.

        BTW. I've attempted to vet this post for spelling and grammatical errors, but as is usual with me, I probably won't notice any residual mistakes until I come back and read it cold, after several hours. I hope that won't create any further misunderstandings.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
