in reply to Win32::MMF + threads misbehavior

Without presuming to speak for the author, my first reaction is: why are you trying to use memory-mapped files to share data between threads?

All the threads of a process already have access to all the memory in that process. MMF is designed to allow processes to share memory, not threads.

I'm not going to say outright that this cannot be made to work, but mixing ithreads, Perl's own very special brand of shared data (threads::shared), and a Perl tied interface to an OS IPC (inter-process communication) mechanism, all within a single process, just looks like a recipe for disaster to me.

In the normal model of things, MMF allows two processes to request that the OS map a single block of physical memory into the virtual address space of each process--often at different virtual addresses. I've been trying to imagine what the OS is going to do if two threads ask it to map a single block of physical memory into the virtual address space of a single process twice.

It's really hard to see quite what it is that you hope to achieve through this mechanism, and whatever it is, my instinct tells me you are on a hiding to nothing.

If you describe your high level goal for this arrangement, maybe there is a better way of achieving it than abusing MMF this way?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Win32::MMF + threads misbehavior
by renodino (Curate) on Apr 05, 2006 at 23:45 UTC
    The primary purpose (as noted in Perl coredump analysis tool ?) is to provide an strace-like capability for running Perl apps. That means the external strace program (let's call it plstrace) needs to share something with the running script that's being traced. Note that plstrace is completely independent of the script being traced, except for the ability to peek into the shared area to see what the script is doing at any given moment; hence threads::shared is not an option for the shared area.

    Further, I'd like to be able to support both Win32 and *nix platforms. The most similar solution I can find for those is memory mapped files, via Win32::MMF and Sys::Mmap, respectively. So plstrace and Devel::STrace map to the same file, w/ Devel::STrace acting a bit like Devel::DProf, except simpler: just keeping track of the call stack, and updating things in the shared area as things change. I try to minimize the number of accesses and locks to keep the overhead as low as I can (Unfortunately, Win32::MMF does a lot of extra stuff I'd rather it not do in that regard, but in the interest of GOWI, I'll live with it...if I can get Win32::MMF to work).

    Now the fun part: my primary need for this is a large multithreaded application which occasionally hangs in one of the threads (apparently caught in an infinite loop). Hence, Devel::STrace needs to dump traces for all the threads in a process. So, thru a series of clever parlor tricks, each thread gets its own region of the mmf to trace its call stack, to which DB::sub() adds and removes entries, and which DB::DB() updates with line numbers and timestamps. And plstrace attaches to the mmf and dumps its contents every so often. And then I eyeball the output when one of my threads goes 100% CPU, et voila, I know which thread and where things are going awry.

    Note that I'm not doing anything w/ threads::shared and mmf here. I was *hoping* that all that cloning would properly pick up the tie of the mmf scalar I'm using, and I'd just use CLONE() to invalidate the current mmf region and grab a new one for the new thread. And everything just carries merrily on. (And, wonder of wonders, it actually works on Linux - FC4, Perl 5.8.6 - ! Tho Sys::Mmap has its own set of bizarre behaviors.)

    But I don't want to stop there...the next step is multiprocess apps and multithreaded-multiprocess apps. One might question my sanity for pursuing multiprocess support, since the user can always separately attach plstrace to each process manually...but being able to see everything as a group seems useful to me, and (theoretically, at least) should work just as well as a single process, multithreaded solution.

    That's why.

      Given (my) uncertainty about what happens when you mix MMF/threads/ties et al, I'd offer two alternative approaches:

      1. Have the per-thread DB::DB() routines log the trace information to a common queue, and start a separate thread that reads the queue and writes to the MMF.

        You still have the problem of arranging for different processes to write to different areas of that shared memory without collisions, along with synchronisation between processes.

        This way, you remove the in-process contention and the uncertainty of behaviour surrounding having multiple tied interfaces on separate ithreads attempting to juggle access to a single process global resource.

      2. Write the external Strace program as a (threaded) tcp server application and have the DB::DB() routines log directly to it via sockets.

        Each thread can create its own connection to the external program, which avoids adding complexity to the process you are trying to debug. You dodge all the problems associated with synchronisation and conflicts that arise by trying to share global resources between threads through a tied interface. It would probably be a lot faster to boot.

        I'd use a queue in the server to coalesce the inputs from the clients into a coherent, ordered whole for saving or presentation.
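      For the first approach, a minimal sketch might look like the following. It assumes a threads-enabled perl and uses the core Thread::Queue module. The names log_trace and writer_loop are hypothetical, and a plain array stands in for the MMF so that only the single-writer arrangement is shown:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical trace queue that every traced thread writes to.
my $trace_q = Thread::Queue->new();

# The writer is the only thread that would ever touch the mapped file.
# Here it just collects records; a real writer would copy each one into the MMF.
sub writer_loop {
    my ($q) = @_;
    my @written;
    while (defined(my $rec = $q->dequeue())) {   # undef is the shutdown sentinel
        push @written, $rec;
    }
    return \@written;
}

# What a per-thread DB::DB() hook might call (illustrative only).
sub log_trace {
    my ($sub, $line) = @_;
    $trace_q->enqueue(join "\t", threads->tid(), $sub, $line);
}

my $writer  = threads->create(\&writer_loop, $trace_q);
my @workers = map {
    threads->create(sub { log_trace('main::work', 10 + $_[0]) }, $_)
} 1 .. 3;

$_->join() for @workers;
$trace_q->enqueue(undef);          # tell the writer to stop
my $written = $writer->join();

print "$_\n" for @$written;
```

      The traced threads never see the tied variable at all, so whatever cloning does to the tie no longer matters to them.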

      I'd go for the latter approach, as I think that debug tools should impose as little complexity and overhead as possible upon the programs they are debugging, and to my mind, opening and writing to a socket fits that bill quite well.
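      As a rough illustration of how little the traced side has to do, here is a sketch using the core IO::Socket::INET module. The send_trace helper and the tab-separated record format are assumptions for the example, and the listener that would normally live in the external program is created in the same script only so the demo is self-contained:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Stand-in for the external collector; in practice this is a separate process.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,            # let the OS pick a free port for this demo
    Listen    => 5,
    Proto     => 'tcp',
) or die "listen failed: $!";

# Traced side: each thread would open one connection like this at startup.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $server->sockport(),
    Proto    => 'tcp',
) or die "connect failed: $!";
$client->autoflush(1);

# Hypothetical helper: one line per trace event (thread id, sub, line, time).
sub send_trace {
    my ($sock, $tid, $sub, $line) = @_;
    print {$sock} join("\t", $tid, $sub, $line, time()), "\n";
}

send_trace($client, 1, 'main::work', 42);
close $client;

# Collector side: accept the connection and read the record back.
my $conn   = $server->accept() or die "accept failed: $!";
my $record = <$conn>;
chomp $record;
my @fields = split /\t/, $record;
print "thread $fields[0] in $fields[1] at line $fields[2]\n";
```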

      Trying to manage allocations of memory and synchronise access to them from multiple threads in multiple (unknown) processes; without creating deadlocks; and without your sync'ing and locking interfering with their own sync'ing and locking--given that you don't know what they might be doing, and indeed you are likely to be trying to help them debug it--just seems like too big a hill to climb.

      Synchronisation of access to memory is the Achilles Heel of threads, and the best way of dealing with it is to avoid doing it whenever possible.

      Beyond that, all I can do is wish you the very best of luck :)


        I guess we'll have to agree to disagree, as my experience of sockets (or even pipes) vs. shared memory (including of the mmap'ed kind) is very much different than yours (ie, the latter is *very* much simpler and faster, once it's set up).

        As to locking, as I mentioned earlier, I do as little of it as possible: each thread gets its own region for a ring buffer, so the only synchronization required is at thread create time, in order to allocate a region from the region map created by the root process/thread the first time it gets into DB::DB. Once that's done, no thread (or process) ever gets in the way of another. A simple thread lock (plus a file lock on *nix systems) covers the allocation step (tho admittedly not well on Win32, but fork() on Win32 is an odd duck anyway which I'm happy to ignore for the present).
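        To make the layout concrete, here is a simplified sketch of the per-region ring buffer idea. It is not the actual Devel::STrace code: the sizes, the record format and the function names are all invented for illustration, a plain scalar stands in for the mapped file, and the lock around allocation is omitted:

```perl
use strict;
use warnings;

# Illustrative layout only; sizes and record format are assumptions.
use constant {
    SLOT_SIZE        => 16,   # line (4 bytes) + seconds (4 bytes) + 8-byte tag
    SLOTS_PER_REGION => 4,
    MAX_REGIONS      => 8,
};
# Each region is a 4-byte write counter followed by its slots.
use constant REGION_SIZE => 4 + SLOT_SIZE * SLOTS_PER_REGION;

my $buffer      = "\0" x (REGION_SIZE * MAX_REGIONS);  # stand-in for the mmf
my $next_region = 0;

# The only step that would need a lock: hand out the next free region.
sub allocate_region {
    die "region map full" if $next_region >= MAX_REGIONS;
    return $next_region++;
}

# Overwrite the oldest slot in this region, then bump the counter.
sub write_record {
    my ($region, $line, $when, $tag) = @_;
    my $base  = $region * REGION_SIZE;
    my $count = unpack 'N', substr($buffer, $base, 4);
    my $slot  = $count % SLOTS_PER_REGION;
    substr($buffer, $base + 4 + $slot * SLOT_SIZE, SLOT_SIZE)
        = pack 'N N a8', $line, $when, $tag;
    substr($buffer, $base, 4) = pack 'N', $count + 1;
}

# Reader side: return the surviving records, oldest first, without locking.
sub read_region {
    my ($region) = @_;
    my $base  = $region * REGION_SIZE;
    my $count = unpack 'N', substr($buffer, $base, 4);
    my $n     = $count < SLOTS_PER_REGION ? $count : SLOTS_PER_REGION;
    my @out;
    for my $i ($count - $n .. $count - 1) {
        my $slot = $i % SLOTS_PER_REGION;
        push @out, [ unpack 'N N A8',
            substr($buffer, $base + 4 + $slot * SLOT_SIZE, SLOT_SIZE) ];
    }
    return @out;
}

my $r0 = allocate_region();
my $r1 = allocate_region();
write_record($r0, $_, time(), 'work') for 1 .. 6;   # wraps past 4 slots
write_record($r1, 99, time(), 'other');
print join(',', map { $_->[0] } read_region($r0)), "\n";
```

        Because each writer only ever touches offsets inside its own region, the writers need no coordination after allocation; the reader may occasionally catch a half-written slot, which is the "garbage in the stew" I'm willing to tolerate.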

        The plstrace app doesn't do any locking - it just reads as it needs, checks that the data looks reasonable, and prints it. This isn't a transaction mgr, just a tool to peek at what's going on inside a running app; a little garbage in the stew won't hurt anything.

        The most troubling issue is that this is behaving oddly on Win32, an OS that primarily relies on threads (rather than processes) for concurrency, and on memory mapped files for shared memory. So I'd expect things at the system call level to behave a bit more sensibly. Which leads me to believe that the cloning of the mmap'ed tie() may well be doing the damage.