renodino has asked for the wisdom of the Perl Monks concerning the following question:

(WinXP SP2, AS5.8.6) I'm trying to build an strace for Perl scripts (as described in Perl coredump analysis tool ?) using memory mapped files, as it's about the only reasonably similar shared memory mechanism between Win32 and *nix. I'm using Win32::MMF::Shareable for Win32, and Sys::Mmap for *nix platforms.

In order to keep most of the mmf access identical between Win32 and *nix, I'm using a single tied scalar string for Win32::MMF, and then applying substr() operations to read/write to it (as is used by Sys::Mmap).

I have successfully run a single threaded app and monitored it via a completely separate app.

Unfortunately, Win32::MMF::Shareable seems to be misbehaving with concurrent threaded access to the memory mapped file. I've reviewed Perl forking and shared memory strangeness under Windows, and applied its recommended fix (ie, re-require'ing MMF and retying in the child threads), but I'm seeing an apparent latency issue where the tie doesn't seem to pick up the right piece of the MMF if concurrent accesses are made close to each other, ie,

  1. The mmf is inited to all zeroes.
  2. Thread 1 writes an integer 1 at byte 100 of the mmf
  3. Thread 2 reads an integer at byte 500 of the mmf, expecting to get zero, but gets the value of 1 written by thread 1

The instances where this occurs seem somewhat random, and often injecting a few seconds wait time between operations improves the behavior.
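For reference, the expected behavior in the three steps above can be sketched on an ordinary (untied, unshared) Perl scalar. This is only a baseline under assumptions: the 10000-byte size matches the test script, but the `l` pack format and the variable names are illustrative and not taken from Devel::STrace.

```perl
use strict;
use warnings;

# 1. A buffer initialised to all zero bytes (plain scalar, no MMF, no tie).
my $buf = "\0" x 10_000;

# 2. "Thread 1" writes the integer 1 at byte 100.
substr($buf, 100, 4) = pack 'l', 1;

# 3. "Thread 2" reads an integer at byte 500 and should still see zero.
my $at_500 = unpack 'l', substr($buf, 500, 4);
my $at_100 = unpack 'l', substr($buf, 100, 4);

print "byte 100: $at_100, byte 500: $at_500\n";
```

If the tied MMF scalar returns anything else for byte 500, the difference comes from the tie or the mapping rather than from the substr()/pack() logic.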

And in the process of trying to write a small script to illustrate the issue, I've encountered another problem: the allocated size of the mmf isn't being properly set up in the threads (see below, where each thread prints out its notion of the length of the tied scalar), resulting in "substr outside of string" failures when there shouldn't be any.

Is there a known problem with Win32 memory mapped files, or is this an issue with Win32::MMF::Shareable ?

The code...

use threads;
use threads::shared;
use Win32::MMF::Shareable;

use strict;
use warnings;
#
# main thread creates the mmf
#
my $mmf;
tie $mmf, 'Win32::MMF::Shareable', 'mmf', {
    namespace => 'Win32MMFTest',
    size => 10000,
    reuse => 0 };
#
# and inits it
#
$mmf = 'A' x 10000;
#
# start each thread to run concurrent tests
#
my $locker : shared = 0;
#
# add a lock for the mmf - even tho !!!we shouldn't need it!!!
#
my $mmflock : shared = 0;

my $thrd1 = threads->create(\&runtest, 0);
my $thrd2 = threads->create(\&runtest, 5000);

my @tids = ($thrd1->tid, $thrd2->tid);
#
# signal to run
#
{
    lock($locker);
    $locker = 1;
    cond_broadcast($locker);
}
#
# and wait for completion
#
{
    lock($locker);
    cond_wait($locker) while ($locker < 3);
}
#
# read back each thread's modifications
#
foreach my $i (0..19) {
    foreach (0..$#tids) {
        my ($first, $second, $third) = unpack('l d S/a*',
            substr($mmf, (5000 * $_) + ($i * 200), 200));
        print "$first $second $third\n";
        print STDERR "wires got crossed!!!\n"
            unless ($first == $tids[$_]);
    }
}

$thrd1->join();
$thrd2->join();

sub runtest {
    my $region = shift;
    #
    # wait for signal to run
    #
    my $tid = threads->self->tid;
    {
        lock($locker);
        cond_wait($locker) while ($locker < 1);
    }
    #
    # maybe we need to re-require for Win32::MMF::Shareable in a new
    # thread or process ?
    # (see http://www.perlmonks.com/?node_id=331029)
    #
    require Win32::MMF::Shareable;
    my $mmf;
    tie $mmf, 'Win32::MMF::Shareable', 'mmf', {
        namespace => 'Win32MMFTest',
        size => 10000,
        reuse => 1 };

    print "length of mmf is ", length($mmf), "\n";
    #
    # write some stuff to our region
    #
    foreach (0..15) {
        print "$tid at ", $region + ($_ * 200), "\n";
        my $entry = "this is the $region region for tid $tid";
        my $len = length($entry);
        #
        # why does this die ????
        #
        eval {
            lock($mmflock); # really shouldn't be needed!
            substr($mmf, $region + ($_ * 200), $len + 14) =
                pack('l d S a*', $tid, time(), $len, $entry);
        };
        print "Failure in $tid at ", $region + ($_ * 200), "\n" and last
            if $@;
    }
    #
    # signal completion
    #
    {
        lock($locker);
        $locker++;
        cond_broadcast($locker);
    }
    return 1;
}

Re: Win32::MMF + threads misbehavior
by BrowserUk (Patriarch) on Apr 05, 2006 at 22:35 UTC

    Making no attempt to speak for the author, my first reaction is why are you trying to use memory mapped files to share data between threads?

    All the threads of a process already have access to all the memory in that process. MMF is designed to allow processes to share memory, not threads.

    I'm not going to say outright that this cannot be made to work, but the idea of attempting to mix ithreads; perl's own very special brand of shared data (threads::shared); and a Perl tied interface to an OS IPC (InterProcessCommunication) mechanism; all within a single process just looks like a recipe for disaster to me.

    In the normal model of things, MMF allows two processes to request that the OS map a single block of physical memory into the virtual address space of two separate processes--often at different virtual addresses. I've been trying to imagine what the OS is going to do if two threads ask the OS to map a single block of physical address space into the virtual address space of a single process twice?

    • Will it recognise that the second request is a duplicate and return a handle to the same virtual address mapping?
    • Or will it create a second mapping, effectively causing the same physical block of memory to appear in two places in the process's virtual memory map?
    • And given that (Perlish) shared memory is already tied, what is going to be the effect of creating ties in two threads (which involves Perlish data structures that are only useable within the ithreads in which they are created), to the "same" piece of memory?

    It's really hard to see quite what it is that you hope to achieve through this mechanism, and whatever it is, my instinct tells me you are on a hiding to nothing.

    If you describe your high level goal for this arrangement, maybe there is a better way of achieving it than abusing MMF this way?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      The primary purpose (as noted in Perl coredump analysis tool ?), is to provide a strace-like capability for running Perl apps. Which means the external strace program (let's call it plstrace) needs to share something with the running script that's being traced. Note that plstrace is completely independent of the script to be traced, except for the ability to peek into the shared area to see what the script is doing at any given moment, and hence threads::shared is not an option for the shared area.

      Further, I'd like to be able to support both Win32 and *nix platforms. The most similar solution I can find for those is memory mapped files, via Win32::MMF and Sys::Mmap, respectively. So plstrace and Devel::STrace map to the same file, w/ Devel::STrace acting a bit like Devel::DProf, except simpler: just keeping track of the call stack, and updating things in the shared area as things change. I try to minimize the number of accesses and locks to keep the overhead as low as I can (Unfortunately, Win32::MMF does a lot of extra stuff I'd rather it not do in that regard, but in the interest of GOWI, I'll live with it...if I can get Win32::MMF to work).

      Now the fun part: my primary need for this is a large multithreaded application which occasionally hangs in one of the threads (apparently caught in an infinite loop). Hence, Devel::STrace needs to dump traces for all the threads in a process. So, thru a series of clever parlor tricks, each thread gets its own region of the mmf to trace its call stack, from which DB::sub() adds and removes entries, and which DB::DB() updates with line numbers and timestamps. And plstrace attaches to the mmf and dumps its contents every so often. And then I eyeball the output when one of my threads goes 100% CPU, et voila I know which thread and where things are going awry.
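A per-thread call-stack region of the kind described above might look roughly like the following. This is a sketch on a plain scalar, not the actual Devel::STrace layout: the region size, the 2-byte depth header, the `l d A32` slot format and all helper names are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical layout: each region = 2-byte depth counter + fixed-size slots.
use constant { REGION_SIZE => 1024, HDR => 2, SLOT => 44 };   # 4 + 8 + 32 bytes
my $SLOT_FMT = 'l d A32';                                     # line, timestamp, sub name

my $buf = "\0" x (REGION_SIZE * 2);    # stand-in for the mmf: two thread regions

# Current call-stack depth recorded in the region starting at $base.
sub depth {
    my ($base) = @_;
    return unpack 'S', substr($buf, $base, HDR);
}

# What a DB::sub() hook might do on entry: fill the slot, then bump the depth.
sub push_frame {
    my ($base, $sub, $line, $ts) = @_;
    my $d = depth($base);
    substr($buf, $base + HDR + $d * SLOT, SLOT) = pack $SLOT_FMT, $line, $ts, $sub;
    substr($buf, $base, HDR) = pack 'S', $d + 1;
}

# What a DB::sub() hook might do on return.
sub pop_frame {
    my ($base) = @_;
    my $d = depth($base);
    substr($buf, $base, HDR) = pack 'S', $d - 1 if $d;
}

# What the external monitor might do: decode every live slot in a region.
sub frames {
    my ($base) = @_;
    return map { [ unpack $SLOT_FMT, substr($buf, $base + HDR + $_ * SLOT, SLOT) ] }
           0 .. depth($base) - 1;
}

push_frame(0,           'main::run',   10, 1144302961.25);
push_frame(0,           'main::work',  42, 1144302961.50);
push_frame(REGION_SIZE, 'main::other',  7, 1144302961.75);
pop_frame(0);

for my $base (0, REGION_SIZE) {
    my @stack = map { "$_->[2]:$_->[0]" } frames($base);
    print "region $base: @stack\n";
}
```

Writing the slot before incrementing the depth means a reader that only trusts the depth counter never decodes a half-written slot, though this gives no real atomicity guarantee across processes.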

      Note that I'm not doing anything w/ threads::shared and mmf here. I was *hoping* that all that cloning would properly pick up the tie of the mmf scalar I'm using, and I'd just use CLONE() to invalidate the current mmf region and grab a new one for the new thread. And everything just carries merrily on. (And, wonder of wonders it actually works on Linux - FC4 Perl 5.8.6 - ! Tho Sys::Mmap has its own set of bizarre behavior)

      But I don't want to stop there...the next step is multiprocess apps and multithreaded-multiprocess apps. One might question my sanity for pursuing multiprocess support, since the user can always separately attach plstrace to each process manually...but being able to see everything as a group seems useful to me, and (theoretically, at least) should work just as well as a single process, multithreaded solution.

      That's why.

        Given (my) uncertainty about what happens when you mix MMF/threads/ties et al, I'd offer two alternative approaches:

        1. Have the per thread DB::DB() routines log the trace information to a common queue, and start a separate thread that reads the queue and writes to the MMF.

          You still have the problem of arranging for different processes to write to different areas of that shared memory without collisions, along with synchronisation between processes.

          This way, you remove the in-process contention and the uncertainty of behaviour surrounding having multiple tied interfaces on separate ithreads attempting to juggle access to a single process global resource.

        2. Write the external Strace program as a (threaded) tcp server application and have the DB::DB() routines log directly to it via sockets.

          Each thread can create its own connection to the external program, which avoids adding complexity to the process you are trying to debug. You dodge all the problems associated with synchronisation and conflicts that arise by trying to share global resources between threads through a tied interface. It would probably be a lot faster to boot.

          I'd use a queue in the server to coalesce the inputs from the clients into a coherent, ordered whole for saving or presentation.
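The queue idea that appears in both options can be sketched with the core Thread::Queue module. This is a minimal illustration under assumptions: the record strings, the undef sentinel, and the in-memory array standing in for the MMF (or the server's output) are all hypothetical, and it needs a perl built with ithreads.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new;

# Single consumer: the only thread that would ever touch the shared area.
my $writer = threads->create(sub {
    my @written;
    while (defined(my $record = $queue->dequeue)) {
        push @written, $record;    # a real tracer would write to the MMF here
    }
    return join "\n", @written;
});

# Producers: stand-ins for the per-thread DB::DB() hooks.
my @workers = map {
    threads->create(sub {
        my $tid = threads->tid;
        $queue->enqueue("tid=$tid step=$_") for 1 .. 3;
    });
} 1 .. 2;

$_->join for @workers;
$queue->enqueue(undef);            # sentinel: tells the writer to stop
my @records = split /\n/, $writer->join;

print scalar(@records), " records coalesced\n";
```

Because only the writer thread consumes the queue, records from different threads interleave but each thread's own records stay in order, and no tie is ever touched from more than one thread.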

        I'd go for the latter approach, as I think that debug tools should impose as little complexity and overhead as possible upon the programs they are debugging, and to my mind, opening and writing to a socket fits that bill quite well.
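As a rough idea of what "opening and writing to a socket" involves on the traced side, here is a self-contained sketch in which one script plays both ends over the loopback interface. The record format (timestamp, pid, thread id, file, line, sub name) and the `send_trace` helper are assumptions for illustration only, not the code used elsewhere in this thread.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Time::HiRes ();

# Stand-in for the external monitor: listen on an OS-assigned loopback port.
my $listener = IO::Socket::INET->new(
    Listen    => 5,
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
) or die "listen failed: $!";

# Stand-in for one traced thread: connect once and keep the socket open.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $listener->sockport,
    Proto    => 'tcp',
) or die "connect failed: $!";
$client->autoflush(1);

my $conn = $listener->accept or die "accept failed: $!";

# Hypothetical helper: one line of text per trace event.
sub send_trace {
    my ($sock, $tid, $file, $line, $sub) = @_;
    printf {$sock} "%.5f %d %d %s %d %s\n",
        Time::HiRes::time(), $$, $tid, $file, $line, $sub;
}

send_trace($client, 41, 'threadtest.pl', 29, '(eval)');

my $record = <$conn>;              # the monitor side reads one line
print $record;
```

In a real tracer the connect would happen once per thread and `send_trace` would be called from the DB::DB() hook; the monitor would run as a separate process.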

        Trying to manage allocations of memory and synchronise access to them from multiple threads in multiple (unknown) processes; without creating deadlocks; and without your sync'ing and locking interfering with their own sync'ing and locking--given that you don't know what they might be doing, and indeed you are likely to be trying to help them debug it--just seems like too big a hill to climb.

        Synchronisation of access to memory is the Achilles Heel of threads, and the best way of dealing with it is to avoid doing it whenever possible.

        Beyond that, all I can do is wish you the very best of luck :)


Re: Win32::MMF + threads misbehavior
by BrowserUk (Patriarch) on Apr 06, 2006 at 06:18 UTC

    I just knocked up a crude trace monitor based around a threaded, tcp server receiving trace information from DB::DB based, per thread clients.

    Here's a sample of trace from 2 copies of a multi-threaded test app I had kicking around that are each running 100 threads:

    # hires timestamp pid tid script lineno package subroutine args
    1144302961.37707 3168( 38) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.29688 3216( 41) threadtest.pl( 29) (eval): n/a
    1144302961.29714 3216( 41) threadtest.pl( 29) (eval): n/a
    1144302961.29730 3216( 41) threadtest.pl( 29) (eval): n/a
    1144302961.29743 3216( 41) threadtest.pl( 29) (eval): n/a
    1144302961.29756 3216( 41) threadtest.pl( 29) (eval): n/a
    1144302961.29769 3216( 41) threadtest.pl( 16) Win32::Console::WriteChar: n/a
    1144302961.29782 3216( 41) threadtest.pl( 16) Win32::Console::WriteChar: n/a
    1144302961.29795 3216( 41) threadtest.pl( 16) Win32::Console::WriteChar: n/a
    1144302961.29818 3216( 41) threadtest.pl( 29) (eval): n/a
    1144302961.35938 3216( 0) ( ) : n/a
    1144302961.35980 3216( 0) ( ) : n/a
    1144302961.40625 3216( 42) threadtest.pl( 29) (eval): n/a
    1144302961.40652 3216( 42) threadtest.pl( 29) (eval): n/a
    1144302961.40668 3216( 42) threadtest.pl( 29) (eval): n/a
    1144302961.40681 3216( 42) threadtest.pl( 29) (eval): n/a
    1144302961.40694 3216( 42) threadtest.pl( 29) (eval): n/a
    1144302961.40706 3216( 42) threadtest.pl( 16) Win32::Console::WriteChar: n/a
    1144302961.40719 3216( 42) threadtest.pl( 16) Win32::Console::WriteChar: n/a
    1144302961.40732 3216( 42) threadtest.pl( 16) Win32::Console::WriteChar: n/a
    1144302961.40754 3216( 42) threadtest.pl( 29) (eval): n/a
    1144302961.49839 3168( 38) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.49881 3168( 38) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.49897 3168( 38) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.50139 3168( 28) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.50156 3168( 28) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.50169 3168( 28) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.50000 3168( 31) threadtest.pl( 29) (eval): n/a
    1144302961.50016 3168( 31) threadtest.pl( 29) (eval): n/a
    1144302961.50028 3168( 31) threadtest.pl( 29) (eval): n/a
    1144302961.50041 3168( 31) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.50054 3168( 31) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.50066 3168( 31) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.50000 3168( 7) threadtest.pl( 29) (eval): n/a
    1144302961.50016 3168( 7) threadtest.pl( 29) (eval): n/a
    1144302961.50028 3168( 7) threadtest.pl( 29) (eval): n/a
    1144302961.50041 3168( 7) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.50054 3168( 7) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.50066 3168( 7) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.50000 3168( 15) threadtest.pl( 29) (eval): n/a
    1144302961.50016 3168( 15) threadtest.pl( 29) (eval): n/a
    1144302961.50028 3168( 15) threadtest.pl( 29) (eval): n/a
    1144302961.50041 3168( 15) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.50053 3168( 15) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.50065 3168( 15) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.48975 3168( 6) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.48991 3168( 6) threadtest.pl( 19) Win32::Console::WriteChar: n/a
    1144302961.49124 3168( 14) threadtest.pl( 19) Win32::Console::WriteChar: n/a

    The 'n/a' is meant to be the subroutine args, but I haven't worked out how to obtain those yet.
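One documented route to the subroutine arguments: perlfunc says that caller(), when called in list context with an argument from code compiled in package DB, also sets @DB::args to the arguments of that frame. A minimal sketch outside the debugger; the helper name `frame_args` and the `traced` sub are hypothetical, and this is not the code of the trace module discussed here.

```perl
use strict;
use warnings;

{
    package DB;

    # Hypothetical helper. Because it is compiled in package DB, calling
    # caller() with an argument in list context also fills @DB::args.
    sub frame_args {
        my ($depth) = @_;
        my @frame = caller($depth + 1);    # +1 skips frame_args() itself
        return @frame ? @DB::args : ();
    }
}

# An ordinary sub that asks for the arguments it was called with.
sub traced {
    return join ', ', DB::frame_args(0);
}

my $args = traced('alpha', 42);
print "traced() was called with: $args\n";
```

Note that @DB::args reflects the current state of the frame's @_, so arguments already shifted off by the sub will be missing.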

    You invoke the clients in the usual debugger fashion:

    tperl -d:Ttrace threadtest.pl

    This is the debug client module(crude):

    I ran the server, monitor.pl, in another console session and just dumped the output from the central queue to the screen to produce the above output. You could modify the Worker thread to

    Smart::Comments::Lite is my doctored version of TheDamian's CPAN tool. Comment out the use line and it will disable it completely.

    This was hacked together from bits of existing code in about one hour. It imposes very little load on the programs being traced and produces sequenced information that ought to make working out where your application is disappearing up its own navel fairly easy. Redirecting the output to a file (via tee might be good) would allow you to get a permanent record of the sequence and timing of events that lead up to the problem.

    HTH.


      Not certain how you conclude "imposes very little overhead", but a socket write (local or not) + a printf for every Perl statement looks like a lot of overhead to me. Esp. if the problem thread is in a tight loop (altho the socket write will block eventually, so I guess that mitigates that issue)

      While I appreciate your efforts, I'm already concerned about the overhead of pack()ing an integer and a float prior to writing thru the tie() for every statement, so socket writes are not an acceptable solution for my purposes.

        I'm already concerned about the overhead of pack()ing an integer and a float prior to writing thru the tie() for every statement, so socket writes are not an acceptable solution for my purposes.

        I hate to tell you this, but tying isn't quick. In the case of MMF, around 2 to 3 times slower than writing to a socket.

        #! perl -slw
        use strict;
        use Win32::MMF::Shareable;
        use IO::Socket::INET;
        use Benchmark qw[ cmpthese ];

        our $DSIZE ||= 100;

        my $mmf;
        tie $mmf, 'Win32::MMF::Shareable', 'mmf', {
            namespace => 'Win32MMFTest',
            size => 10000,
            reuse => 0
        };

        my $sock = IO::Socket::INET->new( 'localhost:54321' );

        my $data = 'X' x $DSIZE;

        cmpthese -1, {
            MMF => sub { $mmf = $data },
            TCP => sub { print $sock $data }
        };

        __END__
        C:\test>MMF-IO-b -DSIZE=8
                Rate  MMF  TCP
        MMF  28872/s   -- -77%
        TCP 123636/s 328%   --

        C:\test>MMF-IO-b -DSIZE=80
                Rate  MMF  TCP
        MMF  27401/s   -- -73%
        TCP 102909/s 276%   --

        C:\test>MMF-IO-b -DSIZE=800
               Rate  MMF  TCP
        MMF 28295/s   -- -67%
        TCP 86245/s 205%   --

        And that could probably be sped up by playing with the buffer sizes and upping the priority of the read threads.

        Not certain how you conclude "imposes very little overhead", but a socket write (local or not) + a printf for every Perl statement looks like a lot of overhead to me. Esp. if the problem thread is in a tight loop (altho the socket write will block eventually, so I guess that mitigates that issue)

        Not sure why you think it would block? The other ends of each of those sockets are being serviced by a dedicated thread that reads a line and posts it to a queue. With 2 processes running 100 threads, there appears to be negligible slowdown on the app under test, and each of those 200 threads is in a tight loop incrementing a variable and outputting it to the console.

        But, I can see I am wasting your time with a NIH solution you do not want, so I'll stop.

