Perl interpreter throws strange errors for event-driven callback processing under race conditions

Cotton4Lunch has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Monks,

I am fairly new to Perl. Using SWIG and some XS code, I am trying to implement event-driven callbacks into Perl from my library (written in C/C++ for the Windows platform). The callback is fired from a different thread context.

When Perl registers a callback, the Perl subroutine reference is stored in a global SV pointer, and a wrapper function is registered with the library API in its place. When the library fires the callback, the wrapper function calls into Perl and pushes the function arguments onto the Perl stack.

This approach works for the most part. However, roughly one run in five hits a race condition, and Perl throws strange errors which I don't understand:

1. Somewhere during processing of the callback wrapper function, control seems to be lost: execution never returns to main, and Perl exits quietly with no error reported.

2. panic: pp_iter at F:\TEST\SWIG_API\Debug\run_cb.pl line 22. (This points to Win32::Sleep(int(rand(100))); in the main thread.)

3. A crash somewhere in the Perl interpreter (the stack trace shows only perl512.dll calls; I haven't tried a debug build).

4. Some other "panic: " errors.

The details are below. Please help me understand why this might be happening.

use strict;
use warnings;
use MyLibSWIG;
use Win32;


my $cnt_cback = \&my_connect_cb;

for(1..100){
    print("\n\n ########## TEST ITERATE : $_ ########## \n");
    MyTest();
}

sub MyTest
{
    MyLibSWIG::MyRegister($cnt_cback);

    #Test: engage interpreter here while callback is being processed
    for (1..10){
        print(sprintf("[%d] PRL: Doing something %d\n",Win32::GetCurrentThreadId(),$_));
        Win32::Sleep(int(rand(100)));
    }

    MyLibSWIG::MyDeregister();
}

sub my_connect_cb
{
    print(sprintf("[%d] PRL: my_connect_cb called bConn = %d\n",Win32::GetCurrentThreadId(), $_[0]));
}

During the callback Registration I'm storing the Perl subroutine reference and Perl context in global pointers.

SV* MyConnectCbPerl = NULL;
void* pMyConnectCbPerlCTX = NULL;

extern void wrap_connect_cback_handler(BOOL bConnected);


XS(_wrap_MyRegister) {
  {
    PFN_CONNECT_CALLBACK arg1 = (PFN_CONNECT_CALLBACK) 0 ;
    int argvi = 0;
    DWORD result;
    dXSARGS;
    
    if ((items < 1) || (items > 1)) {
      SWIG_croak("Usage: MyRegister(pfnConnectCallback);");
    }
    {
      int status = IsValidCBRef(ST(0));
      if (status == 0)
      {
        MyConnectCbPerl       = (SV *)ST(0);                //Save registered sub reference
        pMyConnectCbPerlCTX   = Perl_get_context();            //Save Perl Context
        arg1 = wrap_connect_cback_handler;                //Register a wrapper function. When fired, the wrapper function invokes the perl subroutine.
      }
    }
    result = (DWORD)MyRegister(arg1);
    ST(argvi) = SWIG_From_unsigned_SS_long  SWIG_PERL_CALL_ARGS_1((unsigned long)(result)); argvi++ ;
    
    XSRETURN(argvi);
  fail:
    
    SWIG_croak_null();
  }
}

When the wrapper callback function is invoked, it calls the Perl subroutine:

void wrap_connect_cback_handler(BOOL bConnected)
{
    PERL_SET_CONTEXT(pMyConnectCbPerlCTX);
    
    SV * sv = NULL; 
    sv = MyConnectCbPerl;

    if (sv == (SV*)NULL) 
            croak("Internal error...MyConnectCbPerl not registered\n");

    //Sleep(50);
    
    dSP;
    ENTER;
    SAVETMPS;
   
    PUSHMARK(SP);

    XPUSHs(sv_2mortal(newSViv(bConnected)));
        PUTBACK;
    
    /* Call the Perl sub */
    call_sv(sv, G_DISCARD);    
    //PERL_SET_CONTEXT(pMyConnectCbPerlCTX);
    
    SPAGAIN; 
    PUTBACK;

    FREETMPS;
    LEAVE;
}

When the issue occurs, the output looks like this (note: iterations 1 to 5 ran without any race condition problem):

 ########## TEST ITERATE : 6 ##########
[5528] LIB: MyRegister pfnConnectCallback = 54E310A5
[5528] PRL: Doing something 1
[6116] LIB: CallbackWorker firing ConnectCallback(0)
[6116] PRL: my_connect_cb called bConn = 1
[5528] PRL: Doing something 2
[6116] LIB: CallbackWorker firing ConnectCallback(1)
[6116] PRL: my_connect_cb called bConn = 0
[6116] LIB: CallbackWorker firing ConnectCallback(2)
[6116] PRL: my_connect_cb called bConn = 1
[5528] PRL: Doing something 3
[6116] LIB: CallbackWorker firing ConnectCallback(3)
[6116] PRL: my_connect_cb called bConn = 0
[5528] PRL: Doing something 4
[6116] LIB: CallbackWorker firing ConnectCallback(4)
[6116] PRL: my_connect_cb called bConn = 0
[6116] PRL: my_connect_cb called bConn = 0
panic: pp_iter at F:\TEST\SWIG_API\Debug\run_cb.pl line 22.
panic: pp_iter at F:\TEST\SWIG_API\Debug\run_cb.pl line 22.

Should you want to take a look at the library code -

// MyLib.cpp : Defines the exported functions for the DLL application.
//

#include "stdafx.h"
#include "MyLib.h"

HANDLE hCallbackThreadHandle;
PFN_CONNECT_CALLBACK g_pfnConnectCallback = NULL;

static DWORD WINAPI CallbackWorker (LPVOID Context)
{
    //Test: Fire callback N times. Event-Driven.
    for(int i = 0; i < 10; i++)
    {
        printf_s("%d LIB: CallbackWorker firing ConnectCallback(%d)\n", GetCurrentThreadId(), i);
        if (g_pfnConnectCallback)
            (*g_pfnConnectCallback)(rand()%2);
        Sleep(rand()%100);
    }
    return ERROR_SUCCESS;
}

DWORD WINAPI MyRegister (PFN_CONNECT_CALLBACK pfnConnectCallback)
{
    printf_s("%d LIB: MyRegister pfnConnectCallback = %p\n", GetCurrentThreadId(), pfnConnectCallback);
    g_pfnConnectCallback = pfnConnectCallback;
    hCallbackThreadHandle = CreateThread (NULL, 0, CallbackWorker, 0, 0, NULL);
    if (hCallbackThreadHandle == NULL)
        printf_s("%d LIB: MyRegister Error Creating callback thread\n", GetCurrentThreadId());
    return ERROR_SUCCESS;
}

void WINAPI MyDeregister ()
{
    printf_s("%d LIB: MyDeregister Waiting for callback thread to go down\n", GetCurrentThreadId());
    DWORD rc = WaitForSingleObject(hCallbackThreadHandle, 100000);
    if (rc != WAIT_OBJECT_0)
        printf_s("%d LIB: ERROR worker thread goofed up\n", GetCurrentThreadId());
    else
        printf_s("%d LIB: thread gone!\n", GetCurrentThreadId());
    CloseHandle (hCallbackThreadHandle);
    hCallbackThreadHandle = NULL;
    g_pfnConnectCallback = NULL;
}

Thanks.


Replies are listed 'Best First'.
Re: Perl interpreter throws strange errors for event-driven callback processing under race conditions by Corion (Patriarch) on Jun 26, 2011 at 11:53 UTC
How does your callback get fired if your main program is in a tight loop calling `Win32::Sleep`? If it is called from a different thread, I can only suggest that you really do learn C programming and about (Windows) threads. Blindly sharing code and data structures between threads does not work, neither in C nor in Perl.

It seems to me that the problem you're trying to solve is to fire a Perl callback that gets launched from a different thread. You cannot call into Perl if the interpreter is currently not in its runloop, as you have noticed.

Async::Interrupt offers you the runloop modification and approach to send a signal to your Perl interpreter asynchronously. The signal will only get processed once the Perl interpreter enters its runloop again. This is quite similar to the "safe signals" of Perl, and there rarely is a reason to really use "unsafe signals" - in most cases, you'll end up with the same problems that you already saw.
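The deferral described above (the worker thread only posts a notification, and the interpreter's own thread acts on it once it is back at a safe point) can be sketched in plain C. This is a minimal illustration, not how Async::Interrupt is implemented: `cb_queue`, `cbq_post` and `cbq_pop` are hypothetical names, and the sketch assumes exactly one posting thread and one consuming thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CBQ_CAPACITY 64 /* must be a power of two so indices can be masked */

/* Single-producer/single-consumer ring buffer of pending callback events. */
typedef struct {
    int events[CBQ_CAPACITY];
    atomic_size_t head; /* next slot to read; written only by the consumer */
    atomic_size_t tail; /* next slot to write; written only by the producer */
} cb_queue;

/* Producer side: meant for the library's callback thread.
 * It records the event and returns; it never touches interpreter state.
 * Returns false if the queue is full. */
static bool cbq_post(cb_queue *q, int connected)
{
    assert(q != NULL);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == CBQ_CAPACITY)
        return false;
    q->events[tail & (CBQ_CAPACITY - 1)] = connected;
    /* Release: the event must be visible before the new tail is. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: meant for the thread that owns the interpreter,
 * called at a point where calling into Perl is safe.
 * Returns false if no event is pending. */
static bool cbq_pop(cb_queue *q, int *out)
{
    assert(q != NULL && out != NULL);
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = q->events[head & (CBQ_CAPACITY - 1)];
    /* Release: the slot may be reused only after it has been read. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

Under these assumptions, the library's callback thread would call `cbq_post(&q, bConnected)` instead of calling into Perl, and code running on the interpreter's own thread would loop on `cbq_pop` and invoke `call_sv` only there.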
Re: Perl interpreter throws strange errors for event-driven callback processing under race conditions by BrowserUk (Patriarch) on Jun 26, 2011 at 11:52 UTC
"During the callback Registration I'm storing the Perl subroutine reference and Perl context in global pointers."

If you have multiple threads, then you have multiple contexts. How can you store multiple contexts in a single global variable?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Perl interpreter throws strange errors for event-driven callback processing under race conditions by BrowserUk (Patriarch) on Jun 26, 2011 at 15:19 UTC
If you are really intent on doing this, and want to see how to do it properly, then take a good long hard look at Perl crash during perl_clone. It is a very long and technically involved thread with a lot of code and false starts before a successful combination of techniques is arrived at. You will need to read, follow and understand it all to make progress.
Re^2: Perl interpreter throws strange errors for event-driven callback processing under race conditions by Cotton4Lunch (Initiate) on Jun 26, 2011 at 20:46 UTC
Thanks BrowserUk & Corion for your help.

I created a very basic example here in order to demonstrate the problem. Just for clarity, there are only 2 threads in here. Main thread - calling MyRegister() and MyDeregister(). A callbackworker thread is spawned inside the library to fire callbacks. Random sleep is introduced in both threads to simulate scenario when both might wake up next to each other and run into race conditions somewhere inside the interpreter code.

The global pointer is just as intended. I am not concerned about callback registration being called twice.

BrowserUk, I am not sure I understood you correctly. I checked the value returned by Perl_get_context(): it returns the same value throughout the execution of the program. My understanding is therefore that there is only one Perl interpreter here and only one context. The latest link you provided is close to the issue I'm trying to solve; I will take a good look and let you know.

Corion, Async::Interrupt, as you suggested, looks to be a good solution here. I believe the issue happens only when the Perl interpreter is forced to do two things from two threads at once, e.g. a "print" from the main thread while a context switch runs the "print" from the callback subroutine. I suspect the call_sv() function used in wrap_connect_cback_handler() is NOT reentrant (some others may not be either).

Meanwhile, I tried Win32::Event to wait for callback in the main thread. The event is Set at the end of the callback subroutine. It worked fantastic. This approach has more overhead than Async::Interrupt for an event-driven scenario, albeit simpler.

Thanks.
Re^3: Perl interpreter throws strange errors for event-driven callback processing under race conditions by BrowserUk (Patriarch) on Jun 26, 2011 at 22:04 UTC
"Just for clarity, there are only 2 threads in here. Main thread - calling MyRegister() and MyDeregister(). A callbackworker thread is spawned inside the library to fire callbacks. Random sleep is introduced in both threads to simulate scenario when both might wake up next to each other and run into race conditions somewhere inside the interpreter code."

Okay: Your main thread sets up and runs a perl interpreter instance in the normal way. From that interpreter you call into C passing a perl code reference taken within the auspices of that main thread. Within the C code, you spawn another OS thread running pure C code and pass on the coderef and an interpreter context from the main thread.

The main thread then diddles around doing not very much, mostly sleeping and occasionally printing something to standard out. The C thread mostly sleeps and occasionally calls back into perl using the context from the main thread. Which means that every now and again both your main thread and your C thread are using the same perl interpreter context concurrently and errors occur.

Well yes. They would, wouldn't they? The whole purpose of iThreads (Interpreter threads) is to prevent that from happening, and you are circumventing that.

Perl's internals were never designed for threading and are not reentrant because the interpreter relies upon a large block of static data for its operation. Just as many non-threaded C programs do. When iThreading was added, the mechanism chosen to work around the lack of reentrancy within the interpreter was to give each thread its own copy of that big block of static data (otherwise known as a "context"), thereby allowing separate interpreters to run in each thread without stomping all over each other's data.

By allowing a C thread to (re)use a context from an existing thread concurrently with that thread, you are bypassing that protection mechanism and things will obviously break.

"Meanwhile, I tried Win32::Event to wait for callback in the main thread. The event is Set at the end of the callback subroutine. It worked fantastic."

It may appear so in your essentially do-nothing demo, but I can assure you, it isn't in reality. What (I assume) you are doing is effectively serialising access to the shared context so that whilst the callback is using it, the main thread is doing nothing but blocking in a wait state, and vice versa. And whilst that may appear to work given the limited operations of the demo, it will not work once you start trying to do something useful with it.

A simple example of how this will go wrong. One of the many things stored in the context is the thread's current working directory. Let's say your main thread was working its way through the files in the current directory when the callback fires on the C thread. Your semaphore stops it in its tracks and the callback runs. One of the things the callback does is change the current directory. Whilst the semaphore has prevented any immediate panics or other internal corruptions, when the callback ends and the main thread tries to pick up where it left off, it finds itself in a completely different directory to where it was, and everything goes tits up.

Or, your main thread is iterating a hash using each; the callback fires and adds or deletes keys from that hash. When the main thread runs again things go tits up.

Or, the main thread is incrementing an integer: `++$i`. It has read the value of the integer into a register and incremented it when it is interrupted by the callback firing. When the callback finishes and it gets control again, it attempts to write the incremented value back to the variable, but the callback undef'd it. Blam!

The only way you will get away with calling back into the main thread from the C thread is if the main thread does nothing at all (except sleep) once it has registered the callback. And that kind of defeats the purpose of having two threads.

iThreads work surprisingly well given their nature, but you mess with their internals at your own risk.
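The interrupted `++$i` scenario, and the way a per-thread context avoids it, can be simulated deterministically in a few lines of C. This is a toy model only: `toy_ctx`, `toy_callback` and `toy_interrupted_increment` are hypothetical names, the "context" holds a single variable rather than real interpreter state, and the callback assigns a value instead of undefining it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an interpreter context: all mutable state lives here. */
typedef struct {
    long i; /* plays the role of the Perl variable $i */
} toy_ctx;

/* Hypothetical callback body: overwrites the variable in the context
 * it was handed. */
static void toy_callback(toy_ctx *ctx)
{
    ctx->i = 100;
}

/* Simulates ++$i on main_ctx being interrupted, between its read and its
 * write, by a callback that runs against cb_ctx.
 * Returns the final value of main_ctx->i. */
static long toy_interrupted_increment(toy_ctx *main_ctx, toy_ctx *cb_ctx)
{
    assert(main_ctx != NULL && cb_ctx != NULL);
    long reg = main_ctx->i; /* main thread reads $i into a "register" */
    reg += 1;               /* ...and increments the register */
    toy_callback(cb_ctx);   /* the callback fires at this point */
    main_ctx->i = reg;      /* main thread writes its now stale result */
    return main_ctx->i;
}
```

Passing the same context for both arguments models the setup in the question: the callback's write is silently overwritten by the stale register value. Passing two distinct contexts models one context per thread: each side keeps its own value and neither stomps on the other.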