morgon has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I have inherited some code that runs on Windows as a service where one thread connects to another server first by creating a normal TCP/IP socket that is then upgraded to an SSL-socket (via IO::Socket::SSL::start_SSL).

I now have the problem that when there is high CPU load on the machine, the call to start_SSL never returns; it just blocks the thread forever (or at least does not return in a reasonable amount of time).

The code uses ActiveState 5.8.8 and gets compiled via perl2exe.

I don't have a lot of Windows experience but need to find a fix fast, so I would appreciate any ideas on how to attack this.

Many thanks!

Re: SSL-socket on Windows
by Marshall (Canon) on Mar 16, 2012 at 11:47 UTC
    This has the hallmarks of the start of a "merry trip through hell". I looked at the BUGS section of the IO::Socket::SSL documentation (IO-Socket Bugs), which says:
    IO::Socket::SSL is not threadsafe. This is because IO::Socket::SSL is based on Net::SSLeay which uses a global object to access some of the API of openssl and is therefore not threadsafe. It might probably work if you don't use SSL_verify_callback and SSL_password_cb.

    Non-blocking and timeouts (which are based on non-blocking) are not supported on Win32, because the underlying IO::Socket::INET does not support non-blocking on this platform.

    I would interpret this as meaning that there is going to be trouble with a Win32 multi-threaded daemon (a "service" in Windows lingo) - i.e. the solution is not likely to be either "fast" or "easy".

    I would start by trying to replicate the problem on a test box with some "CPU eater" processes to simulate server load. I doubt that the fact that this is made as a .exe file matters - this file is unpacked when the service starts running. I would guess that to make this work really well, you would have to add some thread coordination around this non-thread-safe module. But a kludge may work...
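    A "CPU eater" for such a test can be very small. The following is a minimal sketch, not anything from the original service: the name burn_cpu and the script name in the comment are made up for illustration. Start one copy per core to load the box.

```perl
use strict;
use warnings;

# Hypothetical helper: spin in a tight loop for about $seconds of
# wall-clock time and return the number of outer iterations done.
sub burn_cpu {
    my ($seconds) = @_;
    my $deadline   = time() + $seconds;
    my $iterations = 0;
    my $x          = 0;
    while (time() < $deadline) {
        # Pointless floating-point work to keep one core busy.
        $x = sqrt($x * $x + 1) for 1 .. 10_000;
        $iterations++;
    }
    return $iterations;
}

# Run only when given a duration on the command line, e.g.
#   perl cpu_eater.pl 60
burn_cpu($ARGV[0]) if @ARGV && !caller;
```

    Running several of these while the service tries start_SSL should show whether CPU starvation alone is enough to trigger the hang.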

    The older version of Perl may be a factor, but the current BUG list seems to indicate that this is not a "magic bullet". If a newer Perl version does help, the fact that this service is deployed as an .exe may actually help as this service can use a different Perl version than the system itself (no interaction - the newer version of Perl gets put into the .exe).

    Contrary to some opinions, SIGALRM does work on Windows, although there are quirks (sleep(), for example, is implemented in terms of SIGALRM - however daemons don't normally "sleep"). Perl >= 5.7.3 uses what are called "safe" or "deferred" signals by default, which prevents their delivery during certain OS functions.

    To implement your own "timeout", you probably have to override this and use normal "unsafe" signals, then completely trash the "timed-out" child thread (because it is not "safe" to continue) and start over again. That is: "I'm stuck", blow up, completely restart this thread. I don't know whether that kind of "patch" would work or how hard it would be for you to implement. Look at "safe signals" in perlipc. If "trash the thread" and "start over" is an option, this might be a good "patch".
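    The usual alarm-based timeout pattern looks roughly like the sketch below. The name with_timeout is made up for illustration. Note the caveat from above: with the default deferred signals the handler may not run while Perl is stuck inside a C library call, and perlrun documents the PERL_SIGNALS=unsafe environment variable (Perl 5.8.1 and later) for restoring immediate delivery. Whether this can interrupt a blocked start_SSL in a threaded Windows service is untested here.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with a time limit.
# Returns (1, $result) on success, (0, $error) on timeout or failure.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        # The trailing newline keeps "at FILE line N" out of the message.
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($seconds);
        $result = $code->();
        alarm(0);
        1;
    };
    my $err = $@;
    alarm(0);    # make sure no alarm stays pending after a failure
    return $ok ? (1, $result) : (0, $err);
}
```

    A caller would wrap the risky call, for example with_timeout(30, sub { ... }), and treat a "timeout" error as the cue to abandon the worker and start over.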

    I am not experienced at threads. In a fork based server, I would have the child exit(99) after an "un-recoverable" timeout (a "non-safe" SIGALRM). Have the parent get its normal SIGCHLD, and its signal handler would then check your exit status and if "99" (as opposed to "0" or whatever), I'd restart your butt! (Meaning fork another child process with the same "mission" as you had before). Other wiser Monks than me will know how to do this with threads. This kind of a kludge may work "well enough"?
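    For the fork-based variant, the restart logic can be sketched as follows. This is an illustration only: supervise and EXIT_TIMEOUT are made-up names, and it uses a blocking waitpid instead of a SIGCHLD handler to keep the example short. It does not address the threaded case.

```perl
use strict;
use warnings;

use constant EXIT_TIMEOUT => 99;    # the "I got stuck" exit status

# Hypothetical supervisor: run $job->($attempt) in a child process and
# restart it while it exits with EXIT_TIMEOUT, at most $max_attempts
# times. Returns the exit status of the last child.
sub supervise {
    my ($job, $max_attempts) = @_;
    my $status;
    for my $attempt (1 .. $max_attempts) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: the job's return value becomes the exit status.
            exit($job->($attempt));
        }
        waitpid($pid, 0);
        $status = $? >> 8;
        last unless $status == EXIT_TIMEOUT;
        warn "child $pid timed out, restarting\n";
    }
    return $status;
}
```

    In a real server the job would exit with 99 from its timeout handler, and any other status would be treated as a normal exit or a different kind of failure.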

    Update: For the "general formula" to alarm a function, see my recent post about alarm handlers at Re: Race condition with Mail::Sender::MailMsg?. You have a complicated situation, so read the "yeah, but" links.

      the start of a "merry trip through hell".
      Thanks for the encouragement :-)

      I did some more testing and found that the start_SSL actually does seem to return eventually, even though it can take half an hour...

      The problem I face at the moment is the following:

      One thread connects the socket (non-SSL), then the socket gets passed to a "main" thread that tries to upgrade the socket to SSL and also handles several other sockets with a select-loop. But because start_SSL takes so long, yet another "monitor" thread decides that the main thread is dead and restarts the whole thing, after which the cycle repeats.

      I have no idea why it was implemented like that (one thread making the TCP-connection and another upgrading it to SSL), but I hope I can make the thread that connects also do the SSL-upgrade and then pass the SSL-socket to the main-thread to avoid blocking it for so long.
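      Doing the connect and the upgrade in one place could look like the sketch below. It is an illustration under assumptions: connect_and_upgrade is a made-up name, host and port are placeholders, and any SSL_* options the real service passes to start_SSL are omitted. It says nothing about whether the resulting SSL socket survives being handed to another thread.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: connect over plain TCP and upgrade to SSL in
# the calling thread. Returns the upgraded socket or dies.
sub connect_and_upgrade {
    my ($host, $port) = @_;
    require IO::Socket::SSL;    # loaded lazily; must be installed

    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "TCP connect to $host:$port failed: $@\n";

    # start_SSL blocks until the handshake finishes. The service's
    # SSL_* options (certificates, verification) would follow $sock.
    IO::Socket::SSL->start_SSL($sock)
        or die "SSL upgrade failed: " . IO::Socket::SSL::errstr() . "\n";

    return $sock;
}
```

      With this shape, a slow handshake blocks only the connecting thread, so the select-loop in the main thread keeps servicing its other sockets.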

      I hope (and pray) that, since at any given time only one thread is using the SSL-socket, the thread-unsafeness of IO::Socket::SSL will not be an issue for me - or would you say that you cannot pass an SSL-socket from one thread to another and my plan is doomed?

        Get a copy of Visual Studio. Download the symbols.zip package ActiveState offers. Attach the debugger to the perl process, and see what the C-level call stack looks like. Non-blocking I/O works on Windows for sockets, but not for files, disks, or serial ports. select() does work, again only on sockets. Ithreads is a *****. You might be "cloning" the socket or getting a NULL/undef socket when you pass it between ithreads. Just a random guess.
        Since I haven't actually done this myself, I can't say for sure that this is going to work, but you can pass a normal socket, so I would suppose that is OK for an SSL socket. This is indeed a strange design. But your plan sounds like it's worth a shot.