hennesse has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I recently ported an application from Unix to Win32, and I've got one problem left to solve - my diagnostic logfile is now unsorted.

Five different threads all share a common "textbook" logging subroutine...

use Fcntl qw(:flock SEEK_END); # --------------------------------------- # sub logmsg # --------------------------------------- sub logmsg { my $now = time; my $msec = int( ( $now - int($now) ) * 1000 ); my ( $sec, $min, $hour, $mday, $mon, $year, $wday, $yday, $isdst ) = + localtime(time); sub lock { flock( LOG, LOCK_EX ); # and, in case someone appended # while we were waiting... seek( LOG, 0, SEEK_END ); } sub unlock { flock( LOG, LOCK_UN ); } open LOG, ">> $logFilename"; lock(); printf LOG "%02d:%02d:%02d:%03d %s\r\n", $hour, $min, $sec, $msec, @ +_; unlock(); close(LOG); }

This worked fine on freeBSD. Even though there are multiple log messages in a single millisecond, the messages were always in perfect order. It made debugging a snap (the 5 processes do a lot of IPC). Now the log looks like this (check the timestamps):

16:57:23:238 progman: sent supervisor - progman,ctrlman,set,wort_pu +mp,off 16:57:23:240 supervisor: rcvd pidman - pidman,webman,status,hlt_du +ty,0 16:57:23:239 progman: sent supervisor - progman,ctrlman,set,boil_he +ater,off 16:57:23:240 progman: sent supervisor - progman,ctrlman,set,hlt_hea +ter,off 16:57:23:241 progman: sent supervisor - progman,ctrlman,set,hlt_ssr +,off 16:57:23:241 supervisor: sent ctrlman - pidman,webman,status,hlt_du +ty,0 16:57:23:234 ctrlman: rcvd supervisor - pidman,webman,status,hlt_du +ty,0 16:57:23:244 supervisor: sent progman - pidman,webman,status,hlt_du +ty,0 16:57:23:245 supervisor: sent webman - pidman,webman,status,hlt_du +ty,0 16:57:23:250 webman: rcvd supervisor - pidman,webman,status,hlt_du +ty,0 16:57:23:241 progman: sent supervisor - progman,ctrlman,set,boil_ss +r,off 16:57:23:246 progman: sent supervisor - progman,webman,status,hlt_d +uty,0 16:57:23:246 progman: sent supervisor - progman,webman,status,hlt_s +etpoint,0 16:57:23:246 supervisor: rcvd progman - progman,webman,status,step, +startup 16:57:23:249 supervisor: sent webman - progman,webman,status,step, +startup 16:57:23:253 webman: rcvd supervisor - progman,webman,status,step, +startup 16:57:23:250 supervisor: sent ctrlman - progman,webman,status,step, +startup 16:57:23:241 ctrlman: rcvd supervisor - progman,webman,status,step, +startup 16:57:23:247 progman: sent supervisor - progman,webman,status,hlt_m +ode,pid 16:57:23:251 progman: sent supervisor - progman,webman,status,hlt_p +ower,100 16:57:23:251 progman: sent supervisor - progman,webman,status,hlt_v +ary,1

Nothing is garbled, just out of order. The timestamp is created at the entry to the logging subroutine - before the open() and flock(). It appears that locking is working, BUT - when the lock is released, the thread (or process) that gets the lock next is not FIFO - it's random (phase of the moon?).

Unsorted log file is one thing, but Aye, here's the rub - each thread must wait a random amount of time to acquire the lock. Since this application runs hardware devices like pumps and heating elements, the concept of "random" is not in our vocabulary.

I suspect this is a problem with ActiveState 5.10.1 no-ssl binary and/or Windows, but to confuse matters, about the same time I moved from Unix to Win32, I also changed from fork-exec to threads, but I did not closely examine the logfile at the time.

Any ideas? - Thanks, Dave

Replies are listed 'Best First'.
Re: Win32 - flock() not FIFO?
by BrowserUk (Patriarch) on Oct 29, 2011 at 02:13 UTC

    Using file system based locking and re-opening the file for every message is ...urm... lets just say, non-optimal.

    The simplest, most portable way if you are using threads is to use a Thread::Queue.

    use threads; use Thread::Queue; ... my $Q = new Thread::Queue; sub logmsg { // added thread id to messages. $Q->enqueue( threads->tid . ": @_"; ); } async { open my $log, '>', 'logfile' or die $!; while( $Q->dequeue ) { my $now = time; my $msec = int( ( $now - int($now) ) * 1000 ); my ( $sec, $min, $hour ) = localtime(time); sprintf $log "%02d:%02d:%02d:%03d %s\r\n", $hour, $min, $sec, +$msec, $_; } close $log; } ... logmsg( ... ); ... $Q->enqueue( undef ); // to terminate logging thread.
    • There is no need for file locking as all the writing is performed by a single thread.

      The queue takes care of itself.

    • Loggin causes minimal delays in the critical code path.

      Queuing a message is very fast as all the formatting and IO is done in another thread.

    • The ordering of messages is preserved.

      Queues are ordered.

    Simple, effective and very efficient.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      BrowserUK,

      Thread Queues is a perfect solution to the logfile problem.

      When you suggested using Thread Queues for my IPC before, I looked into them, but mistakenly thought they were like pipes – one thread to one other thread.

      Now that I see the error of my ways, I’m considering using them for my IPC (or is it ITC?), but there’s one problem.

      Three of my threads just sit around waiting for a datagram to arrive from another thread using select(,,,undef). The other two have to break out of the select every x seconds to do some housekeeping, so they use select(,,,timeout).

      Unfortunately, dequeue() doesn’t have a timeout feature. I don’t see any way to use dequeue_nb since during periods of inactivity, it’d have to spin in a tight loop.

      Any ideas?

      Thanks - Dave

        Three of my threads just sit around waiting for a datagram to arrive from another thread using select(,,,undef). The other two have to break out of the select every x seconds to do some housekeeping, so they use select(,,,timeout).

        Unfortunately, dequeue() doesn’t have a timeout feature. I don’t see any way to use dequeue_nb since during periods of inactivity, it’d have to spin in a tight loop....

        Any ideas?

        In general (and idealistically):Don't poll!

        The really nice thing about threading is that you don't have to try and make one piece of code do more than one thing at a time. If you have two things to do -- eg. 1) wait for a message from somewhere; 2) periodically do some housekeeping; -- use two threads.

        async { while( $Q->dequeue() ) { ## process message. } }; async { until( $dieNow ) { sleep $someTime; ## do housekeeping. } };

        In this way, you separate the concerns of the two. Each is free to get on with exactly what it needs to do at its earliest opportunity -- ie. its next time-slice --, and continue that single task to completion without ever concerning itself that it might be taking too long or the need to check what else needs to be done.

        That there in a nutshell is what makes kernel threading superior to event-driven or cooperative time-sharing mechanisms. Anyone who has ever done a job that required concentration, and that they answer a phone in a timely fashion, will get this point completely. It can be done, but each task complicates, interferes with and detracts from the performance of the other.

        </end-idealism>

        If you were designing your application from scratch, I would definitely suggest that you remove the periodic housekeeping duties from the two 'wait-for-datagram' threads. And move them, if possible combining them, into their own thread who purpose is purely periodic housekeeping. Given that you are currently restructuring your application to use threading, I also strongly suggest that you at least consider this possibility at this point.

        But, without having a good oversight of what your application is doing, or how it is currently structured and therefore what impact this type of re-structuring is likely to have, I have to realise that it may not be optimal of your time at this stage to do that.

        So, you currently have something like:

        while( my( $nfound, $timeleft) = select( ..., $timeout ) ) { if( $timeleft ) { ## Nothing doing, do some housekeeping } else { ## deal with the event } }

        Substitute:

        use Time::HiRes qw[ sleep ]; ... until( $dieNow ) { sleep( $timeout / 10 ); if( my $dgram = $Q->dequeue_nb() ) { ## Act on $dgram } else { ## Do housekeeping (if required). } }

        If your original timeout is 1 second, waking up every 1/10th of a second will have negligible affect on your CPU usage whilst ensuring that you process datagrams and housekeeping in a timely fashion. Indeed, you'd have to leave this:

        async{ 1 while Win32::Sleep( 100 ); }->join;;

        Running for a whole day to really notice that it had consumed any CPU at all.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Win32 - flock() not FIFO?
by Anonymous Monk on Oct 29, 2011 at 02:21 UTC

    that gets the lock next is not FIFO

    Why do you think it should be FIFO?

    I've never heard of a flock-ing order

    Any ideas?

    write a log viewer, one that doesn't care about the physical order (or disorder) of the entries (log process-ids and thread-ids not just timestamps)

    switch to a logging framework that takes care of all this for you