
I cannot reproduce your results ...
Neither can I. That's the problem. I would be much happier if I could reproduce it, but in general it runs well. So far I have found only one case where it failed, though that case is well documented. It seems to be rare, so I hoped someone with good Windows knowledge would know of some exotic circumstance where this problem might show up on Windows.

-- 
Ronald Fischer <ynnor@mm.st>

Re^3: waitpid returns -1 for still running child (Windows)
by BrowserUk (Patriarch) on Jul 06, 2010 at 13:17 UTC
    I hoped someone with good Windows knowledge would know of some exotic circumstance where this problem might show up on Windows.

    How is the logging done? I.e., are they simple prints to the respective log files directly from the applications, or are you using some centralised logging mechanism via pipes or sockets?

    How & where are the timestamps added?

    As for tracking it down more thoroughly, I'd start off by creating a simple UDP logging daemon and have both (all) apps send their messages to that. I'd timestamp them at both ends.
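A minimal sketch of such a UDP logging daemon with timestamps at both ends, written in Python for illustration. The function names, the JSON record layout, and the field names `sent_at` and `received_at` are all assumptions made for this sketch, not anything from the applications discussed here.

```python
import json
import socket
import time


def make_receiver(host="127.0.0.1", port=0):
    """Bind a UDP socket for the logging daemon; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def send_log(sock, addr, source, message):
    """Sender side: stamp the message with the sender's clock, then send it."""
    record = {"source": source, "sent_at": time.time(), "message": message}
    sock.sendto(json.dumps(record).encode("utf-8"), addr)


def receive_log(sock, timeout=5.0):
    """Daemon side: receive one record and add the receiver's own timestamp."""
    sock.settimeout(timeout)
    data, _ = sock.recvfrom(65535)
    record = json.loads(data.decode("utf-8"))
    record["received_at"] = time.time()
    return record


def format_record(record):
    """Render one log line showing both timestamps side by side."""
    return "{received_at:.3f} (sent {sent_at:.3f}) [{source}] {message}".format(**record)
```

Each application would call `send_log` wherever it currently prints, and the daemon would loop over `receive_log` and write `format_record(...)` to a single file. A large gap between the two timestamps on one line would point at buffering or clock trouble rather than at waitpid. UDP delivery is not guaranteed, so a missing line proves nothing on its own.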


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Logging is done just by print statements, and the timestamps are calculated at the time the print statement is executed. On the master process, the log entry is printed after waitpid has returned, and on the child process, the log entry is printed after the control file is written, so even if there were some delay, it would aggravate the problem rather than explain it :-(

      -- 
      Ronald Fischer <ynnor@mm.st>

        My vague notion was that if you were using a centralised logging mechanism that added timestamps upon receipt, there might be some buffering going on.

        As described, it is hard to conceive of any circumstance that could result in the child perl being able to write after the parent cmd had completed. Nor even of what might cause the parent cmd to end before the child perl completed.

        Hence, looking for some reason why the recorded time stamps might be incorrect.

        A very (very) remote possibility, if you have NTP synchronisation set up: the system's clock had been running 8 seconds fast, and it was resynchronised after the child wrote the file and printed its log message, but before the grandparent detected the completion of the parent and wrote its log file. Unlikely, but worth checking whether the time frame fits with the scheduled sync time on the machine...
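One way to test the clock-adjustment theory is to log a monotonic reading next to each wall-clock timestamp. The sketch below is in Python and assumes both readings are taken in the same process (for example, the grandparent stamping before and after it waits), since monotonic readings are only meaningful as differences within one process. The names `stamp` and `clock_step` are made up for this illustration.

```python
import time


def stamp():
    """Return (wall_clock, monotonic) readings taken back to back."""
    return time.time(), time.monotonic()


def clock_step(earlier, later):
    """Estimate how far the wall clock was adjusted between two stamps.

    The monotonic clock is not affected when the wall clock is set, so any
    difference between the two elapsed times is attributed to an adjustment.
    A negative result means the wall clock was set backwards.
    """
    wall_delta = later[0] - earlier[0]
    mono_delta = later[1] - earlier[1]
    return wall_delta - mono_delta
```

If the wall clock had been set back by 8 seconds between two stamps, `clock_step` would return a value close to -8, while under normal conditions it should stay near zero.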

        Beyond that, if the problem occurs sufficiently frequently to make it worth your while fixing it, rather than adding some workaround like sleeping for 10 seconds if the file isn't immediately available, then I'd look to setting up the Performance Monitoring tool to track processes and IO and see if that sheds any light once the problem recurs. Be very selective in what you choose to monitor; those Performance logs can get very large, very quickly.
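If you do go for the workaround, polling with a deadline is usually gentler than a flat 10-second sleep, because the common case returns immediately. A rough Python sketch follows; `wait_for_file`, its parameters, and the default values are assumptions chosen for illustration only.

```python
import os
import time


def wait_for_file(path, timeout=10.0, interval=0.25):
    """Poll until `path` exists or `timeout` seconds have passed.

    Returns True as soon as the file is seen, False if the deadline expires.
    The deadline uses the monotonic clock, so a wall-clock adjustment during
    the wait cannot shorten or lengthen it.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Logging how long the wait actually took each time it is needed would also give you some data on how often, and by how much, the file lags behind the process exit.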

