bjdean has asked for the wisdom of the Perl Monks concerning the following question:

I have a new problem that appears to have started after an upgrade to Debian buster.

I have some scripts that use system(...) to call other Perl scripts. Those scripts call exit explicitly, so they should be returning a zero exit code (once >>8 is applied to the value returned by system).

The expected behaviour almost always happens.

However, in some cases the exit code returned by system is 1, and I've been unable to either reliably reproduce the behaviour (it just happens sometimes in processes running on the server) or stop it happening.

I know that the $CHILD_ERROR / $? variable can be used to change a process's exit code, e.g. this example from perlvar:

END {
    $? = 1 if $? == 255;  # die would make it 255
}
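A minimal sketch of the mechanism that perlvar example relies on, assuming a Unix-like system: the child below calls exit(0), but its END block overwrites $? (which holds the pending exit status while END blocks run), so the parent sees exit code 1. It uses $^X, the path of the running perl binary, so no separate script file is needed.

```perl
use strict;
use warnings;

# Child program: exit(0) sets the pending status to 0, then the END block
# (where $? holds that pending exit status) overwrites it with 1.
my $child_code = 'END { $? = 1 } exit(0);';

# List-form system(): runs the current perl binary ($^X) directly, no shell.
my $rc = system($^X, '-e', $child_code);

printf "raw status: %d, exit code: %d\n", $rc, $rc >> 8;
```

On Linux this should print a raw status of 256 and an exit code of 1, which is why an END block assigning to $? is the first thing worth searching for.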

I've looked high and low for any code (including the modules it uses) that does this, and have not found any.

So my question is this: why would perl have started doing this (I've not been able to find a known bug), and has anyone else started seeing system() >> 8 unexpectedly return 1?

Re: system >> 8 is non-zero when child exits with exit(0)
by haukex (Archbishop) on Mar 09, 2021 at 08:14 UTC
    once >>8 is applied to the value returned by system

    First, note that you should be checking the whole return value of system, not just the upper byte. If the call was successful, its whole return value should be zero, so by looking only at the upper byte you might be missing error conditions. If the return value is nonzero, you can then inspect it as the docs show. You can also consider using IPC::System::Simple, which provides a drop-in replacement for system with nicer error handling.
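A minimal sketch of checking the whole value, following the if/elsif/else pattern from perldoc -f system. describe_status is a hypothetical helper name, and $^X (the running perl) stands in for the real script being called.

```perl
use strict;
use warnings;

# Hypothetical helper: describe a raw system() return value ($?) using the
# if/elsif/else checks from `perldoc -f system`. $errno is the text of $!
# captured right after system(), only meaningful when $status is -1.
sub describe_status {
    my ($status, $errno) = @_;
    return "failed to execute: $errno" if $status == -1;
    if ($status & 127) {
        return sprintf 'child died with signal %d, %s coredump',
            $status & 127, ($status & 128) ? 'with' : 'without';
    }
    return sprintf 'child exited with value %d', $status >> 8;
}

my $rc  = system($^X, '-e', 'exit(0)');   # stand-in for the real script
my $err = "$!";                            # capture $! before anything else runs
if ($rc != 0) {                            # test the whole value, not just $rc >> 8
    warn describe_status($rc, $err), "\n";
}
```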

    I've been unable to either reliably reproduce the behaviour (it just happens sometimes in processes running on the server) or stop it happening. ... I've not been able to find a known bug

    IMHO, I would then consider it likely that it's actually the subprocess that is failing somehow. Adding detailed logging to the process that calls system would probably help in hunting down the problem. For that, consider also capturing the output of the process being run, especially its STDERR, using e.g. Capture::Tiny.
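If adding a CPAN dependency isn't convenient, a rough core-modules-only sketch of the same idea follows. It is not Capture::Tiny itself but the save/redirect/restore approach shown in perldoc -f open; run_and_capture and slurp are hypothetical names. The child inherits the redirected STDOUT/STDERR, so anything it prints (including a warning or die message just before it exits) ends up in the returned strings.

```perl
use strict;
use warnings;
use File::Temp ();
use IO::Handle ();

# Hypothetical helper: run @cmd with system(), capturing the child's STDOUT
# and STDERR in temp files so they can be logged next to the exit status.
sub run_and_capture {
    my @cmd     = @_;
    my $out_tmp = File::Temp->new;
    my $err_tmp = File::Temp->new;

    # Flush pending output, then save the parent's handles for restoring later.
    STDOUT->flush;
    STDERR->flush;
    open(my $saved_out, '>&', \*STDOUT) or die "Can't dup STDOUT: $!";
    open(my $saved_err, '>&', \*STDERR) or die "Can't dup STDERR: $!";

    # Reopen STDOUT/STDERR onto the temp files; system()'s child inherits them.
    open(STDOUT, '>&', $out_tmp) or die "Can't redirect STDOUT: $!";
    open(STDERR, '>&', $err_tmp) or die "Can't redirect STDERR: $!";

    my $status = system(@cmd);

    # Restore the original handles before doing anything else.
    open(STDOUT, '>&', $saved_out) or die "Can't restore STDOUT: $!";
    open(STDERR, '>&', $saved_err) or die "Can't restore STDERR: $!";

    return ($status, slurp($out_tmp->filename), slurp($err_tmp->filename));
}

# Read a whole file into a string (empty string for an empty file).
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't read $path: $!";
    local $/;
    my $text = <$fh>;
    return defined $text ? $text : '';
}
```

While the child runs, the parent's own STDOUT/STDERR also point at the temp files, so this is best kept to the single system() call being diagnosed.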

      Thanks for your thoughts!

      I have been checking the full value returned from system as well - I should have mentioned that.

      This is a new one on me; I haven't seen it in 20-something years of working with Perl. I agree that something is happening in the child process (i.e. I'm quite sure the return value of system is correct); however, the very last thing to happen in these failing processes is basically:

      logprint("Process completed successfully");
      exit(0);   # As part of testing we've even added exit with an explicit 0, as here

      And yet the calling process detects a non-zero return code, and (since we've been logging this return code in detail) it is always 1 after >>8. To quote some debugging output we've added:

      DEBUG TEST 1 uses the if/then/else described in perldoc -f system:
      DEBUG TEST 1 CASE(else): child exited with value 1
      DEBUG TEST 2 uses the POSIX::W* checks of perlvar ${^CHILD_ERROR_NATIVE} which is: 256 (binary: 0000000100000000)
      DEBUG TEST 2: POSIX::WIFEXITED returned true (child exited normally)
      DEBUG TEST 2: POSIX::WEXITSTATUS returned: 1 (binary: 0000000000000001)
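The arithmetic behind that debug output, as a small sketch assuming Linux's usual wait-status layout: a raw status of 256 has zero in its low 7 bits (no terminating signal) and 1 in its high byte, so the child really did exit normally with status 1 rather than being killed. The POSIX wait macros decode it the same way.

```perl
use strict;
use warnings;
use POSIX qw(WIFEXITED WEXITSTATUS WIFSIGNALED WTERMSIG);

# The raw status from the debug output: 256 == 0b0000000100000000.
my $status = 256;

# Low 7 bits hold the terminating signal (0 = none); the high byte holds
# the exit code passed to exit().
printf "signal bits: %d, exit code (>> 8): %d\n", $status & 127, $status >> 8;

# The POSIX wait macros decode the same value.
if (WIFEXITED($status)) {
    printf "exited normally with status %d\n", WEXITSTATUS($status);
}
elsif (WIFSIGNALED($status)) {
    printf "killed by signal %d\n", WTERMSIG($status);
}
```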

        That's somewhat strange.

        the very last thing to happen in these failing processes is basically:

        Are you sure about that? I would guess that you're seeing your "Process completed successfully" message in your logs, but can logprint() fail in any other way? Have you tried capturing the process's STDOUT and STDERR to inspect that for any messages?
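One way to answer that from inside the script is to stop ignoring print()'s return value. A minimal sketch, where logprint_checked is a hypothetical stand-in (the real logprint() isn't shown in this thread): it returns undef on success, or an error string if the write or the following flush fails, for example on a closed handle. The flush check matters because buffered output may only reach the file descriptor later.

```perl
use strict;
use warnings;
use IO::Handle ();

# Hypothetical variant of logprint(): returns undef on success, or the
# error text if the write fails, instead of ignoring print()'s result.
sub logprint_checked {
    my ($fh, $msg) = @_;
    my $ok;
    {
        no warnings qw(closed unopened);   # the caller decides how to report it
        $ok = print {$fh} "$msg\n";
        # Buffered data may only fail when flushed, so check that too.
        $ok &&= $fh->flush;
    }
    return $ok ? undef : "write failed: $!";
}
```

Since the failing handle may well be STDOUT or STDERR itself, a caller would need to record a non-undef result somewhere else, such as a separate log file.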

        Another thought: what does your system call look like? Is it a single string, as in system("/path/to/script.pl")? Are you using any shell features in the system call, e.g. system("/path/to/script.pl | grep ...")? In that case something could theoretically be going wrong in the shell (see The problem of "the" default shell).
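A sketch of how the shell can change the status, assuming /bin/sh and grep are available: a single-string command containing shell metacharacters is run via /bin/sh -c, and a pipeline's status is that of its last command. grep exits 1 when it selects no lines, so this reports exit code 1 even though the first command succeeded. The list form bypasses the shell entirely.

```perl
use strict;
use warnings;

# Single string with a shell metacharacter (|): perl runs it via /bin/sh -c.
# A pipeline's status is that of its LAST command, and grep exits 1 when it
# selects no lines, so this reports 1 even though echo succeeded.
my $piped = system('echo hello | grep nomatch');
printf "pipeline exit code: %d\n", $piped >> 8;

# List form: no shell; perl execs the program directly, so the status is
# the child's own exit code.
my $direct = system($^X, '-e', 'exit(0)');
printf "list-form exit code: %d\n", $direct >> 8;
```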

Re: system >> 8 is non-zero when child exits with exit(0)
by bjdean (Novice) on Jun 22, 2021 at 01:54 UTC
    ... if I find an answer I'll report back here.

    It turned out that in some cases people were running the scripts manually, and often they would do this by:

    1. running the scripts
    2. sending them into the background
    3. logging out again, at which point STDOUT and STDERR were closed.

    When the child process then tried to write to STDOUT/STDERR, it would end with the non-zero exit code.

    I haven't fully traced this to its root cause, but it does not happen on older Debian releases (though I'm not sure exactly when the change happened).

    The solution is either to avoid writing to closed STDOUT/STDERR handles (fiddly and fragile across a lot of code) or, better and simpler, to redirect the output of processes that are going to be backgrounded, using something like '>/dev/null 2>&1' or nohup (which also redirects STDOUT/STDERR).
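A Perl-side sketch of the redirection approach, assuming the scripts are launched from another Perl program rather than typed at a shell. spawn_with_log is a hypothetical helper: it starts a command in a new session (via setsid(), rather than nohup's handling of SIGHUP) with STDIN from /dev/null and STDOUT/STDERR appended to a log file, so the child never writes to the login terminal. Passing '/dev/null' as the log file gives roughly the equivalent of '>/dev/null 2>&1'.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: start @cmd in the background, roughly like
#   nohup cmd >>logfile 2>&1 &
# The child gets its own session (setsid), reads from /dev/null and writes
# STDOUT and STDERR to $logfile, so it never touches the login terminal.
sub spawn_with_log {
    my ($logfile, @cmd) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    return $pid if $pid;    # parent: return the child's pid to the caller

    # Child from here on. POSIX::_exit skips END blocks and destructors
    # inherited from the parent if anything goes wrong before exec.
    POSIX::setsid();
    open(STDIN,  '<',  '/dev/null') or POSIX::_exit(126);
    open(STDOUT, '>>', $logfile)    or POSIX::_exit(126);
    open(STDERR, '>&', \*STDOUT)    or POSIX::_exit(126);
    exec { $cmd[0] } @cmd           or POSIX::_exit(127);
}
```

The caller should eventually waitpid() on the returned pid to reap the child.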

    If you need to see the output while the process runs, probably the easiest approach is to launch it inside a terminal multiplexer such as screen or tmux for the duration of the run.