in reply to Re: system >> 8 is non-zero when child exits with exit(0)
in thread system >> 8 is non-zero when child exits with exit(0)

Thanks for your thoughts!

I have been checking the full value returned from system as well - I should have mentioned that.

This is a new one on me - I've not seen it in 20-something years of working with Perl. I agree that there's something happening in the child process (i.e. I'm quite sure the return value of system is correct); however, the very last thing to happen in these failing processes is basically:

logprint("Process completed successfully");
exit(0); # As part of testing we've even added exit with an explicit 0 as here

And yet the calling process detects a non-zero return code - and (having been logging this return code in detail) it is always a return code of 1 (after >>8) or to quote some debugging output we've added:

DEBUG TEST 1 uses the if/then/else described in perldoc -f system:
DEBUG TEST 1 CASE(else): child exited with value 1
DEBUG TEST 2 uses the POSIX::W* checks of perlvar ${^CHILD_ERROR_NATIVE} which is: 256 (binary: 0000000100000000)
DEBUG TEST 2: POSIX::WIFEXITED returned true (child exited normally)
DEBUG TEST 2: POSIX::WEXITSTATUS returned: 1 (binary: 0000000000000001)
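
For readers following along, here is a rough sketch of the kind of check the first debug test describes, modelled on the if/elsif/else in perldoc -f system. The helper name describe_status is made up for illustration, and the child is a stand-in one-liner that exits with 1 rather than the real script.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a wait status (as found in $?) into a message,
# following the pattern shown in perldoc -f system.
sub describe_status {
    my ($status) = @_;
    return "failed to execute" if $status == -1;
    if ($status & 127) {
        # Low 7 bits hold the signal number, bit 8 flags a core dump.
        return sprintf "child died with signal %d, %s coredump",
            $status & 127, ($status & 128) ? 'with' : 'without';
    }
    # High byte holds the value passed to exit().
    return sprintf "child exited with value %d", $status >> 8;
}

# Stand-in child: the current perl running a one-liner that exits with 1.
system($^X, '-e', 'exit 1');
my $status = $?;

printf "raw status: %d (binary: %016b)\n", $status, $status;
print describe_status($status), "\n";
```

A raw status of 256 decoding to an exit value of 1 is consistent with the output quoted above, which is why the parent-side decoding does not look like the culprit.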

Re^3: system >> 8 is non-zero when child exits with exit(0)
by haukex (Archbishop) on Mar 09, 2021 at 15:09 UTC

    That's somewhat strange.

    the very last thing to happen in these failing processes is basically:

    Are you sure about that? I would guess that you're seeing your "Process completed successfully" message in your logs, but can logprint() fail in any other way? Have you tried capturing the process's STDOUT and STDERR to inspect that for any messages?
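
One low-effort way to capture both streams, sketched under the assumption of a Unix-like shell: run the command through qx with 2>&1 and keep $? alongside the output. The helper name run_and_capture and the stand-in child command are invented for this example, and the sketch assumes the path in $^X contains no spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: run a shell command string and return its combined
# STDOUT/STDERR together with the raw wait status from $?.
sub run_and_capture {
    my ($cmd) = @_;
    my $output = qx{$cmd 2>&1};   # the shell merges STDERR into STDOUT
    my $status = $?;              # save it before anything else resets $?
    $output = '' unless defined $output;
    return ($output, $status);
}

# Stand-in child: writes a message to STDERR, then exits with 3.
my $child = $^X . q{ -e 'warn qq(oops\n); exit 3'};

my ($out, $status) = run_and_capture($child);
printf "exit value: %d\n", $status >> 8;
print  "captured:   $out";
```

Logging the captured text next to the decoded exit value for every failing call should show whether anything is printed after the "Process completed successfully" message.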

    Another thought: What does your system call look like? Is it a single string, as in system("/path/to/script.pl")? Are you using any shell features in the system call, e.g. system("/path/to/script.pl | grep ...")? Because in that case something could theoretically be going wrong with the shell (The problem of "the" default shell).

      the very last thing to happen in these failing processes is basically:
      Are you sure about that? I would guess that you're seeing your "Process completed successfully" message in your logs, but can logprint() fail in any other way? Have you tried capturing the process's STDOUT and STDERR to inspect that for any messages?

      Am I sure? 99.9% yes - this particular logging function is very simple and on the local filesystem where there are no other issues (and plenty of other code succeeding doing this exact same thing most of the time).

      You make an interesting point about the output! While we monitor STDOUT/STDERR in some cases we also throw it away in others so I've added some extra monitoring there.

      Another thought: What does your system call look like? Is it a single string, as in system("/path/to/script.pl")?

      Yep - in this case it's a string passed to the shell, so the shell problem does come into play so I'll have a look at that angle in a little more detail.

      What does seem to be consistent is that where a single script is making multiple system calls, when the problem happens it keeps happening to that script (so some sort of state somewhere is going squiffy). But when I log into the same server at the same time and run the same code, the problem is not reproducible (the best kind of bug, if you're a bug!).

        Yep - in this case it's a string passed to the shell, so the shell problem does come into play so I'll have a look at that angle in a little more detail.

        Unfortunately I'm running out of ideas. I think it would be useful if you could show us a bit more. I understand that if the script is complex enough, a Short, Self-Contained, Correct Example is probably a fair amount of work, but you haven't even shown a rough sketch of what your system call looks like: whether it's just a single executable, whether there are arguments or variables interpolated into the string, or whether there are shell metacharacters can all make a difference. Note that you can force system to avoid the shell and use execvp directly (on *NIX) using the form system {'/path/to/script.pl'} '/path/to/script.pl';. Also, you mention system calls that your subprocess is doing, what kind of system calls are they?
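
A minimal sketch of the shell-free form mentioned above, using the current perl and a one-liner as a stand-in for /path/to/script.pl (the argument 'a|b' is made up purely to show that metacharacters reach the child untouched):

```perl
use strict;
use warnings;

# Stand-in for the real script: the child exits 0 only if its first
# argument arrives as the literal string "a|b".
my @cmd = ($^X, '-e', 'exit($ARGV[0] eq q{a|b} ? 0 : 1)', 'a|b');

# Indirect-object form: the block names the program to run and the list
# becomes its argument vector. No shell is involved, so the "|" is never
# treated as a pipe.
my $rc = system { $cmd[0] } @cmd;

if ($rc == -1) {
    print "failed to execute: $!\n";
}
elsif ($? & 127) {
    printf "child died with signal %d\n", $? & 127;
}
else {
    printf "child exited with value %d\n", $? >> 8;
}
```

If the spurious exit value of 1 disappears when the call is made this way, that would point at the shell rather than at the child script.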

        Failure occurs in autonomous scripts but not when the same script is run in an interactive session? Does this system have SELinux? If so, check the SELinux avc denial logs — the script may be running in a different security context...