in reply to Re^3: system >> 8 is non-zero when child exits with exit(0)
in thread system >> 8 is non-zero when child exits with exit(0)

the very last thing to happen in these failing processes is basically:
Are you sure about that? I would guess that you're seeing your "Process completed successfully" message in your logs, but can logprint() fail in any other way? Have you tried capturing the process's STDOUT and STDERR to inspect that for any messages?

Am I sure? 99.9% yes - this particular logging function is very simple and on the local filesystem where there are no other issues (and plenty of other code succeeding doing this exact same thing most of the time).

You make an interesting point about the output! While we monitor STDOUT/STDERR in some cases, we throw it away in others, so I've added some extra monitoring there.
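For anyone following along, here is a minimal sketch of what capturing the child's output next to its wait status might look like. The helper name run_and_capture and the sample command are made up for illustration, not taken from the real code, and it assumes a *NIX shell with perl on the PATH.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command string through the shell, fold the
# child's STDERR into its STDOUT, and return the text plus the raw status.
sub run_and_capture {
    my ($cmd) = @_;
    my $output = `$cmd 2>&1`;    # backticks set $? the same way system() does
    my $status = $?;             # save it before anything else can change it
    $output = '' unless defined $output;
    return ($output, $status);
}

# Sample child that complains on STDERR but still exits with 0.
my ($out, $status) = run_and_capture(q{perl -e 'print STDERR "oops\n"; exit 0'});
printf "exit=%d signal=%d output=%s", $status >> 8, $status & 127, $out;
```

Logging the output and the raw $? together means a stray warning from the child can be lined up with the call that reported a non-zero status.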

Another thought: What does your system call look like? Is it a single string, as in system("/path/to/script.pl")?

Yep - in this case it's a string passed to the shell, so the shell problem does come into play. I'll have a look at that angle in a little more detail.
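Since the whole question hinges on what is in $? after the call, a rough sketch of decoding it in full might help. The bit layout is the one documented for system in perlfunc; the helper name describe_status and the sample command are only illustrative. One caveat: when a shell sits in between, it can turn a signal death of the real child into an exit code of 128 plus the signal number, so a non-zero $? >> 8 does not have to come from an exit() in the script.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a raw wait status (as found in $?) into text.
sub describe_status {
    my ($status, $errno) = @_;
    # -1 means the command could not be started at all; $! says why.
    return "failed to start: $errno" if $status == -1;
    # The low 7 bits hold the signal number, bit 8 flags a core dump.
    if (my $sig = $status & 127) {
        return sprintf "killed by signal %d%s",
            $sig, ($status & 128) ? ' (core dumped)' : '';
    }
    # Otherwise the high byte holds the child's exit code.
    return sprintf "exited with code %d", $status >> 8;
}

system(q{perl -e 'exit 0'});
print describe_status($?, "$!"), "\n";
```

Logging the raw value of $? alongside $? >> 8 makes it possible to tell these cases apart after the fact.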

What does seem to be consistent is that where a single script is making multiple system calls, once the problem happens it keeps happening to that script (so some sort of state somewhere is going squiffy). But when I log into the same server at the same time and run the same code, the problem is not reproducible (the best kind of bug, if you're a bug!).

Re^5: system >> 8 is non-zero when child exits with exit(0)
by haukex (Archbishop) on Mar 11, 2021 at 19:26 UTC
    Yep - in this case it's a string passed to the shell, so the shell problem does come into play so I'll have a look at that angle in a little more detail.

    Unfortunately I'm running out of ideas. I think it would be useful if you could show us a bit more. I understand that if the script is complex enough, a Short, Self-Contained, Correct Example is probably a fair amount of work, but you haven't even shown a rough sketch of what your system call looks like. Whether it's just a single executable, whether there are arguments or variables interpolated into the string, and whether there are shell metacharacters can all make a difference. Note that you can force system to avoid the shell and use execvp directly (on *NIX) using the form system {'/path/to/script.pl'} '/path/to/script.pl';. Also, you mention system calls that your subprocess is making; what kind of system calls are they?
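A minimal sketch of that shell-free form, wrapped in a helper so it can be dropped in for a string-style call. The name run_without_shell is made up for this example, and $^X (the running perl) stands in for the real script, which hasn't been shown.

```perl
use strict;
use warnings;

# Hypothetical helper: run a program directly, never through /bin/sh.
sub run_without_shell {
    my ($program, @args) = @_;
    # The block names the file to execute; the list after it is the
    # child's argv, so its first element is what the child sees as $0.
    system { $program } $program, @args;
    return $?;
}

# $^X is the path of the currently running perl, used as a stand-in child.
my $status = run_without_shell($^X, '-e', 'exit 0');
printf "raw status %d, exit code %d\n", $status, $status >> 8;
```

Because no shell is involved, arguments containing spaces or metacharacters reach the child unchanged, and $? reflects the child itself rather than an intermediate sh process.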

      Thanks again for your thoughts. Between the code being something I can't share here and it being relatively complex, a Short, Self-Contained, Correct Example is tricky, as you suggested.

      At this stage I've increased monitoring and am trying to create an environment where the problem can be reliably reproduced - if I find an answer I'll report back here.

Re^5: system >> 8 is non-zero when child exits with exit(0)
by jcb (Parson) on Mar 11, 2021 at 02:38 UTC

    Failure occurs in autonomous scripts but not when the same script is run in an interactive session? Does this system have SELinux? If so, check the SELinux avc denial logs — the script may be running in a different security context...

      Thanks for your thoughts, but no SELinux on this system. :)