RE: RE: Re: Process Reliability

That problem already exists. If the script is dying a lot, it doesn't matter what respawn mechanism is used. There will still be a significant load placed on the system.

The problem description, though, makes it sound like it only happens occasionally, which has made it hard to debug. In that case, I would rather let a proven and well-known mechanism like init do the monitoring for me than have to debug both the respawn code and the code that is dying in the first place.

Second, init is usually pretty smart and stops retrying a job if it is respawning too quickly. So your load increases for a bit, but init does the Right Thing and stops it from becoming a fork bomb.
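To make the "respawning too quickly" check concrete, here is a rough sketch of that kind of guard in Python. It is not init's actual code: the class name `RespawnGuard` and the thresholds (10 starts in 120 seconds) are assumptions chosen for illustration only.

```python
from collections import deque


class RespawnGuard:
    """Refuse further respawns when too many happen inside a short window."""

    def __init__(self, max_spawns=10, window=120.0):
        self.max_spawns = max_spawns  # hypothetical threshold, not init's real value
        self.window = window          # window length in seconds, also hypothetical
        self.starts = deque()         # timestamps of recent accepted starts

    def allow(self, now):
        """Return True if starting the job at time `now` (seconds) is acceptable."""
        # Forget starts that have fallen out of the window.
        while self.starts and now - self.starts[0] >= self.window:
            self.starts.popleft()
        if len(self.starts) >= self.max_spawns:
            # Respawning too fast: back off instead of forking again.
            return False
        self.starts.append(now)
        return True
```

A caller would pass something like `time.monotonic()` as `now` before each restart and pause or disable the job whenever `allow` returns False.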

mikfire

Multiple-Re: Process Reliability
by atl (Pilgrim) on Jul 20, 2000 at 18:31 UTC

Yes, you are right that init stops if the process respawns too quickly. I forgot that. You can also stop the script manually instead of waiting for init to decide, but that's no big deal. More important is that a well-chosen sleep time between restarts will keep your system responsive anyway. That would also be a crude workaround if the cause of the program's death is temporary (say, a missing resource such as an NFS share) and the program just "die"s instead of doing a wait-retry cycle itself. Finally, logging and/or an alert mechanism can easily be implemented.

To prevent a misunderstanding: one could (and maybe should) put that code into the original program, so the admin can leave the watch-and-respawn work to init.

Andreas
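The sleep-between-restarts idea with logging could look roughly like the sketch below. It is in Python rather than the thread's Perl, and the function name `supervise`, its parameters, and the default delay are all assumptions made up for this example, not code from the original script.

```python
import logging
import subprocess
import time


def supervise(cmd, delay=5.0, max_restarts=3, run=subprocess.call, sleep=time.sleep):
    """Run cmd, restarting it after `delay` seconds each time it exits non-zero.

    Returns the number of restarts performed. `run` and `sleep` are
    injectable so the loop can be exercised without real processes.
    """
    restarts = 0
    while True:
        status = run(cmd)
        if status == 0:
            logging.info("%s exited cleanly", cmd)
            return restarts
        logging.warning("%s died with status %s", cmd, status)
        if restarts >= max_restarts:
            # An alert (mail, pager) could be hooked in at this point.
            logging.error("giving up on %s after %d restarts", cmd, restarts)
            return restarts
        sleep(delay)  # the pause keeps a crash loop from hogging the machine
        restarts += 1
```

For instance, `supervise(["/path/to/script"], delay=30)` would retry a failing script up to three times, half a minute apart. The path is a placeholder.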
