I have considered doing weekly reboots. They would cause unwanted downtime, but on the other hand it is far cleaner and safer to do "shutdown -r now" than "reboot -nf" :) I still might do this, or both.

Not sure what you mean about using Knoppix. I have no hands on the box. It's 2000 miles away and I don't have out-of-band management.

I wouldn't say that the OS is failing... In this state, the kernel is fine and processes are still responding (so long as they're not accessing the SSD). Since the daemon is running in memory, it should be fine. The primary issue is the potential EIO failures, which are what I want to detect. If I can trigger the "reboot -nf" without any disk I/O, then I think it will be an acceptable band-aid until the situation can be resolved permanently.
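
A minimal sketch of that idea, in Python rather than Perl purely for illustration, and untested against a real failing SSD. It assumes Linux with magic SysRq support and root privileges; the function names and the probe path are hypothetical, not part of any existing daemon.

```python
import errno
import os

# Linux magic SysRq interface (assumed to be available on the box).
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def io_failed(probe_path):
    """Return True if a small synced write to probe_path fails with EIO."""
    try:
        fd = os.open(probe_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            os.write(fd, b"ping")
            # Force the write out to the device instead of the page cache.
            os.fsync(fd)
        finally:
            os.close(fd)
    except OSError as exc:
        # Only EIO counts as the failure we are watching for.
        return exc.errno == errno.EIO
    return False


def force_reboot(trigger_path=SYSRQ_TRIGGER):
    """Ask the kernel to reboot immediately, without sync or unmount."""
    fd = os.open(trigger_path, os.O_WRONLY)
    try:
        os.write(fd, b"b")
    finally:
        os.close(fd)


def check_once(probe_path, reboot=force_reboot):
    """Probe the disk once; call reboot() and return True on EIO."""
    if io_failed(probe_path):
        reboot()
        return True
    return False
```

Calling check_once() on a path on the SSD from a loop with a sleep would approximate the watchdog. Writing "b" to /proc/sysrq-trigger reboots immediately without syncing or unmounting, much like "reboot -nf", but without having to load a binary from the failing disk. A probe that blocks instead of returning EIO would stall the loop, so a real daemon would probably need a timeout around it.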

Thanks!

Re^3: I/O Watchdog Daemon
by flexvault (Monsignor) on Aug 22, 2012 at 17:05 UTC

    IdleResonance,

    I wasn't trying to solve your problem, but to help you start 'thinking outside the box'.

    You said you do not know how to test the Perl solution. For me, if I can't test a solution, then I wouldn't depend on it working, but that's just me.

    I have never used an SSD, but I have been told that machines booting from one can reboot in less than a minute. Since you have the equipment, you know that answer. Only you can evaluate and weigh the value of a minute of planned downtime versus unpredictable downtime.

    But I think you're on your way! ( No giggling AM! )

    Good Luck!

    "Well done is better than well said." - Benjamin Franklin