in reply to methods of recovering from ram issues
involves storing data in memory via hashes for a period of time ... if the application crashes for some reason.. due to power or perhaps even faulty memory.. how can i recover ...
What is the source of that data? How long does it take to (re-)build the hashes?
how can i recover from that point
Do you need to recover from that point?
I've seen people go to extraordinary lengths to try and checkpoint the state of an application at regular intervals during its runtime, with a view to allowing it to pick up from where it left off in the event of failure, when in many cases it is easier, cheaper and more reliable to simply run the process again from scratch.
Not always, of course, but surprisingly frequently the economics of building ever more elaborate process monitoring, check-pointing, on-the-fly replication, load-balancing, redundancy and fail-overs into a system simply do not stand up to scrutiny. Each new layer of defensive mechanisms adds both cost and complexity to the system, and complexity is the absolute antithesis of reliability. And that growth in complexity (and therefore cost) is not linear, but rather exponential, as the 'need' to monitor the monitor, back up the backup, and have redundancy for the redundant becomes institutionally imperative.
And in the end, it's never the thing you thought might fail that does. I still have memories of many very long hours freezing my fingers off monitoring a data-scope before discovering that the lift-motor in the unit next door, the other side of a concrete wall a couple of feet thick, would produce copious amounts of broad-spectrum RF interference whenever they took a delivery of peanuts and cashews. (They are very dense products, which made it easy to overload their lift.) At that point, all network communications between the multiply redundant fail-over servers ceased, their heart-beat checks failed, and they all tried to step in to take over from each other. Result: when the RFI ceased, all the servers were trying to do all the jobs and everything got corrupted.
The best advice I can give is: make each process as simple as possible and have it be driven by the arrival of its data; have it process input data in discrete chunks; never discard your source data.
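To make that advice concrete, here is a minimal sketch in Python (the thread itself contains no code, so the language, the directory layout and every name below are my assumptions, not anything from the original question). Each input file is treated as one discrete chunk; the worker is driven purely by what is waiting in an `incoming` directory; and the source file is moved to an `archive` directory rather than deleted. The word count in `process_chunk` is only a stand-in for whatever work builds your hashes.

```python
import json
import os
import shutil
import tempfile
from collections import Counter
from pathlib import Path


def process_chunk(src: Path) -> dict:
    """Hypothetical per-chunk work: count the words in one input file."""
    counts = Counter()
    with src.open() as fh:
        for line in fh:
            counts.update(line.split())
    return dict(counts)


def run_once(incoming: Path, done: Path, archive: Path) -> list:
    """Process every chunk currently waiting in `incoming`.

    Safe to re-run after a crash: a chunk only leaves `incoming`
    after its result file has been completely written.
    """
    done.mkdir(parents=True, exist_ok=True)
    archive.mkdir(parents=True, exist_ok=True)
    processed = []
    for src in sorted(incoming.glob("*.txt")):
        result = process_chunk(src)
        # Write to a temporary file, then rename it into place.
        # os.replace is atomic when both paths are on the same filesystem,
        # so a half-written result never appears under its final name.
        fd, tmp = tempfile.mkstemp(dir=done, suffix=".tmp")
        with os.fdopen(fd, "w") as out:
            json.dump(result, out)
        os.replace(tmp, done / (src.stem + ".json"))
        # Never discard the source data: move it aside instead of deleting it.
        shutil.move(str(src), str(archive / src.name))
        processed.append(src.name)
    return processed
```

If the process dies part-way through, there is nothing to recover: restart it and call `run_once` again. Any chunk still sitting in `incoming` simply gets processed again, and at worst one result file is rewritten with identical content.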
Re^2: methods of recovering from ram issues
by locked_user sundialsvc4 (Abbot) on Dec 21, 2011 at 00:33 UTC