in reply to Re: file state management and recovery
in thread file state management and recovery

Seems to me there may be a race condition here:

If each '.tmp' file has the same name as the log file its data came from, then it's the existence of the log file that matters -- in fact, I don't think you need the '.tmp' suffix at all. If the process dies at any stage before the log file is deleted, it can safely be rerun. (I'm assuming each log file has a name that is unique over time -- for example, by including the date/time of its creation.)

If the requirement is to append stuff from each log file to one or more other files, then I would create an auxiliary file (whose name is related to the log file currently being processed) and record in it the name and current length of each file about to be written to (and close the auxiliary file -- expecting that to flush the result to disc). The auxiliary file would be deleted after the related log file. When starting the process, if an auxiliary file is found, then each file it names can be truncated back to its recorded length and the related log file processed again.

This does depend on the auxiliary file being written to stable storage when it's closed, or at least before any changes to the related files make it to stable storage. It also assumes that all the updated files reliably make it to stable storage after being closed, so that data is not lost when the original log file is deleted. If those are concerns, the problem is a lot bigger!
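
For what it's worth, here is a rough Python sketch of the auxiliary-file idea, just to make the ordering concrete. Everything named here is my own invention for illustration: the '.aux' suffix, the tab-separated "name, length" lines, the '#complete' marker line, and the `appends` mapping of target file to text. Recovery is taken to mean "truncate each listed file back to its recorded length, then reprocess the log". The `os.fsync` calls are one way of asking for the data to reach stable storage; whether that is honoured depends on the OS and the hardware.

```python
import os
from pathlib import Path

MARKER = "#complete"  # hypothetical last line: proves the journal was fully written


def write_durably(path: Path, text: str, mode: str = "a") -> None:
    """Write text, then ask the OS to push it to stable storage."""
    with open(path, mode) as fh:
        fh.write(text)
        fh.flush()
        os.fsync(fh.fileno())


def recover(aux: Path) -> None:
    """Roll back partial appends using the lengths recorded in the auxiliary file."""
    lines = aux.read_text().splitlines()
    if not lines or lines[-1] != MARKER:
        return  # journal itself is incomplete, so no target had been touched yet
    for line in lines[:-1]:
        name, _, length = line.rpartition("\t")
        if os.path.exists(name):
            os.truncate(name, int(length))


def process_log(log: Path, appends: dict[str, str]) -> None:
    """Append text to each target, recording name and prior length first."""
    aux = log.with_name(log.name + ".aux")
    if aux.exists():
        if not log.exists():
            aux.unlink()  # log already deleted: the earlier run had finished
            return
        recover(aux)  # earlier run died part-way: undo its appends
        aux.unlink()
    journal = "".join(
        f"{t}\t{os.path.getsize(t) if os.path.exists(t) else 0}\n" for t in appends
    )
    write_durably(aux, journal + MARKER + "\n", mode="w")
    for target, text in appends.items():
        write_durably(Path(target), text)
    log.unlink()  # delete the log file first ...
    aux.unlink()  # ... and the auxiliary file after it
```

The marker line is my addition: without it, a crash half-way through writing the auxiliary file could leave a cut-off length on the last line, and truncating to that would throw away good data.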
