Re^2: file state management and recovery

Seems to me there may be a race condition here:

If the '.tmp' files are renamed before the original log file is deleted, then the data will be repeated if the process dies between the rename and the delete.
If the original log file is deleted before the '.tmp' files are renamed, then the data will be lost if the process dies between the delete and the rename.

If the name of the '.tmp' files is the same as the log file the data came from, then it's the existence of the log file that matters -- in fact, I don't think you need the '.tmp' suffix. If the process dies at any stage before the log file is deleted, then it can safely be rerun. (I'm assuming each log file has a unique name over time -- for example, by including the date/time of its creation.)
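
To make that concrete, here is a minimal sketch in Python (the thread has no code of its own, so the language is my choice). The names process_log, out_dir and transform are hypothetical, and I'm assuming each log file produces exactly one output file of the same name, which is rewritten from scratch on every run.

```python
import os

def process_log(log_path, out_dir, transform=lambda line: line):
    """Write the output under the log file's own name, then delete the log.

    Hypothetical helper: assumes one output file per log file, rebuilt
    from scratch on every run, so a rerun after a crash is harmless.
    """
    out_path = os.path.join(out_dir, os.path.basename(log_path))
    # Mode "w" discards whatever a previous, interrupted run left behind.
    with open(log_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(transform(line))
        dst.flush()
        os.fsync(dst.fileno())  # ask the OS to push the output to disc first
    # Only now is it safe to drop the log file; until this point a rerun
    # simply overwrites the output.
    os.remove(log_path)
    return out_path
```

On startup, any log file that still exists is simply processed again; nothing else needs to be checked.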

If the requirement is to append stuff from each log file to one or more other files, then I would create an auxiliary file (whose name is related to the log file currently being processed) and, before writing to each of those other files, append its name and current length to the auxiliary file (and close the auxiliary file -- expecting that to flush the result to disc). Once processing is complete, the log file would be deleted first and the auxiliary file after it. When starting the process, if an auxiliary file is found then:

if the related log file exists, then the process needs to be restarted, truncating each file recorded in the auxiliary file to its original size.
if the related log file does not exist, then the process died after completing all useful work, so the auxiliary file can be deleted.
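
The startup rule above can be sketched as follows, again in Python and again only as an illustration. The '.aux' suffix, the tab-separated "length, name" entry format and the function names are all my own assumptions, as is the rule of ignoring a final entry with no newline (the process died while writing it, so the file it names was not yet touched). File names are assumed not to contain newlines.

```python
import os

def aux_path(log_path):
    # Hypothetical naming scheme: auxiliary file name derived from the log file name.
    return log_path + ".aux"

def note_target(log_path, target):
    """Record target's current length; call this before appending to target."""
    size = os.path.getsize(target) if os.path.exists(target) else 0
    with open(aux_path(log_path), "a") as aux:
        aux.write(f"{size}\t{target}\n")
        aux.flush()
        os.fsync(aux.fileno())  # ask the OS to push the entry to disc

def recover(log_path):
    """Return True if log_path still has to be processed (again)."""
    aux = aux_path(log_path)
    if not os.path.exists(aux):
        return os.path.exists(log_path)
    if not os.path.exists(log_path):
        # Died after all useful work was done: just tidy up.
        os.remove(aux)
        return False
    with open(aux) as f:
        for line in f:
            if not line.endswith("\n"):
                continue  # incomplete entry, its target was never modified
            size, name = line[:-1].split("\t", 1)
            # Only ever shrink a file; never extend it.
            if os.path.exists(name) and os.path.getsize(name) > int(size):
                os.truncate(name, int(size))
    # The auxiliary file is left in place: its entries stay valid for the
    # rerun, and rolling back twice does no harm.
    return True
```

The normal run would then be: call note_target for a file, append to that file, and when everything is written and closed, delete the log file and finally the auxiliary file.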

This does depend on the auxiliary file being written to stable storage when it's closed, or at least before any changes to the related files make it to stable storage. It also assumes that all the updated files make it to stable storage reliably after being closed, so that data is not lost when the original log file is deleted. If those are concerns, the problem is a lot bigger!
