in reply to Extract the middle part of a list

One issue which is sometimes ignored in these file-sharing schemes is a race condition which can lead to half-written files being processed.

You can have a problem where something like this happens:

  1. Writing process gets current epoch time A, makes up filename
  2. Writing process creates file, starts writing, doesn't finish
  3. Reading process comes along at time A+delta, notes an 'old' file with timestamp A, opens and reads it.

In this case, the reading process has seen a half-completed file.
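
To make the sequence above concrete, here is a minimal Python sketch that plays both roles in one process. The record contents and the filename event_0001 are made up for illustration, and the "stall" is simulated by simply not finishing the write before the reader opens the file.

```python
import os
import tempfile

# Hypothetical record; the real payload format is not given in this thread.
record = b"event=disk_full host=web01 severity=high\n"
half = len(record) // 2

with tempfile.TemporaryDirectory() as spool:
    path = os.path.join(spool, "event_0001")  # made-up filename

    # Step 2: the writer creates the file, writes part of it, then stalls.
    writer = open(path, "wb")
    writer.write(record[:half])
    writer.flush()

    # Step 3: the reader sees a file that looks old enough and reads it.
    with open(path, "rb") as reader:
        seen = reader.read()

    # The writer finishes, but too late for the reader.
    writer.write(record[half:])
    writer.close()

    with open(path, "rb") as f:
        final = f.read()

# True when the reader got a strict prefix of the finished file.
reader_saw_partial = seen != final and final.startswith(seen)
print(reader_saw_partial)
```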

One might argue that the writing process couldn't stall for long enough for this to happen, but that depends on the size of the file being written, whether it now (or in the future) will be writing over a network, whether the writing process has to wait to get more data, etc.

The safe way to do this (which you might already be doing) is for the reader and writer to agree on a pattern of filenames to ignore (e.g. *.tmp). The writer can then create and write to an x-y-z.tmp file, flush and sync it to disk, and do a rename() to the final name once it has finished. Since rename() is atomic within a single filesystem, the reader sees either no file or a complete one.
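
A rough Python sketch of that scheme, under some assumptions: write_atomically, visible_files and IGNORE_PATTERN are names invented for this example, and the temp file lives in the same directory as the final file so the rename stays on one filesystem. os.replace() is used because it maps to rename() on POSIX systems.

```python
import fnmatch
import os
import tempfile

IGNORE_PATTERN = "*.tmp"  # assumption: the pattern reader and writer agree on


def write_atomically(directory, final_name, data):
    """Write data to <final_name>.tmp, sync it, then rename it into place."""
    tmp_path = os.path.join(directory, final_name + ".tmp")
    final_path = os.path.join(directory, final_name)
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push it to disk
    os.replace(tmp_path, final_path)  # rename() on POSIX
    return final_path


def visible_files(directory):
    """List the files a reader may process, skipping in-progress ones."""
    return sorted(
        name for name in os.listdir(directory)
        if not fnmatch.fnmatch(name, IGNORE_PATTERN)
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spool:
        # Simulate another writer that is still mid-write.
        open(os.path.join(spool, "a-b-c.tmp"), "wb").close()
        write_atomically(spool, "x-y-z", b"complete record\n")
        print(visible_files(spool))
```

The reader never has to guess from timestamps: anything it can see under a final name has already been fully written and synced.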

Re^2: Extract the middle part of a list
by chrism01 (Friar) on Jun 29, 2007 at 07:19 UTC
    Actually, that's not a problem here. My prog is just loading up some old files which were missed when the DB was down.
    I'm taking a snapshot list of extant files at the start of the prog so it doesn't have to try to play catchup with the writing process.
    Unless the sysadmins mess up the datetime params badly, the end_datetime will be some way behind the 'latest' file.
    The writer is part of a monitor system and runs 24/7. The monitor writes a file for each event, and a copy of the data should also be written to a row in the DB.
    If the DB goes down, the new prog will 'fill in the gap' after the DB is fixed.
    The monitor will write event files even if it can't talk to the DB.
    In fact, the update prog will probably be fast enough to catch up anyway.

    Chris

      So you can guarantee that file_xxx_N is complete if file_xxx_N+1 (or later) exists? Fair enough.

      But for reliability, the reader should check this condition, and not process a file if it is the latest one. That might have problems too: if there hasn't been any activity since, then you'll miss the last record.
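
      A small Python sketch of that check, assuming names really do look like file_xxx_N with an integer N at the end (the actual naming scheme isn't shown in this thread, and files_safe_to_process is a made-up helper):

```python
def files_safe_to_process(names):
    """Return every file except the newest, which may still be in progress.

    Assumes hypothetical names of the form file_xxx_N, where N is an integer
    sequence number; sorting is numeric so that 10 comes after 9.
    """
    def sequence_number(name):
        return int(name.rsplit("_", 1)[1])

    ordered = sorted(names, key=sequence_number)
    return ordered[:-1]  # drop the latest file


if __name__ == "__main__":
    snapshot = ["file_xxx_10", "file_xxx_2", "file_xxx_9"]
    print(files_safe_to_process(snapshot))
```

      Note that this is exactly where the last record gets missed: the newest file is held back until a later one appears.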

      Really, I'm nitpicking, because the race condition is probably unlikely to be hit. But systems like this often run unattended for a long time, on machines which sometimes bog down under load. Race conditions lead to unpredictable behaviour and lots of those time-consuming "oh...we sometimes get that problem, we don't know why" issues.

      IMHO, the only safe way to do this is "create temp file, then rename".