blackrat has asked for the wisdom of the Perl Monks concerning the following question:

History:

Multiple processes append to and update multiple files, using "\n" as the line/record delimiter. The hardest-hit files were being truncated, sometimes in the middle of a line. I had been using flock() as shown in numerous examples, but it didn't work. I suspect it has something to do with locks in user space.
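
For comparison, here is a minimal sketch of the flock() pattern those examples usually intend. It is not the original code, which is not shown in this post; `append_line` and `rewrite_file` are hypothetical helper names. One thing worth checking against it: opening a file with '>' truncates it before flock() is ever called, which can produce exactly this kind of truncation, so the update path below opens with '+<' and truncates only once the lock is held.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);

# Append one record while holding an exclusive advisory lock.
# (Hypothetical helper, not taken from the original post.)
sub append_line {
    my ($path, $line) = @_;
    open(my $fh, '>>', $path) or die "Cannot open $path: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $path: $!";
    # Redundant in append mode, but harmless: make sure we are at the end
    # even if another writer appended while we waited for the lock.
    seek($fh, 0, SEEK_END)    or die "Cannot seek $path: $!";
    print {$fh} "$line\n"     or die "Cannot write $path: $!";
    # close() flushes the buffer and then releases the lock.
    close($fh)                or die "Cannot close $path: $!";
}

# Rewrite a whole file in place. The file is opened read/write WITHOUT
# truncation, locked, read, and only then truncated and rewritten.
# (Hypothetical helper; $transform is a code ref mapping lines to lines.)
sub rewrite_file {
    my ($path, $transform) = @_;
    open(my $fh, '+<', $path) or die "Cannot open $path: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $path: $!";
    my @lines = $transform->(<$fh>);
    seek($fh, 0, SEEK_SET)    or die "Cannot seek $path: $!";
    truncate($fh, 0)          or die "Cannot truncate $path: $!";
    print {$fh} @lines        or die "Cannot write $path: $!";
    close($fh)                or die "Cannot close $path: $!";
}
```

This only helps if every process that touches the files goes through the same routines, since the lock is advisory.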

Partial solutions:

I switched to using DBI::CSV and wrote a daemon, based on the example (verbatim) from IO::Select using IO::Socket, with a simple protocol to arbitrate file access. Each message on the socket is written and read as a line terminated by "\n":

Process requested a socket ==> Daemon accepted socket request
Process asked for access ==> Daemon accepted request - via socket
Daemon granted access ==> Process accepted access - via socket
Process accessed file
Process gave up access ==> Daemon accepted - via socket
Process and Daemon moved on
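
The exchange above can be sketched roughly as follows. This is an illustration only, not the daemon described in this post: the message names REQUEST, GRANTED and RELEASE are made up, and a socketpair() inside a single script stands in for the real connection between a process and the daemon. The one design point it tries to show is that the file handle is closed (and therefore flushed) before the release message is sent; if the release went out first, buffered data could reach the file after another process had been granted access.

```perl
use strict;
use warnings;
use Socket;                       # AF_UNIX, SOCK_STREAM, PF_UNSPEC
use IO::Handle;                   # autoflush() on socket handles
use File::Temp qw(tempfile);

# A connected socket pair stands in for the process <-> daemon connection.
socketpair(my $process, my $daemon, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$_->autoflush(1) for $process, $daemon;

# Every protocol message is one "\n"-terminated line.
sub send_msg {
    my ($sock, $msg) = @_;
    print {$sock} "$msg\n" or die "send: $!";
}

sub recv_msg {
    my ($sock) = @_;
    my $line = <$sock>;
    die "peer closed the socket\n" unless defined $line;
    chomp $line;
    return $line;
}

my ($tmp, $path) = tempfile(UNLINK => 1);   # stand-in for the shared file
close $tmp;

my @seen;                                   # messages in the order received

# Process asks for access; daemon accepts the request and grants it.
send_msg($process, "REQUEST $path");
push @seen, recv_msg($daemon);
send_msg($daemon, 'GRANTED');
push @seen, recv_msg($process);

# Process accesses the file. close() flushes the write buffer, so the
# data is out of this process before access is given up.
open(my $fh, '>>', $path)           or die "open: $!";
print {$fh} "one complete record\n" or die "print: $!";
close($fh)                          or die "close: $!";

# Only now does the process give up access; the daemon accepts.
send_msg($process, 'RELEASE');
push @seen, recv_msg($daemon);
```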

Results:

Immediate results were positive. It took a while before I started to find "signs" of corruption which couldn't be verified - some data seemed inconsistent, though there could be a perfectly reasonable explanation for it. Then I found some log files (should be simple appends) that had portions of lines mushed together, and some lines that had embedded "\n" where the processes couldn't be generating it.

Question:

Is there some way to ensure that the DBI::CSV actually locks files for update, or some way to make sure any write buffers it uses are being flushed with every write/update?

I can't think of anything else that may be allowing processes to clobber each other's file updates.

Thanks.

Re: DBI::CSV file locking?
by Fletch (Bishop) on Oct 06, 2008 at 16:14 UTC

    While it doesn't maintain the same backend data format, if you want a single file solution you might look into DBD::SQLite instead and let the SQLite library handle these issues for you (see here for details and caveats).

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

      Thanks for your response.

      I have had SQLite recommended by others as an option for the backend. The app's actual database needs are fairly light, so this may be a better option than MySQL, which was also being considered.
Re: DBI::CSV file locking?
by moritz (Cardinal) on Oct 06, 2008 at 16:06 UTC
    I believe you are actually talking about DBD::CSV?

    The documentation says it should work if your system supports flock, which leads us to the question of what your system is.

    That aside, I think in the long term you might be happier with a "real" DB system, depending of course on your particular application.

      Thanks for your response.

      DBD::CSV? Probably. My code just has "use DBI;"

      Current system is Linux FC3 and will be migrated to latest CentOS in the near future.

      I've determined that flock() is not reliable on this system. All files in question are accessed only from library routines, and switching those routines from reading the files directly with Perl's built-in functions to going through the DBI definitely made a difference.

      As you suggest, moving to a real DB may be required. It was being considered for some time.
Re: DBI::CSV file locking?
by Illuminatus (Curate) on Oct 06, 2008 at 16:41 UTC
      Thanks for your response.

      The link is useful, and will provide more information to consider, especially for other projects. In particular, if we move from the DBI to accessing the files directly, there will be some more research and testing.

      Because we were looking at migrating the app to multiple servers and NFS, the daemon I wrote SHOULD HAVE worked as a networkable file access arbitrator, running on the NFS server. As I understand it, this is the way the Open Distributed Lock Manager is supposed to do it (reference "Linux Enterprise Cluster" by Karl Kopper). However, the most recent information I can find on it dates from 2004 or 2006 on SourceForge.

      I've seen such conflicting information on flock(), though, and have a hard time bringing myself to trust it. Maybe a real DB is my only solution.
        Don't trust flock. It only works if your file is on the same physical system as your code. Even then, it doesn't always work. Furthermore, flock is only advisory. The OS doesn't enforce a flock to all processes.

        Frankly, all flock does is put a flag on the file. If another process actually bothers to check the flag, the OS will let it know what the state is. If the other process just goes ahead and opens the file anyway, the OS will let it.
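
The advisory behaviour described above can be seen with a small experiment. This is a sketch under assumptions: a local filesystem on a Unix-like system where fork() is available, and a temporary file standing in for the real data file. One process holds an exclusive lock; a second process that asks for the lock (non-blocking) is refused, but when it writes without holding the lock, nothing stops it.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);
use POSIX ();                     # POSIX::_exit, so the child skips cleanup

my ($tmp, $path) = tempfile(UNLINK => 1);   # stand-in for the shared file
close $tmp;

# First process: take an exclusive lock and keep holding it.
open(my $holder, '>>', $path) or die "open: $!";
flock($holder, LOCK_EX)       or die "flock: $!";

my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid == 0) {
    # Second process: opens the same file on its own.
    open(my $other, '>>', $path) or POSIX::_exit(4);

    # A cooperating process asks first; LOCK_NB makes flock() return
    # false immediately instead of waiting for the holder.
    my $got_lock = flock($other, LOCK_EX | LOCK_NB) ? 1 : 0;

    # A non-cooperating process just writes; the lock does not prevent it.
    my $wrote = 0;
    if (print {$other} "written without holding the lock\n") {
        $wrote = close($other) ? 1 : 0;
    }

    # Report both results to the parent through the exit status.
    POSIX::_exit($got_lock + 2 * $wrote);
}

waitpid($pid, 0);
my $status          = $? >> 8;
my $lock_refused    = ($status & 1) ? 0 : 1;
my $write_succeeded = ($status & 2) ? 1 : 0;
close($holder);

print "lock refused: $lock_refused, unlocked write succeeded: $write_succeeded\n";
```

So a single code path that skips the flock() call is enough to corrupt a file that every other process locks correctly.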


        My criteria for good software:
        1. Does it work?
        2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?