ecuguru has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I am going to have one perl script, always running, printing new data to the same .txt file that is hosted inside an Apache site.

The .txt page is going to get hit a lot, and quickly. Hella quickly.
I'm concerned that my users will either get a "file not found" error, or that the perl script won't be able to write the file because so many users are requesting it.
I think I only need flock when multiple perl script instances are trying to write to the same file.
Do I need to do anything special in my Perl script to tell it to write the file even while it is busy being served?
My current file writing code is below. I'm writing about 25K at a time.

thanks
Tim
#Is this efficient enough, or do I need something else?
sub align {
    open(DEST, ">$file2") or die "Can't open $file2 for writing: $!";
    print DEST @buffer;
    close DEST;
}

Re: Apache and file generation - flock?
by matija (Priest) on Mar 01, 2004 at 10:08 UTC
    It is bad karma to be writing a file that other processes are reading - readers may end up with part of the old file and part of the new, and they probably wouldn't like that...

    A better idea would be to write to a temporary file and then, once the write is finished, rename it to the name that the other processes use.

    On most (all?) systems, processes that have already opened the old file will still get to read the old file until they close it, while processes that open the file after the rename will get the new data. Best of all, you have a guarantee that nobody will be reading a partially written file.
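
    For example, a minimal sketch of the write-then-rename approach, assuming a single writer process; the path is a placeholder you would adjust to your own setup:

        use strict;
        use warnings;

        my $live = '/var/www/html/data.txt';   # hypothetical path to the served file
        my $temp = "$live.tmp";                # same directory, so rename() is atomic

        sub publish {
            my @buffer = @_;
            open my $tmp, '>', $temp or die "Can't open $temp for writing: $!";
            print {$tmp} @buffer;
            close $tmp or die "Can't close $temp: $!";
            # Atomic swap: readers see either the whole old file or the
            # whole new one, never a half-written mix.
            rename $temp, $live or die "Can't rename $temp to $live: $!";
        }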

      Ok, one process writes to a temporary file and, when it has finished writing, renames the file. At the same time, another process writes to its own copy of the temporary file and, when it has finished, renames the file... The second process will clobber what was written by the first.

      To have many processes successfully writing to the same file, you need some sort of locking mechanism to protect the critical parts of the program. The locking can be implemented with IPC::Semaphore on *nix and Win32::Semaphore on Windows.
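
      A minimal sketch of guarding a critical section with IPC::Semaphore - the ftok path is a placeholder, and real code would have to ensure that only the first process initializes the semaphore's value:

          use strict;
          use warnings;
          use IPC::SysV qw(ftok IPC_CREAT S_IRUSR S_IWUSR);
          use IPC::Semaphore;

          # Derive a key that all cooperating processes can agree on
          # from a path they all know about.
          my $key = ftok('/var/www/html/data.txt', 1);

          my $sem = IPC::Semaphore->new($key, 1, S_IRUSR | S_IWUSR | IPC_CREAT)
              or die "Can't create semaphore: $!";
          $sem->setval(0, 1);   # start unlocked - only the FIRST process should do this

          $sem->op(0, -1, 0);   # P: block until we can decrement (acquire)
          # ... critical section: read or write the shared file here ...
          $sem->op(0, 1, 0);    # V: increment (release)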

      Now, combine the semaphore with the temporary file, and the new algorithm looks like:
      process start
          if read then
              obtain shared read lock on the original text file
              read the data
              release shared lock
          elsif write then
              obtain exclusive write lock on the temporary file
              write the data
              obtain exclusive write lock on the original file
              overwrite the original file with new data
              release lock on original file
              release lock on temporary file
          end if
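
      In Perl the shared/exclusive semantics fall out of flock directly (LOCK_SH for readers, LOCK_EX for the writer), so here is a sketch of the algorithm using flock rather than semaphores, with placeholder filenames and assuming the original file already exists:

          use strict;
          use warnings;
          use Fcntl qw(:flock);

          my $file = '/var/www/html/data.txt';   # hypothetical paths
          my $temp = "$file.tmp";

          # Reader: a shared lock lets any number of readers overlap,
          # but never an in-progress overwrite.
          sub read_data {
              open my $fh, '<', $file or die "Can't open $file: $!";
              flock $fh, LOCK_SH or die "Can't get shared lock: $!";
              my @lines = <$fh>;
              close $fh;                         # closing releases the lock
              return @lines;
          }

          # Writer: do the slow writing on the temp file first, then hold
          # the exclusive lock on the original only for the quick copy.
          sub write_data {
              my @buffer = @_;
              open my $tmp, '+>', $temp or die "Can't open $temp: $!";
              flock $tmp, LOCK_EX or die "Can't lock $temp: $!";
              print {$tmp} @buffer;

              # Open read-write WITHOUT truncating, take the lock, and only
              # then truncate: truncating first would expose an empty file.
              open my $fh, '+<', $file or die "Can't open $file: $!";
              flock $fh, LOCK_EX or die "Can't get exclusive lock: $!";
              truncate $fh, 0 or die "Can't truncate $file: $!";
              seek $tmp, 0, 0 or die "Can't seek in $temp: $!";
              print {$fh} <$tmp>;                # overwrite original with new data
              close $fh;                         # release lock on original
              close $tmp;                        # release lock on temp file
          }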

      This algorithm penalizes the writer, on the assumption that there are more readers than writers and that writing is a lengthy process. The benefit is that locking keeps the data consistent, and the readers can still read the original file while the new one is being written.

      Another variant is to omit the temporary file and let the writing process lock the original text file directly, but that penalizes the readers: if writing the data is expensive, every reader has to wait for the writer to finish.

      If you want to minimize the penalty on both the readers and the writer, you need a more elaborate caching and locking scheme, which is probably beyond the scope of this discussion.