in reply to file handling question

If the process that writes a given HTML file takes some noticeable amount of time between start and finish, and you want to make sure that web visitors will only see the complete form of the file, something like the following ought to be all you need: open a new file for output, write everything to it, and once the output to the file is complete and the file is closed (and you're sure there weren't any errors), rename the file to the intended path/name.

One caveat: if the web server is a unix box, "renaming" a file from one disk to another (e.g. from /tmp to /public_html) really means copying it, which will not be instantaneous -- maybe not as slow as the process that writes the file in the first place, but still not as fast as renaming a file so that it stays on the same volume, which just relinks a directory entry to the same inode. (This is likely to be true on any OS, even those that don't have things called "inodes".) So create the new file in the same directory, or at least on the same volume, as its final destination.
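
In Perl, a minimal sketch of that pattern (the paths and content are made up; File::Temp does the unique-name bookkeeping):

    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    my $target = '/home/me/public_html/report.html';    # hypothetical path
    my $html   = "<html><body>...</body></html>\n";

    # Create the temp file in the target's own directory, so the final
    # rename stays on one volume and is effectively atomic.  UNLINK is
    # off because we dispose of the temp name ourselves via rename.
    my ($fh, $tmpname) = tempfile( DIR => '/home/me/public_html',
                                   UNLINK => 0 );
    print $fh $html          or die "write failed: $!";
    close $fh                or die "close failed: $!";
    rename $tmpname, $target or die "rename failed: $!";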

As for reading config files (it took me a while to get the connection between the first and second paragraph)... If you're worried that a process reading a config file might get an incomplete or "transient" version of the data -- and if this is a persistent, pernicious concern -- you might consider making up a little table (database or flat file) that stores file names with data checksums. Read the file once, compute its checksum, and if that doesn't match the checksum in the table, treat it as an error condition. (You could try reading it again after a delay, to see if the problem persists, but if it fails twice, you might as well quit.)

This would require a little more infrastructure for managing your config files, to make sure that the checksum table is updated every time a file is intentionally added, deleted or altered.
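
A rough sketch of the verification side, using Digest::MD5 and a flat "filename checksum" table (all the names here are invented):

    use strict;
    use warnings;
    use Digest::MD5;

    # Load the table: one "filename md5-in-hex" pair per line.
    sub load_checksums {
        my ($table) = @_;
        my %sum;
        open my $fh, '<', $table or die "can't read $table: $!";
        while (my $line = <$fh>) {
            my ($file, $md5) = split ' ', $line;
            $sum{$file} = $md5;
        }
        close $fh;
        return \%sum;
    }

    # True if the file's current checksum matches the table's entry.
    sub config_ok {
        my ($file, $sums) = @_;
        open my $fh, '<', $file or return 0;
        binmode $fh;
        my $got = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;
        return defined $sums->{$file} && $got eq $sums->{$file};
    }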

Re: Re: file handling question
by simonm (Vicar) on Dec 17, 2003 at 06:43 UTC
    If the process that writes a given HTML file takes some noticeable amount of time... something like the following ought to be all you need: open a new file for output ... once the output to the file is complete and the file is closed (and you're sure there weren't any errors), rename the file to the intended path/name

    Which, not coincidentally, is what the IO::AtomicFile module does.
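
    For example, a sketch (as I recall, the module writes to a "..TMP" sibling file and renames it into place when you close):

        use strict;
        use warnings;
        use IO::AtomicFile;

        my $html = "<html>...</html>\n";    # whatever you've generated

        # Output goes to a temporary sibling file; the real name only
        # appears once close() succeeds and renames it into place.
        my $fh = IO::AtomicFile->open('index.html', 'w')
            or die "can't open index.html: $!";
        print $fh $html;
        $fh->close or die "couldn't commit index.html: $!";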

      Thanks!!!

      I think that confirms that my choice of IO::AtomicFile is a good one ;-)

      The second part, which I can see was a little confusing, is as follows: (I am on win32, but want to be portable)

      I have a server process and it often needs to read config files for instructions. I expect the file to always be readable, but one time it wasn't (I think I was in the debugger, and maybe viewing the file too; no locking that I am aware of...). Instead of just throwing out the job, I figured I could retry opening the file a few times, maybe sleeping 0.1 sec in between. Is this just overkill? Curious: if you are manually editing a crontab file in vi and cron runs, what happens? (I don't have unix or cron... I just use windows, and we don't expect users to edit config files behind our backs ;-)
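
      Concretely, I'm picturing something like this (the retry count and delay are guesses; Time::HiRes provides the fractional sleep):

          use strict;
          use warnings;
          use Time::HiRes qw(sleep);    # fractional-second sleep

          sub open_with_retries {
              my ($path, $tries, $delay) = @_;
              for my $attempt (1 .. $tries) {
                  open my $fh, '<', $path and return $fh;
                  warn "attempt $attempt: can't open $path: $!";
                  sleep $delay if $attempt < $tries;
              }
              return;    # undef once every attempt has failed
          }

          my $fh = open_with_retries('server.conf', 3, 0.1)
              or die "giving up on server.conf";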

      thanks for any defensive programming ideas!
        Curious, if you are manually editing a crontab file in vi and cron runs, what happens?

        cron is always running -- it's a daemon. When it starts (at boot-up), it looks for crontab files in a specific directory, reads all the ones that are present, and sets "alarm" signals for when the various jobs are supposed to run. The files are organized by user; root always has a crontab file, and it's up to the sysadmin to decide whether other users can have their own crontabs as well.

        There's a specific unix command called "crontab" that you run to view or edit the crontab file; if you say "crontab -e" to edit it, your default editor is invoked on a copy of your existing crontab file (or on a new file, if you've never used cron before), and when you exit the editor, the "crontab" utility takes over again -- it replaces the original file with the edited copy, and sends a signal to the cron daemon that will cause cron to re-initialize based on the updated file. If you added a command to your crontab file that is supposed to run at noon every day, and you close the editor 30 seconds after noon, the command won't run till tomorrow.
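
        For instance, the noon-every-day case would be one line in standard crontab syntax -- minute, hour, day-of-month, month, day-of-week, then the command (the path here is made up):

            0 12 * * * /home/me/bin/daily_report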

        Most problems are avoided by keeping crontab files separated by user. Of course, on unix, a single user can be running a number of command-line shells at once, and could try to edit the same crontab file from two separate shells (this is quite possible, e.g. when two or more people, working in different locations, can log in as root); but "crontab -e" uses flock, and will not allow concurrent edits of the file.
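
        For what it's worth, the same kind of advisory lock is easy to take from Perl; a sketch (it only helps if every process that touches the file cooperates by locking):

            use strict;
            use warnings;
            use Fcntl qw(:flock);

            # A reader takes a shared lock; a writer would open the file
            # read-write and use LOCK_EX instead.
            open my $fh, '<', 'server.conf' or die "can't open server.conf: $!";
            flock $fh, LOCK_SH or die "can't lock server.conf: $!";
            my @lines = <$fh>;
            close $fh;    # releases the lock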