jerrygarciuh has asked for the wisdom of the Perl Monks concerning the following question:

My content management script can include images and, for posts longer than x, create a separate page linked from the headline page. The catch is that all the posts are kept in one text file, which makes identifying files for cleanup trickier. Users name the image files; on upload the names are cleaned up and the PID is tacked onto the end. The HTML files created by the script are named by the date and time of creation plus the PID.

What I am thinking of doing is having the script write the name of each image and HTML file it adds to the server to a separate text file, then using the entries in that file for regex searches of the headline page. If an entry has no match, the file of that name in /images or /pages will be deleted.
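A minimal sketch of that cleanup pass, under stated assumptions: the manifest holds one relative path per line (for example images/cat_1234.jpg), and the headline page mentions each live file by that same string. The sub names find_orphans, slurp and cleanup are hypothetical, not taken from the script described above.

```perl
use strict;
use warnings;

# Return the manifest entries that do not appear anywhere in the headline text.
sub find_orphans {
    my ($headline_text, @names) = @_;
    # \Q...\E quotes regex metacharacters, so the "." in "cat.jpg"
    # matches only a literal dot.
    return grep { $headline_text !~ /\Q$_\E/ } @names;
}

# Read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;    # undefined record separator: read everything at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Delete every manifest entry that the headline page no longer mentions.
# Returns the list of entries that were treated as orphans.
sub cleanup {
    my ($headline_file, $manifest_file, $base_dir) = @_;
    my $headline = slurp($headline_file);
    my @names    = grep { length } split /\n/, slurp($manifest_file);
    my @orphans  = find_orphans($headline, @names);
    for my $name (@orphans) {
        my $path = "$base_dir/$name";
        unlink $path or warn "Could not delete $path: $!";
    }
    return @orphans;
}
```

Because the match is a plain substring search, an entry such as cat_12.jpg would also be considered "in use" if the page mentions bigcat_12.jpg. That errs on the side of keeping files, which is probably the safer failure mode. A real version would also rewrite the manifest without the deleted entries.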

Questions:


Thanks for any advice!
jg
_____________________________________________________
If it gets a little bit out of hand sometimes, don't let it fool you into thinkin' you don't care. TvZ

Replies are listed 'Best First'.
Re: Planning a Disk CleanUp Script
by theguvnor (Chaplain) on Jan 12, 2002 at 08:23 UTC

    Hi, not a direct answer to your question, but rather a piece of advice from someone who has built a similar system: consider having your script assign the name of the file upon upload. I use sequential integers. The reason is that it's safer to use a known-safe, script-assigned filename when you open a filehandle to write the user's file to your filesystem than the user-assigned filename, which may contain nasty characters such as pipes.

    I then update one text file that serves as an index, so I know what each numeric filename was originally called by the user, and can be used to refer to it.
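A rough sketch of such an index, assuming a tab-separated text file with one "stored name, original name" pair per line. The file layout and the sub names record_upload and original_name_for are illustrative assumptions, not details from the post above.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Append one "stored_name<TAB>original_name" line to the index file.
sub record_upload {
    my ($index_file, $stored_name, $original_name) = @_;
    # Collapse tabs and newlines so a hostile name cannot forge index lines.
    (my $clean = $original_name) =~ s/[\t\r\n]+/ /g;
    open my $fh, '>>', $index_file or die "Cannot open $index_file: $!";
    # Exclusive lock so two uploads do not interleave their writes.
    flock($fh, LOCK_EX) or die "Cannot lock $index_file: $!";
    print {$fh} "$stored_name\t$clean\n";
    close $fh or die "Cannot close $index_file: $!";
    return;
}

# Look up the original name for a stored name; returns undef if absent.
sub original_name_for {
    my ($index_file, $stored_name) = @_;
    open my $fh, '<', $index_file or die "Cannot open $index_file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($stored, $original) = split /\t/, $line, 2;
        return $original if $stored eq $stored_name;
    }
    return;
}
```

The user-supplied name is only ever stored as data here; it is never used as a path on the filesystem.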

      A tip.

      Use random integers, not sequential ones.

      With sequential integers you have a race condition. If two instances try to start at once, both can choose the same name and try to write to it, with bad results. Choose a random integer and then test whether a file of that name already exists, and the odds of a bad race fall by several orders of magnitude.

      Either that or have the integer in question produced by a single source, such as an autoincrement field in a database.

        Thanks for the tip, tilly. You are right to suggest random integers for the reasons you have outlined. But I do want to point out that anyone writing such a system should also read the security section in the Camel on race conditions, and be sure to use sysopen() rather than relying on a file existence test followed by open(), for the reasons outlined in the Camel.
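For illustration, one way to combine the two suggestions is to pick a random integer and let sysopen() with O_CREAT | O_EXCL do the existence check and the creation in a single step. This is only a sketch: create_unique_file, the nine-digit name format and the retry limit are assumptions made for the example.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use Errno qw(EEXIST);

# Create a new, empty file with a random numeric name in $dir.
# Returns (filehandle, path). Dies on errors other than a name collision,
# or if no free name is found within $max_tries attempts.
sub create_unique_file {
    my ($dir, $max_tries) = @_;
    $max_tries ||= 20;
    for (1 .. $max_tries) {
        my $path = sprintf '%s/%09d', $dir, int(rand(1_000_000_000));
        # With O_EXCL the open fails if the file already exists, so the
        # test and the creation cannot be separated by another process.
        if (sysopen my $fh, $path, O_WRONLY | O_CREAT | O_EXCL, 0644) {
            return ($fh, $path);
        }
        die "Cannot create $path: $!" unless $! == EEXIST;
        # EEXIST: someone else owns that name, so try another one.
    }
    die "No free name found in $dir after $max_tries tries";
}
```

A separate -e test followed by open() leaves a window in which another process can create the same file; the exclusive open closes that window. Note that rand() is not a cryptographic source, so these names should be treated as unique but guessable.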