The logic is OK, but you have created a possible infinite loop. sysopen may fail with those flags if the file already exists, but it may also fail for other reasons, e.g. permissions. If you have a permissions problem, your code will loop forever, in all likelihood burning 100% CPU. Here is a tempfile function we use that may be the sort of thing you are after. It returns the open $fh and the $path (for unlinking or whatever) if it succeeds, and limits the number of retries to less than infinity :-) Note File::Temp is a well tested solution and offers features not present in this function.
use Fcntl;    # need constants O_CREAT | O_EXCL | O_RDWR

sub get_tempfile {
    my ( $dir, $prefix, $max_tries ) = @_;
    $dir       ||= '/tmp';
    $prefix    ||= 'temp';
    $max_tries ||= 10;
    my @CHARS = ( 'A' .. 'Z', 'a' .. 'z', 0 .. 9, '_' );
    # ui_system_error() is our standard error handler
    ui_system_error( "$dir is not a directory" ) unless -d $dir;
    ui_system_error( "$dir is not writable" )    unless -w $dir;
    $dir =~ s!/$!!;
    my $fh;
    for ( 1 .. $max_tries ) {
        my $name = sprintf "$prefix-%d-", time();
        # rand(@CHARS), not rand($#CHARS): the latter could never
        # select the last character in the set
        $name .= $CHARS[ int( rand( @CHARS ) ) ] for 1 .. 10;
        my $path = "$dir/$name";
        # return the full path so the caller can unlink it later
        return ( $fh, $path )
            if sysopen( $fh, $path, O_CREAT | O_EXCL | O_RDWR, 0600 );
    }
    ui_system_error( "Could not create unique filename after $max_tries tries" );
}
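For what it's worth, a minimal usage sketch (this assumes ui_system_error, our standard handler, ultimately dies or exits, so the call never returns on failure):

my ($fh, $path) = get_tempfile( '/tmp', 'myapp', 5 );
print $fh "some data\n";
close $fh;
unlink $path or warn "could not unlink $path: $!";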
Any reason you aren't using File::Temp?
Abigail
Several actually.
- The code is essentially the same as what File::Temp uses, but more compact, as it just does what we want. File::Temp is 1800 lines, has lots of features, and is cross-platform. I just need the 15 lines I have, running on one platform.
- The temp file names we use also include $$ (I edited that out) as we like to know which processes are sh!ting in the tmp dir so we know who to blame ;-)
- We handle all error conditions via our standard handlers without having to resort to catching with eval.
- We have a test suite for this that makes sure all is well with this bit of code, so should there be an issue we would find it in normal testing. Yes, we could just test File::Temp ourselves, but testing a single short function is far easier.
- And finally, because we crossed swords on usenet maybe 4 years ago on the best way to make temp files, I looked into File::Temp (on your advice) and extracted the core features into that snippet. While using modules is often a good idea, we have found that Perl tends to leak memory when run persistently and hard. Less code, i.e. just what you need, has two effects: Perl leaks less, and the leaks are easier to fix. LWP and URI, as well as our own OO modules, are problematic in this regard in our experience.
- It also uses fewer resources, mainly memory, and runs faster. I think a strong case could be made that at 1800 lines File::Temp is bloatware.
Thanks, this is exactly the kind of thing I was looking for. I'll likely end up stealing your sanity checks. :-)
As for your reply to Abigail-II's note, I am avoiding File::Temp for similar reasons. Even adding the extra checking you have here, my code will do far less and have a much smaller intended scope than File::Temp. Furthermore, I am not making temporary files, so using a module designed to do so seems to me kind of like using $#array+1 when I want scalar @array.
I'm after permanent files with unique and sequential file names.
Then enforce that by storing a sequence number somewhere, permanently. Make all programs that create files in that directory use it.
Think of an autoincrement field in a database (why not use that? SQLite might be a decent and fast solution), or a setup with a little file holding the number, pretty much like the "web access counters" of a decade ago.
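A minimal sketch of the counter-file variant, assuming a counter file at /tmp/seq.count (the name and location are illustrative) and using flock to serialize concurrent writers:

use Fcntl qw(:flock O_CREAT O_RDWR);

# Read, increment, and write back the sequence number, holding an
# exclusive lock for the whole read-modify-write cycle.
sub next_sequence {
    my ($counter_file) = @_;
    sysopen( my $fh, $counter_file, O_CREAT | O_RDWR, 0644 )
        or die "can't open $counter_file: $!";
    flock( $fh, LOCK_EX ) or die "can't lock $counter_file: $!";
    my $n = <$fh> || 0;
    chomp $n;
    $n++;
    seek( $fh, 0, 0 );
    truncate( $fh, 0 );
    print $fh "$n\n";
    close $fh;    # releases the lock
    return $n;
}

my $name = sprintf "file-%06d", next_sequence('/tmp/seq.count');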
stor[e] a sequence number somewhere, permanently.
That's exactly what I'm doing. My database is the filesystem. My "sequence file" is a directory full of plain files. Building all sorts of infrastructure to support what could be accomplished in 5 or 10 lines of code that just looks at the filesystem... seems kind of, well, overkill.
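For example, the sort of 5-or-10-line version I have in mind (the directory and file-name pattern are just for illustration):

opendir( my $dh, '/some/dir' ) or die "can't opendir: $!";
# find the highest existing number among files named like file-000123
my ($last) = sort { $b <=> $a }
             map  { /^file-(\d+)$/ ? $1 : () }
             readdir($dh);
closedir $dh;
my $next_name = sprintf "file-%06d", ( $last || 0 ) + 1;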
Think of an autoincrement field in a database (why not use that? ...), or a setup with a little file holding the number, pretty much like the "web access counters" of a decade ago.
To augment revdiablo's reply, if someone is already using a database for things related to the task at hand, then for sure, it makes better sense to use a database to create this sort of "guaranteed unique" identifier. But if there is no database facility on hand -- and it's just a matter of using the file system -- then installing a database just for this purpose is more trouble than it's worth.
On the other hand, using a "little file" somewhere doesn't help much either, really -- it just adds another few ways that the "guaranteed unique" naming could fail (I've seen it happen).
For instance, if there is any sort of error when writing the latest (or next) increment to that little file, it might end up unchanged, or worse yet, empty -- and unless you're really careful about that, you could start your sequence over at 0 -- perhaps multiple times -- without noticing until it's too late.
When you are careful about it, you'll find yourself looking at the directory contents anyway, to determine the latest-used file name (as in the OP) or at least to confirm that a file name you make up on the fly does not already exist (as in the solution proposed by tachyon), so you might as well just stick with that -- it's both necessary and sufficient.
If I were looking for sequential file names, as suggested by the OP, I would start with
$last = ( sort { $b cmp $a } grep { !/^\./ } readdir( DIR ) )[0];
so that I easily get the latest item in the directory and increment from there.
But in any case, if there were a chance that the script would be run in concurrent processes on the same directory, I'd add a semaphore file, so that the steps of checking directory contents and creating the new file would be an atomic unit, doable by only one process at a time.
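A rough sketch of that, assuming flock is available on the platform and using a lock file named .lock in the target directory (the names and the zero-padded numeric scheme are illustrative):

use Fcntl qw(:flock);

my $dir = '/some/dir';

# hold an exclusive lock across the check-and-create steps, so only
# one process at a time can pick the next name
open( my $lock, '>', "$dir/.lock" ) or die "can't open lock file: $!";
flock( $lock, LOCK_EX ) or die "can't lock: $!";

opendir( my $dh, $dir ) or die "can't opendir $dir: $!";
# zero-padded names keep string sort (cmp) consistent with numeric order
my ($last) = sort { $b cmp $a } grep { /^\d+$/ } readdir($dh);
closedir $dh;

my $next = sprintf "%06d", ( $last || 0 ) + 1;
open( my $fh, '>', "$dir/$next" ) or die "can't create $dir/$next: $!";
# ... write the new file ...
close $fh;

close $lock;    # releases the lock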