bart has asked for the wisdom of the Perl Monks concerning the following question:

I'm busy writing a module based upon File::Copy, to copy/move a file to a different location, but which avoids overwriting an existing file: if a file of the desired name exists, it'll look for a variation of the file name, involving a count number, that isn't taken yet. Think of how Windows Explorer, if you copy/paste a file, will not replace an existing file, but instead uses a derived name.

I'm not sure what File::Copy will do in case a destination filename exists: will it replace the old file, or fail to copy? I've had a good look at the source already, and it looks to me like copy might fail if the destination file exists, at least on Windows, but move will overwrite it, if only because move will try to rename, and AFAIK that always replaces an existing file. But I'm not really willing to trust that to be the case for all platforms, and I'm not even planning on trying them all out and writing platform-dependent code. I've already had enough trouble with API calls that behave differently on one and the same platform, but on different types of disks (NTFS vs. FAT). So that's a dead end.

I've already developed two strategies (to avoid race conditions), for what to do if you're sure a copy will fail, and what if a copy will overwrite an existing file. I'm using a sub (details not important now) that composes a filename out of the base name, the file extension and a counter number. For example, a file "landscape.jpg" could be copied to a destination "landscape(1).jpg". And "copy" is used as an example here, it could be any of File::Copy's copy/move.
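For readers following along, here is a minimal sketch of what such a name-composing sub might look like. The real compose_name is not shown in this post, so the behaviour below (a counter of 0 or undef gives the plain name, anything else gives "base(N).ext") is an assumption made purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the compose_name sub mentioned above.
# Assumption: counter 0 (or undef) means "no suffix", N > 0 means "(N)".
sub compose_name {
    my ($base, $ext, $i) = @_;
    my $suffix = $i ? "($i)" : '';
    my $dotext = (defined $ext && length $ext) ? ".$ext" : '';
    return $base . $suffix . $dotext;
}

# Prints landscape.jpg, landscape(1).jpg, landscape(2).jpg
print compose_name('landscape', 'jpg', $_), "\n" for 0 .. 2;
```

With this convention the loops below try the unmodified name first, because a post-incremented undef counter yields 0 on the first pass.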

Case 1: an existing file is replaced

I simply try to create the destination file in a mode that fails if the file already exists, and then simply overwrite the newly created file. The code looks like this:
    use Fcntl qw(O_CREAT O_EXCL O_WRONLY);
    use File::Copy qw(copy);

    my $i;
    while (1) {
        my $dest = compose_name($base, $ext, $i++);
        # O_EXCL makes sysopen fail if $dest already exists
        if (sysopen my $fh, $dest, O_CREAT | O_EXCL | O_WRONLY) {
            close $fh;
            copy $source, $dest;
            last;
        }
    }

An extra caveat is that I've read that O_EXCL isn't reliable on NFS.
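One gap in the Case 1 loop is that it retries forever if sysopen fails for a reason other than "file exists" (a missing directory, no permission). A possible tightening is sketched below. The sub name copy_no_clobber, the $name_for callback (standing in for compose_name) and the retry cap are all made up for this example, and it still assumes that copy replaces the empty placeholder.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);
use File::Copy qw(copy);

# Reserve the first free name with an exclusive create, then copy over the
# empty placeholder. $name_for is a caller-supplied code ref that maps a
# counter to a candidate path (hypothetical stand-in for compose_name).
sub copy_no_clobber {
    my ($source, $name_for, $max_tries) = @_;
    $max_tries ||= 1000;    # arbitrary cap so the loop always terminates
    for my $i (0 .. $max_tries - 1) {
        my $dest = $name_for->($i);
        if (sysopen my $fh, $dest, O_CREAT | O_EXCL | O_WRONLY) {
            close $fh;
            unless (copy($source, $dest)) {
                my $err = $!;
                unlink $dest;    # remove the placeholder we created
                die "copy to $dest failed: $err\n";
            }
            return $dest;
        }
        # Only "already exists" means "try the next name".
        die "cannot create $dest: $!\n" unless $!{EEXIST};
    }
    die "no free name after $max_tries attempts\n";
}
```

The sub returns the name it ended up using, so the caller can report it, and dies on any error that retrying with another name would not fix.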

Case 2: copy fails if the destination exists

The idea is to just try to copy the file to a new name, until it succeeds. What is lacking is protection against different reasons for failure. The simplified code looks like:
    my $i;
    while (1) {
        my $dest = compose_name($base, $ext, $i++);
        if (copy $source, $dest) {
            last;
        }
    }

I don't know what case I'm in, and I'd like to use common code for copy and for move. What I need is an algorithm that works reliably for either case. I've thought of the following:

    use Fcntl qw(O_CREAT O_EXCL O_WRONLY);

    my $i;
    while (1) {
        my $dest = compose_name($base, $ext, $i++);
        if (sysopen my $fh, $dest, O_CREAT | O_EXCL | O_WRONLY) {
            close $fh;
            my $tempfile = generate_tempfilename($destdir);
            copy $source, $tempfile;
            rename $tempfile, $dest;    # should replace file
            last;
        }
    }
but the problem remains: how to generate a unique temporary filename without clobbering other existing files? It could be that several scripts using the same module are trying to copy similarly named files to the same disk, and thus, we have a race condition. And there's still the problem of NFS.

So, I'm polling for ideas... how would you tackle this?

Replies are listed 'Best First'.
Re: How to move/copy a file without overwriting an existing file
by BrowserUk (Patriarch) on Mar 30, 2006 at 22:31 UTC

    If it were just for Win32, I'd use Tye's Win32API::File::MoveFile() to bypass Perl's POSIXisation of rename and have the attempts fail rather than silently overwrite an existing file:

        use Win32API::File qw[ MoveFile ];

        my $i = 0;
        $i++ until MoveFile( $source, compose_name( $base, $ext, $i ) );

    Or better yet, use Jenda's Win32::FileOp::MoveFileEx() with the FOF_RENAMEONCOLLISION flag, and let the OS take care of finding an available name.

    Either of those should work on any Win32 filesystem, though I'm not sure about things like Samba. Combined with ambrus' suggestion for true POSIX systems, it might get you 90% of the way there for portability, though it would require platform-specific code.

    Any solution that polls for existence before creation, or relies on cooperative locking, leaves a gap. If you are forced down that route, using random numbers rather than sequential ones would reduce the occurrence of collisions, and perhaps reduce the possibility of race condition errors.
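As a rough illustration of the random-number idea, the sketch below picks random suffixes instead of counting up. It is combined here with the exclusive create from the original question, which is a choice made for this example and not something BrowserUk spelled out. The sub name, the "base(N).ext" pattern and the range of the random number are all assumptions.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);

# Try random suffixes rather than 1, 2, 3, ... so that concurrent writers
# rarely compete for the same candidate. The exclusive create still decides
# who owns a name; the randomness only lowers contention.
sub reserve_random_name {
    my ($dir, $base, $ext, $max_tries) = @_;
    $max_tries ||= 100;    # arbitrary cap
    for (1 .. $max_tries) {
        my $n    = 1 + int rand 1_000_000;
        my $dest = "$dir/$base($n).$ext";
        if (sysopen my $fh, $dest, O_CREAT | O_EXCL | O_WRONLY) {
            close $fh;
            return $dest;    # an empty placeholder now exists under this name
        }
        die "cannot create $dest: $!\n" unless $!{EEXIST};
    }
    die "no free name after $max_tries attempts\n";
}
```

The trade-off is that the resulting names are no longer a tidy 1, 2, 3 sequence, which may or may not matter for an Explorer-style "copy of" feature.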


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: How to move/copy a file without overwriting an existing file
by ambrus (Abbot) on Mar 30, 2006 at 20:39 UTC

    It's a bit easier if you can co-operate with the other programs that might create files there.

    Anyway, here's one solution that can work if the other programs don't co-operate (of course, the other programs can be very badly behaved, and overwrite your files by force :), even on NFS.

    First, create a new empty directory in the destination directory. We assume that only you will use this directory, and other programs won't create files in it. Then copy the file into that directory under an arbitrary name. Then create a symlink in the destination directory itself, with the name you eventually want the copied file to have. Creating a symlink doesn't overwrite an existing file (not even a symlink), but rather fails in that case. If that happens, choose a new name. Then rename the copied file from inside the directory to the same name as this symlink. This will overwrite the symlink, but do it atomically: the name will never cease to exist. Finally, remove the directory.

    (If you want to copy a directory, this won't work, but that case is simpler anyway, as directory creation won't overwrite anything anyway.)
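The symlink-based steps above could be sketched roughly as follows. This is only one reading of the description: it assumes a POSIX-like system where symlink is available, and the sub name, the "stage-" directory template, the "payload" file name and the $name_for callback (which returns a bare file name for a given counter) are invented for the example.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);

sub copy_via_symlink {
    my ($source, $destdir, $name_for, $max_tries) = @_;
    $max_tries ||= 1000;    # arbitrary cap

    # Private staging directory inside the destination directory, so the
    # final rename stays on the same filesystem.
    my $stage  = tempdir('stage-XXXXXXXX', DIR => $destdir);
    my $staged = "$stage/payload";

    my $result = eval {
        copy($source, $staged) or die "copy failed: $!\n";
        for my $i (0 .. $max_tries - 1) {
            my $dest = "$destdir/" . $name_for->($i);
            # symlink() fails with EEXIST instead of replacing an entry.
            if (symlink $staged, $dest) {
                # rename() atomically replaces our own symlink.
                unless (rename $staged, $dest) {
                    my $err = $!;
                    unlink $dest;    # remove the symlink we made
                    die "rename to $dest failed: $err\n";
                }
                return $dest;        # leaves the eval block
            }
            die "cannot claim $dest: $!\n" unless $!{EEXIST};
        }
        die "no free name after $max_tries attempts\n";
    };
    my $err = $@;

    unlink $staged;    # fails harmlessly if the file was already renamed away
    rmdir $stage;
    die $err unless defined $result;
    return $result;
}
```

Between the symlink and the rename, other programs see a symlink that points into the staging directory, so a reader that opens the name in that window already gets the complete copy.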

    Update: there are existing functions (not necessarily Perl) that create unique temporary files. Try looking at their sources to see what they do.
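In Perl, one such function is tempfile from the core File::Temp module, which opens its file with an exclusive create. Below is a sketch of how it might fill the generate_tempfilename gap in the question's last algorithm. The sub name, the ".copy-" template and the $name_for callback (returning a bare file name) are assumptions for illustration, and it relies on rename replacing the placeholder, which the question itself was unsure about across platforms.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);
use File::Copy qw(copy);
use File::Temp qw(tempfile);

sub copy_then_rename {
    my ($source, $destdir, $name_for, $max_tries) = @_;
    $max_tries ||= 1000;    # arbitrary cap
    for my $i (0 .. $max_tries - 1) {
        my $dest = "$destdir/" . $name_for->($i);

        # Reserve the final name with an empty placeholder.
        my $fh;
        unless (sysopen $fh, $dest, O_CREAT | O_EXCL | O_WRONLY) {
            next if $!{EEXIST};
            die "cannot create $dest: $!\n";
        }
        close $fh;

        # tempfile() picks a unique name in the same directory.
        my ($tfh, $tmp) = tempfile('.copy-XXXXXXXX', DIR => $destdir);
        close $tfh;

        # Copy into the temp file, then rename it over the placeholder.
        return $dest if copy($source, $tmp) && rename($tmp, $dest);

        my $err = $!;
        unlink $tmp, $dest;    # clean up both files we created
        die "copy to $dest failed: $err\n";
    }
    die "no free name after $max_tries attempts\n";
}
```

Compared with copying straight onto the placeholder, the destination name never refers to a half-written file: it is either the empty placeholder or the complete copy.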

    Update: it might be simpler to just create a file with the mknod syscall, because that doesn't overwrite an existing file. Then, you don't have to move anything. That works through NFS too (although I didn't test whether it was really atomic), but it might not work on all systems.

Re: How to move/copy a file without overwriting an existing file
by GrandFather (Saint) on Mar 30, 2006 at 21:02 UTC

    Part of the solution, if there may be multiple instances of the same script running, may be to maintain a lock file that mediates allocation of file names.

    Possibly store the name and a next number for each used file. The atomic access to the lock file then adds a new name or increments the count on an existing name and in the process generates the name to be used.
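A minimal sketch of such a counter file, guarded by flock, might look like this. The file format (one "name, tab, next number" line per name) and the sub name are invented for the example, and the scheme only protects against other processes that use the same state file and honour the lock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);

# Hand out the next counter for $name from a shared state file.
# Assumed format: one "name<TAB>next_number" line per name.
sub next_number_for {
    my ($statefile, $name) = @_;
    sysopen my $fh, $statefile, O_RDWR | O_CREAT
        or die "cannot open $statefile: $!\n";
    flock $fh, LOCK_EX or die "cannot lock $statefile: $!\n";

    # Read the current table while holding the lock.
    my %next;
    while (my $line = <$fh>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;
        $next{$key} = $value if defined $value;
    }

    my $n = $next{$name} || 0;
    $next{$name} = $n + 1;

    # Rewrite the table in place, still under the lock.
    seek $fh, 0, 0     or die "seek failed: $!\n";
    truncate $fh, 0    or die "truncate failed: $!\n";
    print {$fh} "$_\t$next{$_}\n" for sort keys %next;
    close $fh          or die "close failed: $!\n";   # also releases the lock
    return $n;
}
```

A caller would feed the returned number into its name-composing sub, ideally still creating the target with O_EXCL in case a non-cooperating program has taken the name in the meantime.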


    DWIM is Perl's answer to Gödel
      But that only improves on it if the other one is indeed using the same module. I won't even exclude the possibility that the other program is written in another language, like Java.

      I don't want my script to overwrite another file, ever, even if the other script occasionally might overwrite one of mine. I do not want my module to get the blame with good reason. Race conditions should be excluded from my side, always, all of the time.

      That limits the options a little, doesn't it?

Re: How to move/copy a file without overwriting an existing file
by Anonymous Monk on Jan 30, 2017 at 01:21 UTC
    "O_EXCL isn't reliable on NFS" is an overstatement so severe it seems to say something different than the actual behavior. It should be "O_EXCL isn't 100% reliable on NFS if two different systems are manipulating the same file at virtually the same time". If your program is only running on one system, or if the file has already existed for a couple minutes, then O_EXCL _is_ 100% reliable.
      It should be "O_EXCL isn't 100% reliable on NFS if two different systems are manipulating the same file at virtually the same time". If your program is only running on one system, or if the file has already existed for a couple minutes, then O_EXCL _is_ 100% reliable.

      So, according to your definition, coitus interruptus is a 100% reliable method of birth control as long as you avoid some edge cases, like a man having sex with a woman.

      Regarding pregnancy, there are exactly two states: Pregnant or not. You can't be 42% pregnant. And like with pregnancy, O_EXCL on NFS (or any other networked filesystem) is either reliable and prevents access to a file by more than one process on one machine every time and under all conditions, or it is unreliable.

      For Linux, the man page for open explains the situation quite well:

      • In general, the behavior of O_EXCL is undefined if it is used without O_CREAT.
      • On NFS, O_EXCL is only supported when using NFSv3 or later on kernel 2.6 or later.
      • In NFS environments where O_EXCL support is not provided, programs that rely on it for performing locking tasks will contain a race condition.

      So, for Linux, O_EXCL is unreliable if you are running a kernel before 2.6 or use it on a non-v3 NFS. That does not mean that it is reliable with 2.6+ and NFSv3. The FreeBSD people seem to have had similar problems, especially if you mix in Solaris.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)