punch_card_don has asked for the wisdom of the Perl Monks concerning the following question:

Melodious Monks,

Did I just mess up a file processing run?

I have a 1,000,000 line text file to process. The important part of the code is:

    open(FILE, $data_file) or die("cannot open file : $data_file $!");
    print "<p>opened file $data_file ok\n";
    while ($line = <FILE>) {
        # DO A BUNCH OF STUFF WITH THIS LINE
    }
    close(FILE);

I start the program via telnet, and use nohup so it runs to the end. (nohup perl myscript.pl) The "print" statement gets printed out to a nohup.out file, and in my WHILE loop there's a print statement that outputs the line number it's working on every 1,000 lines so I know where it is.

Because there are about 1,000 operations to do on each line, it takes a looooooong time to run. It's running on a UNIX server.

In the middle of a run, someone deleted the data_file from the server by ftp.

I thought that using this form of while ($line = <FILE>) read in one line at a time from disk, and that deleting the file during a run would kill the running process, since it could no longer read from it. BUT - I see that my nohup.out file continues to be updated with lines being processed. AND "ps U myusername" via telnet says myscript.pl is still running.

Is it possible that it really is still running? Have I misunderstood how this works? Did it really read a copy of the file into cache or something that allows it to continue with the file deleted?

Is there hope for me (or am I hopeless?)

Thanks.




Forget that fear of gravity,
Get a little savagery in your life.

Replies are listed 'Best First'.
Re: What if FILE deleted during WHILE read-in loop?
by ikegami (Patriarch) on May 12, 2006 at 22:34 UTC

    On unix, nothing. The file isn't deleted until all file handles to it are closed. Until then, removing the file only removes the entry in the directory, not the file itself.

    Of course, it's a different story if another program alters the contents of the file or changes its size. That would affect your script, because it is indeed read into memory a bit at a time. (It's buffered, so it's actually more than one line at a time.)
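
A minimal sketch of the Unix behaviour described above, assuming a Unix-like filesystem. The sub name read_after_unlink and the 50,000-line scratch file are made up for illustration. It reads one line, unlinks the file, then keeps reading from the same handle. The file is deliberately larger than Perl's read buffer, so the later lines have to come from disk rather than from data that was already buffered.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $line_count lines to a scratch file, start
# reading it, unlink it mid-read, and finish reading from the open handle.
# Returns (still_visible_in_directory, number_of_lines_read).
sub read_after_unlink {
    my ($line_count) = @_;

    my ($out, $path) = tempfile();
    print {$out} "line $_\n" for 1 .. $line_count;
    close($out) or die "cannot close $path: $!";

    open(my $in, '<', $path) or die "cannot open $path: $!";
    my @lines;
    my $first = <$in>;
    push @lines, $first;

    # Remove the directory entry while the handle is still open.
    unlink($path) or die "cannot unlink $path: $!";
    my $still_listed = -e $path ? 1 : 0;

    # Keep reading; on Unix the data stays available until the handle closes.
    while (my $line = <$in>) {
        push @lines, $line;
    }
    close($in);

    return ($still_listed, scalar @lines);
}

my ($still_listed, $count) = read_after_unlink(50_000);
print "visible in directory after unlink: ", ($still_listed ? "yes" : "no"), "\n";
print "lines read in total: $count\n";
```

On a Unix-like system this should report that the name is gone from the directory while every line is still read. On Win32 the unlink is likely to fail instead, as discussed further down the thread.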

Re: What if FILE deleted during WHILE read-in loop?
by johngg (Canon) on May 12, 2006 at 22:45 UTC
    If you have a file that has been opened by a process, you can unlink the file, either from within the process or, as in your case, by some external agency, but the data will still exist on disk and be available to read until the filehandle is closed or the process exits. However, if you do an ls while the process is still running you will not see the file.

    Some "clever" programmers do this to create a working file for their application that nobody else can see; that's all very well until their application throws a wobbly and your disk fills up. As a sys. admin. I've been bitten by this a few times over the years. Your disk is full but you can do du commands until you are blue in the face but you can't see what file is causing the problem. If you can work out which application is causing the problem, killing it will close the filehandle and thereby free the disk space.
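
A rough sketch of the hidden working file trick mentioned above, assuming a Unix-like system. The sub name hidden_scratch_roundtrip is hypothetical. The file is created, unlinked straight away, and then used through its still-open handle; its disk space is only released when the handle is closed or the process exits, which is exactly why a runaway program can fill a disk with nothing visible to ls or du.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text to a scratch file that has already been
# unlinked, read it back through the same handle, and report whether the
# path exists afterwards.
sub hidden_scratch_roundtrip {
    my ($text) = @_;

    # tempfile() returns a handle opened for reading and writing.
    my ($fh, $path) = tempfile();

    # From here on no other process can find the file by name.
    unlink($path) or die "cannot unlink $path: $!";

    print {$fh} $text;
    seek($fh, 0, 0) or die "cannot seek: $!";   # rewind (flushes the write)

    local $/;               # slurp mode: read the whole file at once
    my $back = <$fh>;

    close($fh);             # the disk space is released here
    return ($back, -e $path ? 1 : 0);
}

my ($back, $exists) = hidden_scratch_roundtrip("scratch data\n");
print "read back: $back";
print "path exists: ", ($exists ? "yes" : "no"), "\n";
```

Perl also documents a shortcut for this: open(my $tmp, '+>', undef) opens an anonymous temporary file, so there is no name to unlink in the first place.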

    If your program is still running with the open filehandle, let it run. The data is still there on disk to be read.

    Best of luck,

    JohnGG

Re: What if FILE deleted during WHILE read-in loop?
by revdiablo (Prior) on May 12, 2006 at 22:34 UTC

    Most unix filesystems will keep the actual data intact as long as there is an open file descriptor. The file descriptor should stay open until you close the file. In other words, it appears you may have dodged a bullet. Just hope the program doesn't crash for any reason! :-)

Re: What if FILE deleted during WHILE read-in loop?
by Errto (Vicar) on May 13, 2006 at 05:19 UTC
    It does vary by operating system. As others have mentioned, on Unix-type systems it works the way you've observed. On Win32, at least in my experience (unless there's some secret API I don't know about), it's simply not possible to delete a file that is opened by another (or the same, for that matter) process. This is the sort of semantic difference between OSes that no programming language can make up for. Of course, the authors of Perl, being the nice people they are, have documented this in perlport:
    Some platforms can’t delete or rename files held open by the system, this limitation may also apply to changing filesystem metainformation like file permissions or owners. Remember to "close" files when you are done with them. Don’t "unlink" or "rename" an open file. Don’t "tie" or "open" a file already tied or opened; "untie" or "close" it first.

      Actually, it is possible. Kinda. If you set the right share permission, the file can be deleted. Actually, it only gets flagged as being deleted. The file still appears in the directory until all file handles to it are closed. Open file handles still function. New handles can't be created. ("Access denied".)

      For example,

      use Win32API::File qw( createFile ReadFile CloseHandle );

      $|=1;

      my $share = $ARGV[0];
      print("share: $share\n");

      {
         open(my $fh, '>', 'deleteme')
            or die("Unable to create file: $!\n");
         print $fh ("foo\n");
      }

      {
         my $h = createFile('deleteme', 'r', $share)
            or die("Unable to open file: $^E\n");

         {
            ReadFile($h, my $buf, 1, [], [])
               or die("Unable to read from file: $^E\n");
            print("Read $buf\n");
         }

         print("pre unlink dir: ");
         system('dir /b deleteme');
         print("pre unlink type: ");
         system('type deleteme');

         if (unlink('deleteme')) {
            print("File deleted\n");
         } else {
            print("Unable to delete file\n");
         }

         print("post unlink dir: ");
         system('dir /b deleteme');
         print("post unlink type: ");
         system('type deleteme');

         {
            ReadFile($h, my $buf, 1, [], [])
               or die("Unable to read from file: $^E\n");
            print("Read $buf\n");
         }

         print("pre CloseHandle dir: ");
         system('dir /b deleteme');
         print("pre CloseHandle type: ");
         system('type deleteme');

         CloseHandle($h);

         print("post CloseHandle dir: ");
         system('dir /b deleteme');
         print("post CloseHandle type: ");
         system('type deleteme');
      }

      outputs

      >perl 549202.pl rw
      share: rw
      Read f
      pre unlink dir: deleteme
      pre unlink type: foo
      Unable to delete file   <---
      post unlink dir: deleteme
      post unlink type: foo
      Read o
      pre CloseHandle dir: deleteme
      pre CloseHandle type: foo
      post CloseHandle dir: deleteme
      post CloseHandle type: foo

      >perl 549202.pl rwd
      share: rwd
      Read f
      pre unlink dir: deleteme
      pre unlink type: foo
      File deleted   <---
      post unlink dir: deleteme
      post unlink type: Access is denied.
      Read o
      pre CloseHandle dir: deleteme
      pre CloseHandle type: Access is denied.
      post CloseHandle dir: File Not Found
      post CloseHandle type: The system cannot find the file specified.

Re: What if FILE deleted during WHILE read-in loop?
by punch_card_don (Curate) on May 12, 2006 at 22:44 UTC
    WHEW!!

    That's why we love UNIX so much.

    Thanks. I can go home now.




    Forget that fear of gravity,
    Get a little savagery in your life.