PerlMonks  

Check if file is written by other process

by techman2006 (Beadle)
on Jul 06, 2014 at 08:48 UTC ( [id://1092451]=perlquestion )

techman2006 has asked for the wisdom of the Perl Monks concerning the following question:

I am developing an application that monitors a directory and, once the files in it have been completely written, performs some operations on them.

The use case is as below

  1. If a file is created while the application is running, I get a notification through INotify2, and another one when the file is closed after the write is complete.
  2. If a file already exists when the application starts, how do I tell that all writing to it is complete?
  3. Say I received the creation notification for a file, but the application stopped before the file was closed; by the time it is launched again, the file has been completed and closed. The close notification is lost because the application was not running.

Now, for use cases 2 and 3, how can I tell from Perl that a file is no longer being written by another process and is complete? I also don't know the final size of the files, so I can't rely on stat.

Re: Check if file is written by other process
by Laurent_R (Canon) on Jul 06, 2014 at 09:39 UTC
    Depending on the details of your process, you might scan your directory and store the file sizes, sleep for 10 seconds, and then check whether each file's size has changed. Looking at the file modification times might also help. I admit that this does not look very robust, but there are cases where it is sufficient. On some operating systems (VMS, for example), you can't open a file that is being written (it is locked), so you can simply try to read the file: if you get an exception, you know it is still being written.
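    Here is a minimal Perl sketch of the size-polling heuristic. It assumes a pause of a few seconds is acceptable, and `stable_files` is a hypothetical helper name. The helper takes two `stat` snapshots, compares size and mtime between them, and keeps only the files that did not change. A writer that stalls for longer than the pause will still be reported as finished, which is the robustness caveat mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns those @files whose size and mtime were the
# same before and after sleeping $pause seconds. Files that do not exist
# (or disappear in the meantime) are skipped.
sub stable_files {
    my ($pause, @files) = @_;

    # First snapshot: "size:mtime" for every file that exists.
    my %before;
    for my $f (@files) {
        my @st = stat $f or next;
        $before{$f} = "$st[7]:$st[9]";
    }

    sleep $pause;

    # Second snapshot: keep only files whose size and mtime are unchanged.
    my @stable;
    for my $f (sort keys %before) {
        my @st = stat $f or next;
        push @stable, $f if "$st[7]:$st[9]" eq $before{$f};
    }
    return @stable;
}
```

    A call would look something like `my @ready = stable_files(10, glob '/path/to/incoming/*');`, where the path is a placeholder.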

    Another possibility, if you control the file writing process, would be to create a second (empty) file having the role of a flag. Create the flag before opening the file for writing, and remove the flag once writing the file is completed. Then you only need to check for the existence of the flag.
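    A sketch of that flag convention in Perl. It assumes the flag is named after the data file with a `.flag` suffix; the suffix and the helper names `write_with_flag` and `is_complete` are illustrative and do not come from any library. The flag is created before the data file, so while writing is in progress a reader never sees the data file without its flag.

```perl
use strict;
use warnings;

# Writer side (hypothetical helper): create "<file>.flag", write <file>,
# then delete the flag once the data file is closed.
sub write_with_flag {
    my ($file, $content) = @_;
    my $flag = "$file.flag";

    # 1. Create the (empty) flag before touching the data file.
    open my $fh_flag, '>', $flag or die "Cannot create $flag: $!";
    close $fh_flag;

    # 2. Write the data file and make sure it is flushed and closed.
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} $content;
    close $out or die "Cannot close $file: $!";

    # 3. Remove the flag: the file is now complete.
    unlink $flag or die "Cannot remove $flag: $!";
}

# Reader side: a file is ready if it exists and its flag does not.
sub is_complete {
    my ($file) = @_;
    return ( -e $file ) && !( -e "$file.flag" );
}
```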

    Update: When I wrote the above 15 minutes ago, I forgot another possibility, even though I use it regularly at work. We use the flag system described above when we want to know that the writing process has finished all the files it is supposed to write.

    When a file-by-file check is needed, the solution is even easier: the writing process writes the file under a temporary name (e.g. with a .tmp extension) and renames it to its final name once it has finished. For example, the writing process writes output.tmp; once the file is complete, it renames it to output.txt. Your process then only needs to poll for *.txt files.
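    A sketch of the write-then-rename pattern. It assumes the `.tmp` and `.txt` files live in the same directory, and therefore on the same filesystem. In that case `rename` is atomic on POSIX systems, so the `.txt` name only ever points to a complete file. `write_atomically` and `ready_files` are hypothetical names.

```perl
use strict;
use warnings;

# Writer (hypothetical helper): write "<name>.tmp", close it, then rename
# it to "<name>.txt". Within one filesystem the rename is atomic.
sub write_atomically {
    my ($dir, $name, $content) = @_;
    my $tmp   = "$dir/$name.tmp";
    my $final = "$dir/$name.txt";

    open my $out, '>', $tmp or die "Cannot write $tmp: $!";
    print {$out} $content;
    close $out or die "Cannot close $tmp: $!";    # flush before renaming

    rename $tmp, $final or die "Cannot rename $tmp: $!";
}

# Reader: only *.txt files are considered complete; *.tmp files are ignored.
sub ready_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @names = grep { /\.txt\z/ && -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return sort @names;
}
```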

Re: Check if file is written by other process
by roboticus (Chancellor) on Jul 06, 2014 at 13:07 UTC

    techman2006:

    The technique I find best in that situation is to generate your file using an intermediate filename, and rename it to the final name only after you've finished writing it. That way, the program that processes the files can simply look for the final file name(s). You can use a temporary directory and rename the file into the production directory, a different base name or a different extension.

    If you can't modify the program that generates the file, then write a wrapper script that will execute the program and then rename the file after the program completes.
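    One possible shape for such a wrapper. It assumes the program can be told to write into a staging directory, and that the staging and production directories are on the same filesystem, so `File::Copy::move` amounts to a plain rename. The command, the directories and the name `run_and_publish` are placeholders.

```perl
use strict;
use warnings;
use File::Copy qw(move);

# Hypothetical wrapper: run the producing program (which writes into
# $staging), and only after it exits move its files into $publish,
# the directory the consumer watches.
sub run_and_publish {
    my ($staging, $publish, @command) = @_;

    # List form of system() bypasses the shell; 0 means success.
    system(@command) == 0
        or die "Command failed (status $?): @command\n";

    opendir my $dh, $staging or die "Cannot open $staging: $!";
    my @moved;
    for my $name (grep { -f "$staging/$_" } readdir $dh) {
        move("$staging/$name", "$publish/$name")
            or die "Cannot move $name: $!";
        push @moved, $name;
    }
    closedir $dh;
    return sort @moved;
}
```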

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      I need to check whether that is possible; I'm not sure it is. Is there a programmatic way to check this instead? The machine on which this script will run is Linux.

Re: Check if file is written by other process
by Anonymous Monk on Jul 06, 2014 at 10:15 UTC

    What is the process that is writing the files, and do you have any control over it? A few common approaches for writing the files are to use file locking (even advisory locking would help you), to use some kind of flag file, to rename/move the file once the write is done ... is anything like that possible? Also, exactly what filesystem is this on?

    Perhaps lsof (Unix::Lsof) and its +D option could help you?
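    If lsof (or Unix::Lsof) is not available, you can read the same information on Linux through /proc, which is where lsof gets it. This is a swapped-in technique, not the Unix::Lsof API. The hypothetical `writers_of` resolves every `/proc/<pid>/fd/*` symlink and reads the access mode from `/proc/<pid>/fdinfo/<fd>`. Unless you run as root, processes owned by other users are invisible, so an empty result only means "no writer that I can see".

```perl
use strict;
use warnings;
use Cwd qw(abs_path);

# Linux-only sketch: return the PIDs of visible processes that have $file
# open for writing (O_WRONLY or O_RDWR), by scanning /proc like lsof does.
sub writers_of {
    my ($file) = @_;
    my $target = abs_path($file) // return;
    my @pids;

    opendir my $proc, '/proc' or die "Cannot open /proc: $!";
    for my $pid (grep { /\A\d+\z/ } readdir $proc) {
        opendir my $fds, "/proc/$pid/fd" or next;    # no permission, or exited
        for my $fd (grep { /\A\d+\z/ } readdir $fds) {
            my $link = readlink "/proc/$pid/fd/$fd";
            next unless defined $link && $link eq $target;

            # fdinfo has a line like "flags:\t0100001" (octal open flags);
            # the low two bits are the access mode: 0 read, 1 write, 2 read/write.
            open my $info, '<', "/proc/$pid/fdinfo/$fd" or next;
            while (my $line = <$info>) {
                if ($line =~ /^flags:\s*([0-7]+)/) {
                    push @pids, $pid if (oct($1) & 3) != 0;
                    last;
                }
            }
            close $info;
        }
        closedir $fds;
    }
    closedir $proc;

    my %seen;
    return grep { !$seen{$_}++ } @pids;    # one entry per process
}
```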

      It's a program that downloads files from a remote host. I also can't control the way it writes the data, e.g. by wrapping it in a script.

      The machine is Linux-based, so I can use the inotify functionality without any issues.

        Is the program open or closed source, and can you tell us what it is called? Does the program's documentation say anything about using, for example, flock(2), which Perl supports? Can you contact the author(s) of the program to ask whether it supports any locking or other control mechanisms? Lastly, if I were you, I'd test whether it uses some kind of locking without it being documented; again, lsof can help you find that out.
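        For reference, this is how a reader could probe for a writer's lock from Perl. It assumes the downloader takes an exclusive `flock` while writing, which would need to be confirmed first. The hypothetical `is_locked_by_writer` asks for a shared lock without blocking; if the request fails, someone holds an exclusive lock. Because these locks are advisory, a program that never calls flock will always look unlocked.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Returns true if another open file description currently holds an
# exclusive flock() on $file. Only meaningful if the writer uses flock too.
sub is_locked_by_writer {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";

    # LOCK_NB: fail immediately instead of waiting for the lock.
    my $got = flock($fh, LOCK_SH | LOCK_NB);

    close $fh;    # closing the handle also releases our shared lock
    return !$got;
}
```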

Re: Check if file is written by other process
by boftx (Deacon) on Jul 06, 2014 at 23:05 UTC

    Others have already mentioned using a wrapper if you cannot alter the script itself. If the script is being run from a cron job (which it sounds like it is), then you can do something like the following for a wrapper:

        # NOTE! this is only a general outline of the concept,
        # and is NOT actual working code.

        # wrapper.pl
        use strict;
        use warnings;

        my $flagfile = '/path/to/flag_file.txt';

        # you might want to test and see if this exists first, indicating a possible
        # failure in code elsewhere.
        #
        # also, you might write the PID of the current process into the file so you
        # can check later if the process is still running if the file is present.
        open my $flag, '>', $flagfile or die "Cannot create $flagfile: $!";
        close $flag;

        # each option and its value are separate list elements
        my @script_params = ( '-opt1', 'yes', '-opt2', 'no', );

        my $script = '/path/to/script.pl';

        # use the multi-arg version of system to avoid lots of gotchas. RTF(ine)M.
        system( $script, @script_params );

        # Perl has no rm; unlink removes the flag file
        unlink $flagfile or warn "Cannot remove $flagfile: $!";
        exit;
    You would then run the wrapper in the cron job instead of the script itself. Now all you have to do is see if the flag file exists, and if it does, you know the main script has not finished processing yet.
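    Following the suggestion in the wrapper's comments, the checking side could read a PID from the flag file. That lets it tell a running job from one that died and left its flag behind. This sketch assumes the wrapper writes `$$` followed by a newline into the flag file, which the outline above does not yet do; `producer_state` is a hypothetical name. `kill 0` sends no signal and only tests whether the process exists. PID reuse can still produce a false "running".

```perl
use strict;
use warnings;
use Errno qw(EPERM);

# Returns 'done' (no flag), 'running' (flag present and its process alive,
# or no usable PID in the flag), or 'stale' (flag left by a dead process).
sub producer_state {
    my ($flagfile) = @_;
    return 'done' unless -e $flagfile;

    open my $fh, '<', $flagfile or return 'done';    # removed meanwhile
    my $line = <$fh>;
    close $fh;

    # An empty flag (as the outline creates it) still means "busy".
    return 'running' unless defined $line && $line =~ /\A\s*(\d+)\s*\z/;
    my $pid = $1;

    # kill 0 fails with EPERM for a live process owned by another user.
    return 'running' if kill(0, $pid) || $! == EPERM;
    return 'stale';
}
```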

    You must always remember that the primary goal is to drain the swamp even when you are hip-deep in alligators.
Re: Check if file is written by other process
by sundialsvc4 (Abbot) on Jul 07, 2014 at 13:54 UTC

    The best way to do it would be to arrange for the various processes to talk to one another.   On some systems, it might be possible to request exclusive write-access to the file as a way of making sure that no one else is writing to it ... but on other systems, where locking is only “advisory,” that won’t work.   There is no “generalized” solution to this sort of objective.   You will need to understand both the readers and the writers very well.   Again, by far the best way to do it is to arrange for the writers to inform you when they have finished writing a particular file:   don’t rely upon notifications, but instead arrange for the writers to write filenames to a pipe or queue that the next-stage process(es) will be reading.   Otherwise, you will always be dealing with race conditions, obliging you to try to minimize their effect.   (They will still race, and fail, and they will inevitably do it at 2:30 AM when you’ve got the pager ...)
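    A small illustration of the idea that writers announce finished files, using a pipe. The writer closes each file and only then prints its name on the pipe, so the reader never has to guess. The fork-based `demo` is there only to make the sketch self-contained. Unrelated programs would more likely share a named pipe (created with `POSIX::mkfifo`) or a real message queue. `produce`, `consume` and `demo` are hypothetical names.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX ();

# Writer side: finish each file, then announce its name (one per line).
sub produce {
    my ($queue, $dir, @names) = @_;
    for my $name (@names) {
        open my $out, '>', "$dir/$name" or die "Cannot write $name: $!";
        print {$out} "payload for $name\n";
        close $out or die "Cannot close $name: $!";
        print {$queue} "$name\n";    # announce only after close()
        $queue->flush;
    }
}

# Reader side: every name received refers to a finished file.
sub consume {
    my ($queue) = @_;
    my @done;
    while (my $name = <$queue>) {
        chomp $name;
        push @done, $name;           # process "$dir/$name" here
    }
    return @done;
}

# Demo: a forked child writes, the parent reads from the pipe until EOF.
sub demo {
    my ($dir) = @_;
    pipe(my $rd, my $wr) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                 # child = writer
        close $rd;
        produce($wr, $dir, 'a.dat', 'b.dat');
        close $wr;
        POSIX::_exit(0);             # skip END blocks/destructors in the child
    }
    close $wr;                       # parent = reader
    my @got = consume($rd);
    close $rd;
    waitpid $pid, 0;
    return @got;
}
```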

Node Type: perlquestion [id://1092451]
Approved by AppleFritter