A couple of things here make me nervous. The first is the ability to tell when a file is available. Perhaps your files will be incredibly small, so that copying them across the network and updating the directory structure is essentially atomic. I doubt it. So a simple "-s" test is not sufficient to tell when a file has finished uploading: 1MB of a 700MB file may have arrived so far, and then things can really go wonky. You'll probably want to adjust your protocol so that the uploader can perform some atomic operation to tell the daemon the file is ready. There are a few simple choices, and one more complicated choice, that I can think of here.

The first is to have the application that puts up the job file create a second file with the same name, but with ".done" added to the end. This file has no contents; because it is only created after the main job file is complete, its existence tells you the main file is done. Something like { open my $fh, '>', "$jobfile.done"; } should do it (create the file, the filehandle goes out of scope, it's closed). This still leaves a small hole, though: what if the server deletes the job file between the open and the close? There's not much time in that window, and I don't know exactly what would happen, but it can probably be handled if you think it through.
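On the daemon side, that could look something like this - ready_jobs and $jobdir are names I'm making up, and the idea is simply to never touch a job file until its marker exists:

    use strict;
    use warnings;

    # Return the job files in $jobdir whose ".done" marker already exists.
    sub ready_jobs {
        my ($jobdir) = @_;
        opendir my $dh, $jobdir or die "can't read $jobdir: $!";
        my @ready;
        for my $marker ( grep { /\.done\z/ } readdir $dh ) {
            ( my $jobfile = $marker ) =~ s/\.done\z//;
            push @ready, "$jobdir/$jobfile" if -s "$jobdir/$jobfile";
        }
        closedir $dh;
        return @ready;
    }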
The second option is to upload the file to a different directory from the one the server picks jobs up from. Once the copy is finished, a simple rename into the correct directory should be atomic. I can't think of a race condition here.
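A rough sketch of that, assuming an "incoming" directory and a "jobs" directory on the same share (both paths and the sub name are made up); rename() is only atomic if both directories live on the same filesystem:

    use strict;
    use warnings;

    my $incoming = '/mnt/share/incoming';   # where the upload lands
    my $jobs     = '/mnt/share/jobs';       # where the daemon looks

    # Uploader side: copy the big file into $incoming first, then
    # publish it with a single atomic rename into $jobs. The daemon
    # never sees a half-written file.
    sub publish_job {
        my ($file) = @_;
        rename "$incoming/$file", "$jobs/$file"
            or die "can't publish $file: $!";
    }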
The third option is to move the job status into a relational database. I'm not sure whether your files themselves can go in there, but if not, the metadata (i.e., "job 123 ready for pickup") can be inserted by the uploader, and the server stops monitoring the filesystem and monitors the database instead (with a less efficient regular poll). The downsides here are many, including setting up a db if you don't already have one. But the upsides include that the db gives you transactions (so the insert is always atomic if done properly) and locking. With the locking, you can lock tables/rows from the server side so that a second server could also poll the db, should you ever need jobs to scale to the point where they run on multiple machines.
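If you went that way, the uploader and server could look roughly like this with DBI. The connect string, the jobs table layout, and the "FOR UPDATE" row locking are all assumptions on my part (I'm picturing something PostgreSQL-ish):

    use strict;
    use warnings;
    use DBI;

    my ($user, $pass) = ('jobuser', 'secret');    # placeholders
    my $dbh = DBI->connect( 'dbi:Pg:dbname=jobqueue', $user, $pass,
        { RaiseError => 1, AutoCommit => 0 } );

    # Uploader: once the file is fully copied, announce it in one
    # transaction - the insert either happens completely or not at all.
    sub announce_job {
        my ($path) = @_;
        $dbh->do( 'INSERT INTO jobs (path, status) VALUES (?, ?)',
            undef, $path, 'ready' );
        $dbh->commit;
    }

    # Server: poll for a ready job and claim it under a row lock, so a
    # second server polling the same table can't grab the same job.
    sub claim_job {
        my $row = $dbh->selectrow_hashref(
            q{SELECT id, path FROM jobs
               WHERE status = 'ready'
               ORDER BY id LIMIT 1 FOR UPDATE} );
        return unless $row;
        $dbh->do( 'UPDATE jobs SET status = ? WHERE id = ?',
            undef, 'running', $row->{id} );
        $dbh->commit;
        return $row->{path};
    }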
The second thing I'm nervous about is simply the ability to lock with samba. This may work. It just makes me nervous. That's a lot of stuff that needs to go right - I wouldn't actually want to lock on NFS, either, so maybe I'm just paranoid. Locking in a db seems safer to me :-) (and option 2 above - renaming the file after it's finished - avoids this as well.)
I'm not sure why you copy the jobstack to a temp, manipulate the temp, and then copy it back. If you can pull the item you want from the original shared @jobstack inside a small lock, that'll be way better:

    my $job = do {
        lock(@jobstack);
        extract_job(\@jobstack);   # this will find the next one to do,
                                   # remove it from the list, and return it
    };

This saves a bunch of copying, and keeps the lock to a minimum. Locks are heavy-handed things. You want to avoid them whenever possible, and where that's not possible, you want to reduce their scope to a bare, bare minimum. Otherwise your other thread will block when file changes come in. Of course, in your sample code you're not doing anything yet, so it's not a big deal, but I assume there is (or will be) more code in your main thread in your real code that does significantly more work, otherwise you wouldn't bother with all this :-)
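For what it's worth, extract_job could be as simple as the following, assuming @jobstack is a threads::shared array of filenames, "next one to do" just means oldest first, and the caller already holds the lock (the name and the FIFO policy are my assumptions):

    # One possible extract_job - the caller is expected to hold the lock
    # on @jobstack, so all we do here is pull the oldest entry off.
    sub extract_job {
        my ($stack) = @_;        # reference to the shared @jobstack
        return unless @$stack;   # nothing queued right now
        return shift @$stack;    # removes the job from the list and returns it
    }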