in reply to strange behaviour, would appreciate any comment / alternative method

A couple of things here make me nervous. The first is the ability to tell when a file is available. Perhaps your files will be incredibly small, so the act of copying them across the network and updating the directory structure will be basically atomic. I doubt it. That means a simple "-s" test is not sufficient to tell when a file has finished uploading: 1MB of a 700MB file may have arrived so far, and then things can really go wonky. You'll probably want to adjust your protocol so that the uploader performs some atomic operation to tell the daemon that the file is ready. There are a few simple choices, and one more complicated choice, that I can think of here.

The first is to have the application that puts up the job file create a second file with the same name, but with ".done" added to the end. This file has no contents; because it is created after the main job file is complete, its existence tells us the main file is done. Something like { open my $fh, '>', "$jobfile.done"; } should do it (create the file, the filehandle goes out of scope, it's closed). However, this still leaves a bit of a hole: what if the server deletes the file between the open and the close? The window is tiny, and I don't know exactly what would happen, but it can probably be handled if you think about it.
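A minimal sketch of the marker-file idea, with both halves as subroutines (the sub names are just illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Uploader side: create an empty ".done" marker once the job file is complete.
sub mark_done {
    my ($jobfile) = @_;
    open my $fh, '>', "$jobfile.done"
        or die "cannot create marker for $jobfile: $!";
    close $fh;    # the marker is empty; its mere existence is the signal
}

# Daemon side: only treat a job as ready once its marker exists.
sub job_ready {
    my ($jobfile) = @_;
    return -e "$jobfile.done";
}
```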

The second option is to upload the file to a directory other than the one the server watches. Once the copy is finished, a simple rename into the correct directory should be atomic. I can't think of a race condition here.
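A sketch of that upload-then-rename scheme; the sub name and paths are made up. The key property is that rename() is atomic as long as both paths are on the same filesystem, so the daemon sees either the complete file or no file at all:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Move a fully uploaded job file from the staging directory into the
# directory the daemon watches. rename() within one filesystem is atomic.
sub publish_job {
    my ($staged, $watched) = @_;
    rename $staged, $watched
        or die "rename $staged -> $watched failed: $!";
}
```

Usage would be something like publish_job('/srv/jobs/incoming/job123.dat', '/srv/jobs/ready/job123.dat') (hypothetical paths); note this breaks if the two directories are on different filesystems, where rename fails and you'd need a copy-then-rename within the target filesystem instead.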

The third option is to move the job status into a relational database. I'm not sure whether your files can go there or not, but if not, the metadata (i.e., "job 123 ready for pickup") can be inserted by the uploader, and the server stops monitoring the filesystem and monitors the database instead (with a less-efficient regular poll). The downsides here are many, including setting up a db if you don't already have one. But the upsides include that the db should have transactions (so an insert done properly is always atomic) and locking. With the locking, you can lock tables/rows from the server side such that a second server could also poll the db, should you ever find that jobs need to scale to the point of running on multiple machines.
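A sketch of that queue-in-a-database idea with DBI; the table and column names are invented, and any driver with transactions would do (the demo below assumes DBD::SQLite just for a self-contained run):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Uploader side: a single INSERT is atomic, so a poller can never see
# a half-written job row.
sub enqueue_job {
    my ($dbh, $name) = @_;
    $dbh->do(q{INSERT INTO jobs (name, status) VALUES (?, 'ready')},
             undef, $name);
}

# Server side: claim the oldest ready job inside a transaction, so two
# polling servers cannot both grab the same one.
sub claim_job {
    my ($dbh) = @_;
    $dbh->begin_work;
    my ($id, $name) = $dbh->selectrow_array(
        q{SELECT id, name FROM jobs WHERE status = 'ready'
          ORDER BY id LIMIT 1});
    $dbh->do(q{UPDATE jobs SET status = 'running' WHERE id = ?},
             undef, $id) if defined $id;
    $dbh->commit;
    return $name;    # undef when the queue is empty
}
```

With a server db such as PostgreSQL you'd typically add SELECT ... FOR UPDATE (or its SKIP LOCKED variant) so that concurrent pollers serialize on the row rather than on the whole transaction.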

The second thing I'm nervous about is simply the ability to lock with samba. This may work. It just makes me nervous. That's a lot of stuff that needs to go right - I wouldn't actually want to lock on NFS, either, so maybe I'm just paranoid. Locking in a db seems safer to me :-) (and option 2 above - renaming the file after it's finished - avoids this as well.)

I'm not sure why you copy the jobstack to a temp, manipulate the temp, and then copy it back. If you can pull the item you want from the original shared @jobstack in a small lock, that'll be way better.

my $job = do {
    lock(@jobstack);
    extract_job(\@jobstack);  # find the next job to do, remove it from the list, and return it
};
This saves a bunch of copying, and keeps the lock to a minimum. Locks are heavy-handed things. You want to avoid them whenever possible, and where not possible, you want to reduce their scope to a bare, bare minimum. Otherwise your other thread will block when file changes come in. Of course, in your sample code, you're not doing anything, so it's not yet a big deal, but I assume there is or will be more code in your main thread in your real code that does significantly more work, otherwise you wouldn't bother with all this :-)
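To make that concrete, here's a runnable sketch with a shared @jobstack; extract_job is just a placeholder policy (take the first entry), not your real selection logic, and it assumes a threads-enabled perl:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

my @jobstack :shared = ('job1', 'job2', 'job3');

# Placeholder policy: take the first job. The caller must hold the lock.
sub extract_job {
    my ($stack) = @_;
    return shift @$stack;
}

# Hold the lock only for the duration of the extraction, then release it
# immediately, so the other thread can keep appending incoming jobs.
my $job = do {
    lock(@jobstack);
    extract_job(\@jobstack);
};
```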

Replies are listed 'Best First'.
Re^2: strange behaviour, would appreciate any comment / alternative method
by djamu (Initiate) on Jul 27, 2011 at 07:14 UTC
    Hi Tanktalus, thanks for your comprehensive reply. ( Good pointers there. Seems you have a really good understanding of what I'm trying to do )

    The snippet I provided does do a good job except for that weird "print" issue, which actually isn't a problem since I don't need it; I just noticed it gets called for no apparent reason on concurrent access - and it shouldn't / couldn't be - so the fact that it does is weird. I searched CPAN for any SMB server related module but couldn't find one ( plenty of client / auth ones, though )

    To get back to your suggestions: I'm actually already using options 2 + 3 ( this is the "different" directory ). The snippet checks whether a jobfile has finished uploading ( the smbstatus check is correct and definitive ); the "-s" test is merely there for slow / old / WAN clients that "touch" a file before copying it and might very briefly release the lock, so the -s test only catches zero file sizes. A mini parser checks the file type, moves it to a different directory, and records it in a database available to all compute nodes; all further locking is handled by the database engine, and the real parser(s) run on the individual compute nodes. ( FYI this is part of a new SSI framework that uses BitTorrent to distribute itself, to be released within a couple of weeks, and should scale into the 100s/1000s )

    Option 1 is nearly impossible as I'm dealing with real people and 3rd-party applications ( 3D applications / simulation etc ... )
    The reason for using SMB is that all OS platforms ( and users without much network skill ) support it, so no real network fs ( NFS etc. ) and certainly not a distributed one like OCFS2.

    I was just curious whether I was missing something obvious... And I now use only 1 small lock in the main thread, as you suggested. The temp thing was just a splice test, as I will need that later ( it wasn't meant as a shift/pop replacement )

    "....but I assume there is or will be more code in your main thread in your real code that does significantly more work, otherwise you wouldn't bother with all this.." lol yes a little bit ;-)