strange behaviour, would appreciate any comment / alternative method

djamu has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
I found a better solution and have posted it at the end of the original post. It does not explain the strange print issue in the original snippet, but the new code is shorter, less expensive and more correct.

The following is part of a threaded script for an HPC job scheduler (cluster). It notifies a file parser when a file (job) is copied into a specific folder over SMB. (The parser removes the file afterwards; job files are not supposed to be altered.) It works well, but for some reason, and only when 2 clients are simultaneously copying different files to the watch folder, the print command at the end is called without a reason (and without running the rest of the code). I hope someone can shed some light on this behavior. I also wonder whether there is any other method to check if the SMB server has a file lock in place (I obviously only want to parse complete files, so Perl should wait until the file is complete).

I properly commented it
$smbroot = shared folder by samba server
$sharedfolder = folder inside $smbroot

#!/usr/bin/perl

use strict;
use warnings;
use threads;
use threads::shared;
use Linux::Inotify2;

my @jobstack :shared ;   # shared array for filenames

# spawn thread sub
# start inotify in separate thread
my $thr1 = threads->create(\&jobinotifydetect);
$thr1->detach();

#main
while (1) {
    sleep 1;
    {
        # here comes main > for now remove every second a file out of stack
        lock (@jobstack);
        # I can't splice shared arrays so I copy it onto a nonshared one > later I need splice to pull out specific files 1st
        # now it's just a pop replacement
        my @temp=@jobstack;
        my $t = scalar (@temp);
        splice (@temp,$t-1,1) if ($t );
        @jobstack=@temp;
    }
}


sub jobinotifydetect {
    my $smbroot = "/media/lrfp/";
    my $sharedfolder = "jobs/";
    my $path = $smbroot . $sharedfolder ;
    my $jobinotify;
    my $file;
    my $count;
    my $tt;
    $jobinotify = new Linux::Inotify2 or die "unable to create new inotify object: $!";
    # add watcher
    $jobinotify->watch ("$path", IN_MOVED_TO | IN_MODIFY | IN_CREATE, sub {
        my $e = shift;
        if ( $e->IN_MOVED_TO || $e->IN_MODIFY || $e->IN_CREATE ) {
            # wait for file locks to be removed > when data is still written by server
            my $file = $e->name;
            my $t=$e->fullname;
            my $exec = `smbstatus | grep "EXCLUSIVE" | grep "$sharedfolder" | grep "$file"`;
            if (!($exec)) {
                # check if filesize <> 0,don't if file has zero length
                if ( -s $t ) {
                    {
                        # lock @jobstack while I push filename onto it
                        lock (@jobstack);
                        # don't push onto stack if already exist
                        if (!( grep( /^$file/, @jobstack ) ) ){
                            push (@jobstack,$file);
                            $tt=scalar (@jobstack);
                            print "$count \t $file \t $tt \n";
                            $count = 0;
                        }
                    }
                }
            } else {
                # count how much times routine gets called just out of curiosity, I'll remove this later
                $count +=1 ;
            }
        }
    });
    1 while $jobinotify->poll;
}

When I copy 2 files from a single SMB client I get a result (as it should) that is similar to this:

1358     a.dd    1
458      b.dd    2

(The last number is the number of files in the array.) But when I copy 2 files simultaneously from 2 different SMB clients (2 different computers) I get something like the output below, which as far as I can tell should not even be possible: the array size does not increase between calls to the print command, and there is only 1 print command, which is called after a push. Even stranger, the results are printed at a 1 second interval after the 1st file has finished copying.

1266     a.dd    1
18       a.dd    1
43       a.dd    1
34       a.dd    1
56       a.dd    1
49       a.dd    1
34       a.dd    1
3        b.dd    2
0        b.dd    2
0        b.dd    2
0        b.dd    2
0        b.dd    2
0        b.dd    2
0        b.dd    2
0        b.dd    2
0        b.dd    2
0        b.dd    2
0        b.dd    2
0        b.dd    2
0        b.dd    2
0        b.dd    2
0        b.dd    2
0        b.dd    2

Any help is greatly appreciated, and an alternative method to check SMB file locks even more so.

thanks in advance.
Jan

new code

#!/usr/bin/perl

use strict;
use warnings;
use threads;
use threads::shared;
use Linux::Inotify2;

my @jobstack :shared ;   # shared array for filenames

# spawn thread sub
my $thr1 = threads->create(\&jobinotifydetect);  # start inotify in separate thread
$thr1->detach();

#main
while (1) {
    sleep 1;
    {
        # here comes main > for now remove every second a file out of stack
        lock (@jobstack);
        shift (@jobstack);
    }
}


sub jobinotifydetect {
    my $smbroot = "/media/lrfp/";
    my $sharedfolder = "jobs/";
    my $path = $smbroot . $sharedfolder ;
    my $jobinotify;
    my $file;
    $jobinotify = new Linux::Inotify2 or die "unable to create new inotify object: $!";
    $jobinotify->watch ("$path", IN_MOVED_TO | IN_CLOSE_WRITE, sub {
        my $e = shift;
        my $file = $e->name;
        my $t=$e->fullname;
        #only do for files with non-zero size
        if ( -s $t ) {
            if ( $e->IN_MOVED_TO || $e->IN_CLOSE_WRITE ) {
                # don't push onto stack if already exist
                if (!( grep( /^$file/, @jobstack ) ) ){
                    {
                        lock (@jobstack);
                        push (@jobstack,$file);
                    }
                }
            }
        }
        # }
    });
    1 while $jobinotify->poll;
}
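One thing worth checking in the duplicate test above: `/^$file/` treats the file name as a regular expression anchored only at the start, so `a.dd` would also match an already queued `a.dd.bak` (or `axdd`, since the dot matches any character). A minimal sketch of a literal comparison follows. It uses a plain, non-shared array, and the helper name `already_queued` is made up for illustration.

```perl
use strict;
use warnings;

# Plain array for the demonstration; the script in the post uses a shared one.
my @jobstack = ('a.dd.bak');
my $file     = 'a.dd';

# Hypothetical helper: true if $file is already queued, compared literally.
sub already_queued {
    my ($name, @jobs) = @_;
    return scalar grep { $_ eq $name } @jobs;
}

# The regex test reports a duplicate although only a.dd.bak is queued.
my $regex_hit = grep { /^$file/ } @jobstack;
my $exact_hit = already_queued($file, @jobstack);

print "regex: $regex_hit, exact: $exact_hit\n";
```

In the threaded script the same comparison would sit inside the `lock (@jobstack)` block, so that the check and the push happen under one lock.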

Re: strange behaviour, would appreciate any comment / alternative method by BrowserUk (Patriarch) on Jul 27, 2011 at 02:51 UTC
I don't have an answer to your main question (yet), but I can tell you that this:

`my $t = scalar (@temp); splice (@temp,$t-1,1) if ($t ); @jobstack=@temp;`

is a really slow, clumsy and silly way of doing this:

`pop @jobstack;`

which is exactly equivalent and about 100 times faster (depending upon how much data is in `@jobstack`).

But the way you are using `@jobstack` suggests that you really ought to be using a Thread::Queue. Whether that would have any influence upon your main problem I don't have the facilities to test, but it would definitely make it easier to reason about the possible causes by eliminating one possible source of errors.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
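For readers following along, the Thread::Queue suggestion might look roughly like this. It is a minimal sketch, not the scheduler from the question: the producer thread and the file names `a.dd` and `b.dd` stand in for the inotify callback, and `end()` assumes a reasonably recent Thread::Queue (3.x).

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Thread::Queue does its own locking, so no explicit lock() calls are needed.
my $jobqueue = Thread::Queue->new();

# Stand-in for the inotify thread: it only enqueues file names.
my $producer = threads->create(sub {
    $jobqueue->enqueue($_) for qw(a.dd b.dd);
    $jobqueue->end();    # signal that no more jobs will arrive
});

# Main thread: dequeue() blocks until a job is available and returns
# undef once the queue has been ended and drained.
my @seen;
while (defined(my $file = $jobqueue->dequeue())) {
    push @seen, $file;
    print "parsing $file\n";
}

$producer->join();
```

A long-running daemon would skip `end()` and simply keep blocking in `dequeue()`, which also removes the need for the `sleep 1` polling loop.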
Re^2: strange behaviour, would appreciate any comment / alternative method by djamu (Initiate) on Jul 27, 2011 at 03:07 UTC
Thanks for your prompt reply. About using splice: I know it's a silly pop replacement, but as mentioned in the added comment, I later need splice to arbitrarily remove items out of the array, not push or pop.
Re^3: strange behaviour, would appreciate any comment / alternative method by BrowserUk (Patriarch) on Jul 27, 2011 at 03:39 UTC
"I later need splice to arbitrarily remove items out of the array"

Hm. Three thoughts:

1. Do it (that expensive clumsy copy-splice-copy operation) only when you need to. Not when you only need to push/pop/shift/unshift.
2. Copying a whole shared array to a non-shared array and then back again just so you can use splice is ... not good design.
3. Removing elements from the middle of a structure called `@jobstack` smacks of bad design or bad nomenclature.

I rarely ever find myself needing to use splice. When I do find myself reaching for it, I always pause and look again at my design. 8 times out of 10 I find a better way.
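If items really do have to come out of the middle, Thread::Queue offers `peek` and `extract` for that, which avoids the copy-splice-copy round trip. A minimal sketch, assuming jobs are plain file names; `take_job` and the name `urgent.dd` are made up for illustration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my $jobqueue = Thread::Queue->new(qw(a.dd urgent.dd b.dd));

# Hypothetical helper: remove and return the first queued job whose name
# equals $wanted, or return nothing if it is not queued.
sub take_job {
    my ($queue, $wanted) = @_;
    lock($queue);    # keep other threads out while we scan and remove
    for my $i (0 .. $queue->pending() - 1) {
        return $queue->extract($i) if $queue->peek($i) eq $wanted;
    }
    return;
}

my $job = take_job($jobqueue, 'urgent.dd');
print "took $job, ", $jobqueue->pending(), " left\n";
```

The lock is held only for the scan and the removal, and the remaining jobs keep their original order.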
Re^4: strange behaviour, would appreciate any comment / alternative method by djamu (Initiate) on Jul 27, 2011 at 04:14 UTC
Re^2: strange behaviour, would appreciate any comment / alternative method by djamu (Initiate) on Jul 27, 2011 at 03:16 UTC
Thanks for your suggestion on Thread::Queue. That might solve some other difficulties I was having with shared arrays. I'll test it and post the result.
Re^2: strange behaviour, would appreciate any comment / alternative method by Anonymous Monk on Jul 27, 2011 at 04:29 UTC
Hi, I used to have problems with picking up incomplete files. Now I always have the file loaded in with a temp extension, and then have the down/uploader rename it. Renaming is usually fast. I only look for files with the renamed extension. J.C.
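The temp-extension approach can be sketched as follows. This is only an illustration under stated assumptions: the `.part` extension, the helper names `publish_job` and `ready_jobs`, and the throwaway directory are all invented here, and whether a rename is atomic over SMB depends on the server.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical watch folder; a throwaway directory keeps the sketch runnable.
my $watchdir = tempdir(CLEANUP => 1);

# Uploader side: write under a temporary extension, rename when complete.
sub publish_job {
    my ($dir, $name, $data) = @_;
    my $tmp   = File::Spec->catfile($dir, "$name.part");
    my $final = File::Spec->catfile($dir, $name);
    open my $fh, '>', $tmp or die "cannot write $tmp: $!";
    print {$fh} $data;
    close $fh or die "cannot close $tmp: $!";
    rename $tmp, $final or die "cannot rename $tmp: $!";
    return $final;
}

# Watcher side: ignore anything that still carries the temporary extension.
sub ready_jobs {
    my ($dir) = @_;
    opendir my $dh, $dir or die "cannot open $dir: $!";
    my @ready = sort grep { !/^\.\.?$/ && !/\.part$/ } readdir $dh;
    closedir $dh;
    return @ready;
}

publish_job($watchdir, 'a.dd', "job data\n");
my @ready = ready_jobs($watchdir);
print "ready: @ready\n";
```

With an inotify watcher, a rename inside the watched directory arrives as an `IN_MOVED_TO` event, which the script in the question already listens for.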
Re: strange behaviour, would appreciate any comment / alternative method by Tanktalus (Canon) on Jul 27, 2011 at 05:17 UTC
A couple of things here make me nervous.

The first is the ability to tell when a file is available. Now, perhaps your files will be incredibly small, so the act of copying them across the network and updating the directory structure will be basically atomic. I doubt it. So that means that a simple "-s" test is not sufficient to tell when a file is finished uploading. 1MB of 700MB may be uploaded so far, and then things can really go wonky. You'll probably want to adjust your protocol so that some atomic operation can be done by the uploader to indicate to the daemon that the file is ready. There are a few simple choices, and one more complicated choice, that I can think of here.

The first one is to have the application that puts up the job file create a file with the same name, but adding ".done" to the end. This file would have no contents. But, because it is created after the main job file is done, we know the main file is done. To do this, something like `{ open my $fh, '>', "$jobfile.done"; }` should do it (create the file, filehandle goes out of scope, it's closed). However, this still leaves a bit of a hole: what if the server deletes the file between the open and close? There's not much time there, but I don't know what would happen. Probably can be handled if you think about it.

The second option is to upload the file to a different directory than the server parses them from. Once the copy is finished, a simple rename to the correct directory should be atomic. I can't think of a race condition here.

The third option is to move the job status into a relational database. I'm not sure if your files can go there or not, but, if not, the metadata (i.e., "job 123 ready for pickup") can be inserted by the uploader, and the server stops monitoring the filesystem, instead monitoring the database (with a less-efficient regular poll). The downsides here are many, including setting up a db if you don't already have one. But the upsides include that the db should have transactions (thus the insert is always atomic if done properly), and locking. With the locking, you can lock tables/rows from the server side such that a second server could also poll the db, should you ever find the need for jobs to scale such that they are being run on multiple machines.

The second thing I'm nervous about is simply the ability to lock with samba. This may work. It just makes me nervous. That's a lot of stuff that needs to go right. I wouldn't actually want to lock on NFS, either, so maybe I'm just paranoid. Locking in a db seems safer to me :-) (and option 2 above, renaming the file after it's finished, avoids this as well).

I'm not sure why you copy the jobstack to a temp, manipulate the temp, and then copy it back. If you can pull the item you want from the original shared @jobstack in a small lock, that'll be way better.

    my $job = do {
        lock (@jobstack);
        extract_job(\@jobstack); # this will find the next one to do, and remove it from the list, and return it
    };

This saves a bunch of copying, and keeps the lock to a minimum. Locks are heavy-handed things. You want to avoid them whenever possible, and where not possible, you want to reduce their scope to a bare, bare minimum. Otherwise your other thread will block when file changes come in. Of course, in your sample code, you're not doing anything, so it's not yet a big deal, but I assume there is or will be more code in your main thread in your real code that does significantly more work, otherwise you wouldn't bother with all this :-)
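The `extract_job` call above is left undefined in the reply. One possible shape for it is sketched below, purely as an assumption: the selection rule (a caller-supplied test, falling back to the oldest job) is invented for illustration. Because splice is not supported on shared arrays, the sketch rebuilds the list without the chosen element while the caller holds the lock.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my @jobstack :shared = qw(a.dd b.dd c.dd);

# Hypothetical picker: take the first job accepted by $wanted, or the
# oldest job if none is accepted. The caller must already hold the lock.
sub extract_job {
    my ($jobs, $wanted) = @_;
    return unless @$jobs;
    my ($i) = grep { $wanted->($jobs->[$_]) } 0 .. $#$jobs;
    $i = 0 unless defined $i;
    my $job = $jobs->[$i];
    # No splice on shared arrays: copy everything except element $i
    # into a private list, then assign it back.
    my @rest = @{$jobs}[ grep { $_ != $i } 0 .. $#$jobs ];
    @$jobs = @rest;
    return $job;
}

# The lock lives only as long as the do block.
my $job = do {
    lock(@jobstack);
    extract_job(\@jobstack, sub { $_[0] eq 'b.dd' });
};

print "next job: $job, remaining: @jobstack\n";
```

This still copies the remaining elements once, but only when a job is actually taken, and the lock is released before any parsing work starts.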
Re^2: strange behaviour, would appreciate any comment / alternative method by djamu (Initiate) on Jul 27, 2011 at 07:14 UTC
Hi Tanktalus, thanks for your comprehensive reply. (Good pointers there. It seems you have a really good understanding of what I'm trying to do.)

The snippet I provided does a good job except for that weird "print" issue, which actually isn't a real problem as I don't need the print. I just noticed that it gets called for no apparent reason on concurrent access. It shouldn't and couldn't, so the fact that it does is weird. I searched CPAN for any SMB server related module but couldn't find one (plenty of client / auth ones though).

To get back to your suggestions: I'm actually already using options 2 + 3 (this is the "different" directory). The snippet checks whether a job file has finished uploading (the smbstatus check is correct and definitive); the "-s" test is merely there for slow / old / WAN clients that "touch" a file before copying it (and might very briefly release the lock), so it only guards against zero file sizes. A mini parser checks the file type, moves the file to a different directory and dumps it in a database available to all compute nodes. All further locking is handled by the database engine, and the real parser(s) run on the individual compute nodes. (FYI: this is part of a new SSI framework that uses BitTorrent to distribute itself, to be released within a couple of weeks, and should scale into the 100s / 1000s.)

Option 1 is nearly impossible as I'm dealing with real people and 3rd party applications (3D applications, simulation, etc.). The reason for using SMB is that all OS platforms support it (and so can users without much network skill), so no real network fs (NFS etc.) and certainly not a distributed one like ocfs2.

I was just curious if I was missing something obvious... And I'll use only 1 small lock in the main thread as you suggested. The temp thing was just a splice test, as I will need that later (not meant as a shift / pop replacement).

"....but I assume there is or will be more code in your main thread in your real code that does significantly more work, otherwise you wouldn't bother with all this.."

lol yes, a little bit ;-)