blackadder has asked for the wisdom of the Perl Monks concerning the following question:

I have this main script below
use strict;
use threads;
use threads::shared;
use Win32::FileSecurity;

my @source : shared;
my @childs;
my $child;
my @list;

$list[0] = qw(c:\\winnt);    # note: immediately overwritten by the next assignment
$list[0] = qw(\\\\serverA\\g$\\Eng_Support\\Data_Removal);
$list[1] = qw(\\\\serverA\\f$\\homedirs\\users0\\adb0290);

for (@list) {
    push @childs, threads->create( "size_up", "$_" );  # line 16
    #push @childs, threads->create( "file_up", "$_" ); # line 17
}

foreach $child (@childs) {
    $child->join();
}

printf "%s ", join( "\n", @source );

sub size_up {
    my ($path) = @_;
    my $resp = `size_test.pl $path`;
    push @source, "$path\t" . $resp;
}

sub file_up {
    my ($path) = @_;
    my $resp = `file_test.pl $path`;
    push @source, "$path\t" . $resp;
}
That calls on those two scripts;
#size_test.pl
use strict;
use Win32::OLE qw[in with];

my $fs = Win32::OLE->CreateObject( 'Scripting.FileSystemObject' );

my $size = 0;
my $d = $fs->GetFolder( $ARGV[0] );

eval {
    $size = $d->size();
    my $size_mb = $size / 1024 / 1024;    # bytes -> megabytes (was misnamed $size_gb)
    print "Size: " . $size_mb . "Mb ($size bytes)";
};
And
#file_test.pl
use strict;
use Win32::OLE qw[in with];

my $fs = Win32::OLE->CreateObject( 'Scripting.FileSystemObject' );

my $fCount  = 0;
my $sCount  = 0;
my @folders = $fs->GetFolder( $ARGV[0] );

eval {
    while( @folders ) {
        my $folder = pop @folders;
        $fCount += $folder->Files->Count;
        $sCount += $folder->SubFolders->Count;
        for my $subFolder ( in $folder->SubFolders ) {
            $fCount += $subFolder->Files->Count;
            push @folders, $_ for in $subFolder->SubFolders;
            $sCount += $subFolder->SubFolders->Count;
        }
    }
    print "Files: $fCount, folders: $sCount";
};
The main script will not work unless either line 16 or line 17 is commented out.

Does anyone know why this is? And how can I get both lines working together (i.e., without having to comment out either line 16 or line 17)?

Thanks,

Re: Threads problem.
by BrowserUk (Patriarch) on Oct 03, 2003 at 23:05 UTC

    First off, a little more detail than "...will not work unless..." would be useful.

    Does this mean that the script

    1. Fails to compile?
    2. Compiles but dies with runtime errors?
    3. Runs and then traps?
    4. Runs to completion, but gives the wrong output?

    The probable cause of failure is that you are sharing @source between the threads but you aren't using lock() to serialise the access.
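
    For example, a minimal sketch of what that would look like in your size_up() (the same change applies to file_up()):

        sub size_up {
            my( $path ) = @_;
            my $resp = `size_test.pl $path`;
            {
                lock( @source );                  # block until we hold the lock on @source
                push @source, "$path\t" . $resp;  # update the shared array safely
            }                                     # lock is released when this scope exits
        }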

    I know you've been working on this for some time, but I don't think that using threads and shelling out is likely to speed things up much. The overhead of shelling out to the sub-scripts will probably outweigh any saving you might make by overlapping IO requests.

    In your earlier post you said that it was still slower than "doing it manually"... What does this mean? How manually? And what order of difference are you trying to make up?

    I think I see a few things in your script that might be slowing things down, and I also think that you could use threads to improve things somewhat, but I don't have a network with which to test the ideas. An indication of what level of improvement you are seeking would make it easier for me to decide if they are worth pursuing.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
    If I understand your problem, I can solve it! Of course, the same can be said for you.

      None of the above. And a massive apology if I wasn't clear.

      In trying to find a way to improve the speed of sizing the shares (on some servers I have, it could take days), I was hoping to find a way to fire off at least a few instances to speed things up, rather than grabbing one share at a time and only moving on to the next when the one in hand is done. Besides the size of the folder/share, I needed the number of files and folders as well.

      This process took twice as long as doing the same operation manually (right-clicking the folders, selecting them all, and then Properties). So I thought threads might come in handy.
      I attempted to do something about it a few weeks back but hit a brick wall, because of the nature of OLE, which doesn't allow re-entry, and as many Monks here suggested, I abandoned the whole thing.

      Then I had an idea: instead of having the OLE operations performed within the main script (no re-entry!), I could call other scripts where the OLE operations are performed, and so overcome this limitation. It worked. I managed to get at least 7 folders processed concurrently; a huge improvement: the time it took to do all seven shares was only as long as doing the largest one of those seven folders.

      However, where the script failed was when I attempted to include the number of files and folders. Monk Particle, in response to my other post, suggested that I drop OLE and use File::Find::Rule, and thankfully code was also provided.
      I benchmarked both approaches and they were both about the same speed, but I feel using File::Find::Rule is a little more reliable and involves less code.
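
      For what it's worth, the counting pass I have in mind with File::Find::Rule looks roughly like this (a sketch from memory, not necessarily the exact code Particle posted):

          use strict;
          use File::Find::Rule;

          my $path    = shift;                                      # e.g. a UNC share path
          my @files   = File::Find::Rule->file->in( $path );        # every file under $path
          my @folders = File::Find::Rule->directory->in( $path );   # every directory under $path

          my $size = 0;
          $size += ( -s $_ || 0 ) for @files;                       # sum sizes, skipping unreadable files

          print "Size: $size bytes, Files: ", scalar @files,
                ", Folders: ", scalar @folders, "\n";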

      This about sums up my reasons for these attempts with multithreading techniques. I will try to thread more than one folder using File::Find::Rule to obtain folder stats; I hope this will improve things. Even a small speed improvement would be a great aid.

      You also mentioned that you see a few things that might improve the speed of the above code; I would be very grateful for your comments, kind Sir.

      Thanks

        Try this.

        #! perl -slw
        use strict;
        use threads qw[ yield ];
        use threads::shared;
        use Thread::Queue;
        use Win32::OLE qw[ in with ];

        my $Qwork    = Thread::Queue->new;
        my $Qresults = Thread::Queue->new;

        sub worker {
            # Each thread gets its own FileSystemObject; OLE instances cannot be shared.
            my $fso = Win32::OLE->new( 'Scripting.FileSystemObject' );
            Win32::OLE->Option( Warn => 0 );

            while( my $folder = $Qwork->dequeue ) {
                last if $folder eq 'DIE_NOW';

                my @folders = $fso->GetFolder( $folder );
                my $size = $folders[0]->size() || 'Permission denied';
                my( $cFiles, $cFolders ) = (0) x 2;

                while( my $sub = pop @folders ) {
                    $cFiles   += $sub->Files->Count      || 0;
                    $cFolders += $sub->SubFolders->Count || 0;
                    for my $subsub ( in $sub->SubFolders ) {
                        $cFiles   += $subsub->Files->Count || 0;   # was $cFolders; files were being counted as folders
                        push @folders, $_ for in $subsub->SubFolders;
                        $cFolders += $subsub->SubFolders->Count || 0;
                    }
                }

                $Qresults->enqueue( "$folder - size: $size Files: $cFiles Folders: $cFolders" );
                yield unless $Qwork->pending;
            }
            undef $fso;
            return 1;
        }

        my @threads = map{ threads->new( \&worker ) } 1 .. @ARGV;

        $Qwork->enqueue( @ARGV );

        yield until $Qresults->pending();

        for( 1 .. @threads ) {
            print $Qresults->dequeue;
        }

        $Qwork->enqueue( 'DIE_NOW' ) for 1 .. @threads;
        $_->join for @threads;

        Supply the directories/shares to be sized on the command line, and it will print out lines like

        P:\test>296404 M:\ S:\ C:\test
        C:\test - size: 22582263 Files: 7611 Folders: 352
        M:\ - size: 47353164 Files: 995 Folders: 1421
        S:\ - size: 99468262 Files: 1994 Folders: 3437

        Not very neat, but all the info is there.

        Although you cannot share an instance of Win32::OLE between threads, you can safely use a separate instance on each thread.

        You might also look up the Win32::OLE->Option() class method and the Warn => n setting. This allowed me to do away with the eval wrappers you had around the OLE stuff, which probably weren't helping your performance. Though the cost of the eval itself was probably minimal, avoiding the calls to Carp/caller/stack-trace generation is worth having.

        What it does is spawn a separate thread for each share to be scanned, post the targets on the work queue, and wait for the results to come back via the results queue before printing them out. This allows the scans of the separate machines (and the associated IO waits) to be overlapped.

        The process consumes around 0.7 MB of RAM for each path, with a 4 MB startup cost, so you should be good for a hundred scans simultaneously. If you have more than that, it is an easy change to queue the first 100 and then queue another as each completes until you're done; a rough sketch of that change follows below.
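
        For instance, the driver section might be throttled like this (an untested sketch against the queues above; $MAX_LIVE is a made-up limit):

            my $MAX_LIVE = 100;                   # hypothetical cap on concurrent scans
            my @paths    = @ARGV;
            my $pending  = 0;

            my @threads = map{ threads->new( \&worker ) }
                          1 .. ( @paths < $MAX_LIVE ? @paths : $MAX_LIVE );

            # Prime the work queue with up to $MAX_LIVE paths...
            while( @paths and $pending < $MAX_LIVE ) {
                $Qwork->enqueue( shift @paths );
                $pending++;
            }

            # ...then feed it another path each time a result comes back.
            while( $pending ) {
                print $Qresults->dequeue;         # blocks until a scan completes
                $pending--;
                if( @paths ) {
                    $Qwork->enqueue( shift @paths );
                    $pending++;
                }
            }

            $Qwork->enqueue( 'DIE_NOW' ) for 1 .. @threads;
            $_->join for @threads;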

        I wouldn't normally advocate creating anything like this number of threads, but in this case, as each thread is essentially just sitting in an IO block waiting on other machines to respond, it makes sense to overlap as much of that waiting as possible.

        Let me know how you get on please...


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail