
I have some working scripts that use threads, but they all create a fixed number of threads. Now I am trying to write a script that creates one thread per search directory, so the number of threads is not known in advance. In the real script, each thread runs a "cleartool find" command on a directory and returns an array of results. For this example I am using the Unix find command instead; the ClearCase "cleartool find" command is similar but takes much longer to run.

On small data sets this seems to work fine and I get consistent results. But with the really long-running ClearCase commands, I don't always get all the data I expect in the @Final array. There is probably a better way to do this. Should I lock the variable before I update it? My thinking was that each thread's results go into a different hash key, so updating %Found this way should be safe. Or does it need to be locked before each join call? Any suggestions on how to do this better?

#!/usr/local/bin/perl

use Cwd;
use threads;
use Data::Dumper;

my $use_cc = 0;
my @dirs = ();
if ( $use_cc ) {
  @dirs = split(/\s+/, $ENV{CLEARCASE_AVOBS});
} else {
  @dirs = qw(/bin /sbin /usr/local/bin /usr/sfw/bin /usr/bin);
}

# it's a clearcase thing
my $branch = "v4.0.0_gxp_patch";

# hash of dir names with thread values
my %threads = ();

# hash of dir names with arrays of found items
my %Found = ();

# large array to hold all results
my @Final = ();

foreach my $dir ( sort @dirs ) {
  chomp($dir);
  # add dir name to hash
  $Found{$dir} = [];
  # create thread and add it to threads hash
  $threads{$dir} = threads->create({'context' => 'list'}, 'find_thread', $dir, $use_cc, $branch);
}
foreach my $dir ( sort keys %threads ) {
  # cycle through threads hash and join up results, put them in hash-of-arrays
  @{ $Found{$dir} } = $threads{$dir}->join();
}
# merge all the smaller hash-of-arrays into a large array for easier processing later on
foreach my $dir ( sort keys %Found ) {
  foreach my $item ( sort @{ $Found{$dir} } ) {
    push(@Final, $item);
  }
}
print Dumper(@Final);
print "SIZE: " . scalar(@Final) . "\n";

sub find_thread {
  my $dir = shift;
  my $cc_flag = shift;
  my $branch = shift;
  my @results;
  chdir $dir or die "Cannot change to $dir\n";
  print "Finding all files in dir: $dir\n";
  if ( $cc_flag ) {
    @results = `cleartool find -all -version 'brtype($branch)' -print 2>&1`;
  } else {
    @results = `find $dir -print 2>&1`;
  }
  return @results;
}
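
One possible cause worth ruling out, offered as a guess rather than a diagnosis: the threads documentation notes that on all platforms except MSWin32 the current working directory is shared among all threads, so a chdir() in one thread affects every other thread. Since "cleartool find -all" depends on the directory it runs in, threads racing on chdir() could end up searching the wrong place. The %Found hash itself is only written by the main thread after each join() and is not a shared variable, so locking should not be needed there. Below is a minimal sketch that avoids chdir() by changing directory inside the child shell instead. The sub name find_in_dir and the quoting step are illustrative, not part of the original script, and it uses the Unix find command as a stand-in for cleartool.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;

# Run a command "inside" $dir without calling chdir() in the thread.
# The cd happens in the child shell, so the working directory of the
# Perl process (shared by all threads) is never changed.
sub find_in_dir {
    my ($dir) = @_;

    # Illustrative quoting step: make the path safe inside single quotes.
    (my $quoted = $dir) =~ s/'/'\\''/g;

    # Stand-in for the real command; the cleartool call from the original
    # script would replace "find . -print" here.
    my @results = `cd '$quoted' && find . -print 2>&1`;
    chomp @results;
    return @results;
}

# Directories come from the command line, or fall back to sample paths.
my @dirs = @ARGV ? @ARGV : qw(/bin /usr/bin);

# One thread per directory; list context so join() returns the full list.
my %threads;
for my $dir (@dirs) {
    $threads{$dir} = threads->create({ context => 'list' }, \&find_in_dir, $dir);
}

# Only the main thread touches %found, so no locking is involved.
my %found;
for my $dir (sort keys %threads) {
    $found{$dir} = [ $threads{$dir}->join() ];
}

# Flatten the per-directory results into one list.
my @final = map { @{ $found{$_} } } sort keys %found;
print "SIZE: ", scalar(@final), "\n";
```

If the counts become stable with this change, the shared working directory was likely the culprit; if not, the problem is probably elsewhere (for example in the command output itself).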

In reply to creating unknown number of threads and then join results by rudds_perl_habit
