in reply to Searching a distributed filesystem

Here are a few things that may speed up your process.

  1. Make your remote script more intelligent.

    Don't glob for every filename. This is the equivalent of looking things up in an array, except slower because you're going out to the filesystem each time. Use a hash.

    Glob once with a full wildcard and put the results into a hash. Then each time the remote script receives a filename to look up, it does so with an O(1) memory lookup rather than an O(n) filesystem hit. (There's a sketch of this after the list.)

  2. Don't tie the filelist in every thread.

    I see no advantage to using Tie::File over simply opening the file for input in each thread and reading the filenames one line at a time. (See the second sketch after the list.)

  3. Update: If you need better performance, you could try opening multiple sessions to each server.

    The messy bit is synchronising access to the file list. If you want to go that route and have difficulty seeing how to synchronise the threads, come back. (The third sketch below shows one queue-based approach.)
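
For point 1, here's a minimal sketch of the glob-once approach. The directory pattern and the read-names-from-STDIN loop are assumptions standing in for whatever your remote script actually does:

    use strict;
    use warnings;

    # Build the lookup table once: a single pass over the filesystem.
    my %files;
    ++$files{ $_ } for glob '/export/data/*';    # assumed path

    # Thereafter, each lookup is an O(1) hash probe, not a filesystem hit.
    while( my $name = <STDIN> ) {
        chomp $name;
        print exists $files{ $name } ? "found: $name\n" : "missing: $name\n";
    }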
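
For point 2, a sketch of the thread body reading the file list directly; the signature matches the threads->new call further down, and 'filelist.txt' is an assumed name:

    use strict;
    use warnings;
    use threads;

    # Each thread opens the file list itself and reads ordinary lines;
    # no Tie::File, no shared handle to contend over.
    sub process3 {
        my( $startline, $endline, $server ) = @_;

        open my $fh, '<', 'filelist.txt' or die "filelist.txt: $!";
        while( my $filename = <$fh> ) {
            next if $. < $startline;    # $. is the current input line number
            last if $. > $endline;
            chomp $filename;
            # ... send $filename to $server and collect the reply ...
        }
        close $fh;
    }

    # Invoked per server as: threads->new( \&process3, 0, $endline, $server )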
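
For point 3, one way to sidestep the synchronisation problem entirely is a shared Thread::Queue: one loop feeds filenames in, any number of sessions pull them out. The server names, the 4 sessions per server, and the file name are all illustrative:

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $Q = Thread::Queue->new;

    # Several sessions per server, all drawing from the one shared queue,
    # so no manual locking of the file list is needed.
    my @workers = map {
        my $server = $_;
        map {
            threads->new( sub {
                while( defined( my $filename = $Q->dequeue ) ) {
                    chomp $filename;
                    # ... look up $filename via this session to $server ...
                }
            } );
        } 1 .. 4;    # 4 sessions per server; tune to what the servers tolerate
    } qw[ c001n05 c001n06 c001n07 c001n08 ];

    # Feed the queue, then one undef sentinel per worker so that each
    # dequeue loop terminates cleanly.
    open my $fh, '<', 'filelist.txt' or die "filelist.txt: $!";
    $Q->enqueue( $_ ) while <$fh>;
    close $fh;

    $Q->enqueue( undef ) for @workers;
    $_->join for @workers;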

I don't for the life of me understand your 'clarity' argument for 24 (128!) lines versus 8 (16):

    my @threads = map {
        threads->new( \&process3, 0, $endline, $_ )
    } qw[
        c001n05 c001n06 c001n07 c001n08 c001n09 c001n10
        c001n11 c001n12 c001n13 c001n14 c001n15 c001n16
    ];

    $_->join for @threads;

Clearer to read and much easier to maintain when the server list changes.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."