Re: Searching a distributed filesystem

Once all this is established, I start iterating over the input file (which can be millions and millions of lines), sending the filenames over the ssh connection to this remote script, and then it waits for a response. When it gets a response, it sends the next line.

You are performing a huge number of network round trips with small packets between collaborating processes, and each process has to wait for the other's input before it can continue. This is notoriously slow (just try it over a wide area network to get a feel for the cost of round trips in general). Even when the network itself is fast, the time your processes spend waiting for round-trip results will slow down the entire operation, because the latency is paid once for every one of those millions of lines.
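To put rough numbers on the round-trip cost, here is a back-of-the-envelope model in Python. It is only a sketch: it counts the time spent waiting on latency alone and ignores bandwidth and the per-file work itself. The function names, line count and round-trip times are hypothetical, not measurements from your setup.

```python
def roundtrip_wait_seconds(lines: int, rtt_seconds: float) -> float:
    """Latency-only wait when every line costs one full round trip."""
    return lines * rtt_seconds


def batched_wait_seconds(lines: int, rtt_seconds: float, batch_size: int) -> float:
    """Latency-only wait when lines are sent batch_size at a time (one round trip per batch)."""
    batches = -(-lines // batch_size)  # ceiling division without floats
    return batches * rtt_seconds


if __name__ == "__main__":
    n = 10_000_000  # hypothetical input size
    for label, rtt in (("LAN, 0.5 ms", 0.0005), ("WAN, 50 ms", 0.05)):  # hypothetical RTTs
        per_line = roundtrip_wait_seconds(n, rtt)
        batched = batched_wait_seconds(n, rtt, batch_size=100_000)
        print(f"{label}: per-line {per_line:,.0f} s, batched {batched:,.2f} s")
```

With 10 million lines, a 0.5 ms LAN round trip already adds about 5,000 s (roughly 83 minutes) of pure waiting, and a 50 ms WAN round trip adds about 500,000 s (almost 6 days). Sending the same lines in batches of 100,000 reduces that to 100 round trips.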

Therefore your algorithm may benefit greatly from transferring larger chunks over the network in one shot. In particular, I would suggest copying the search list to the nodes (with an HTTP/FTP-like protocol, e.g. the Perl module LWP), then running a fully local search on each node (stdin from the local search list file, stdout to a local result file on the same node), and only afterwards transferring the result file back to the central system in one shot for the report.
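The node-local step of that approach could look roughly like the Python sketch below. It assumes the per-file "search" is simply an existence check on the node's filesystem; the script name `local_search.py` and the functions `search_batch` and `run` are hypothetical, so substitute whatever test your remote script really performs. Copying the list in and the result file out is left to whatever transfer mechanism you already use.

```python
#!/usr/bin/env python3
"""Hypothetical local_search.py: check a whole search list on this node in one pass."""
import os
import sys
from typing import Iterable, Iterator


def search_batch(names: Iterable[str]) -> Iterator[str]:
    """Yield each listed name that exists here (assumed check: os.path.exists)."""
    for line in names:
        name = line.rstrip("\n")
        if name and os.path.exists(name):
            yield name


def run(list_path: str, result_path: str) -> int:
    """Read the copied search list, write hits to a local result file, return the hit count."""
    count = 0
    # surrogateescape keeps non-UTF-8 filenames intact on the way in and out
    with open(list_path, encoding="utf-8", errors="surrogateescape") as src, \
         open(result_path, "w", encoding="utf-8", errors="surrogateescape") as dst:
        for hit in search_batch(src):
            dst.write(hit + "\n")
            count += 1
    return count


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.stderr.write("usage: local_search.py SEARCH_LIST RESULT_FILE\n")
    else:
        run(sys.argv[1], sys.argv[2])
```

On each node this would be run once, e.g. `python3 local_search.py searchlist.txt results.txt` (file names hypothetical), so the network is crossed only twice per node: once to deliver the list and once to collect the results.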
