
Hello, I'm working in a large distributed file system environment: any given file can exist on any given "node" in the system, and the nodes are interconnected on an internal network. The task I'm trying to accomplish is the following: a user supplies a "file list", that is, a file containing newline-separated filenames, and I need to search every node in the system (sometimes upwards of 64 nodes) for each file in the list.

The approach I'm using now is: for each node, I create a thread, then I fork a process, exec ssh and run 'perl' with it, and I dup the stdin/stdout of the ssh process to the Perl script. Next, I send what I call a "remote perl script" over the ssh connection, followed by "__END__\n". The remote script looks like the following:

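sub remote_script {
    my ($mode) = @_;

    if ($mode eq "test") {
        return '
            $|=1;
            print "READY\n";
            while (<STDIN>) {
                   chomp;
                   # list context: a scalar glob() keeps iterator state between
                   # lines and would return stale or empty results
                   my ($found) = glob($_);
                   $found = "" unless defined $found;
                   print "$found\n";
            }
        ';
    }
}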
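A minimal sketch of the connection setup described above, useful for trying the protocol locally. The post does not show its `open_handle`, so `open_remote_perl`, `lookup` and `close_remote` are hypothetical names. The command is a parameter so the same code can run against a local `perl` instead of `ssh $node perl`. It assumes IPC::Open2, which performs the fork/dup/exec steps in one call.

```perl
use strict;
use warnings;
use IO::Handle;
use IPC::Open2 qw(open2);

# Remote side: the post's "test" script, with glob() in list context so no
# iterator state leaks from one filename to the next.
my $REMOTE_SCRIPT = <<'REMOTE';
$| = 1;
print "READY\n";
while (my $name = <STDIN>) {
    chomp $name;
    my ($found) = glob($name);
    print defined $found ? "$found\n" : "\n";
}
REMOTE

# Hypothetical helper in the spirit of open_handle(): spawn @cmd
# (e.g. ('ssh', $node, 'perl')), send the script plus "__END__", wait for READY.
sub open_remote_perl {
    my ($script, @cmd) = @_;
    # open2 forks, connects the child's stdin/stdout to two pipes, execs @cmd.
    my $pid = open2(my $from_child, my $to_child, @cmd);
    $to_child->autoflush(1);
    # perl stops reading its program at __END__; later lines stay on its STDIN.
    print {$to_child} $script, "__END__\n";
    my $hello = <$from_child>;
    die "remote perl did not start: @cmd\n"
        unless defined $hello && $hello eq "READY\n";
    return { pid => $pid, read => $from_child, write => $to_child };
}

# One lock-step request/reply, as in the main loop of the post.
sub lookup {
    my ($conn, $name) = @_;
    print { $conn->{write} } "$name\n";
    my $reply = readline($conn->{read});
    return undef unless defined $reply;
    chomp $reply;
    return $reply;
}

# Closing the write end gives the remote while loop EOF, so it exits.
sub close_remote {
    my ($conn) = @_;
    close $conn->{write};
    waitpid $conn->{pid}, 0;
}
```

Against a real node the call would be `open_remote_perl($REMOTE_SCRIPT, 'ssh', $node, 'perl')`; locally, `open_remote_perl($REMOTE_SCRIPT, $^X)` exercises the same path. Each `lookup` is one full round trip.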

Once all this is established, I start iterating over the input file (which can be millions and millions of lines), sending each filename over the ssh connection to the remote script and then waiting for a response. When a response arrives, I send the next line. The semi-full code looks like this:


use threads;
use Tie::File;
use IO::Handle;    # for the getline() method

# open_handle() and the READ/WRITE/SENT index constants are defined elsewhere.
sub process3 {

        my ($start, $end, $node) = @_;

        my %workers;
        my ($line, $obj, $rnode);

        my @file;
        tie @file, 'Tie::File', "inputfile" or die "couldn't tie inputfile: $!";

        $workers{$node} = open_handle($node, "glob"); # this opens the ssh connection.
        my $to_node   = $workers{$node}[WRITE];
        my $from_node = $workers{$node}[READ];

        # send the first filename
        $workers{$node}[SENT] = $start;
        $line = $file[$workers{$node}[SENT]];
        print $to_node "$line\n";
        $workers{$node}[SENT]++;

        while (1) {
                # wait for the reply before sending anything else
                my $res = $from_node->getline();
                last unless defined $res;    # remote side closed the connection
                chomp($res);

                ($obj, $rnode) = split(',', $res);
                print "$obj\n" if $res;
                last if ($workers{$node}[SENT] > $end);

                # send the next filename
                $line = $file[$workers{$node}[SENT]];
                print $to_node "$line\n";
                $workers{$node}[SENT]++;
        }
}

# I know I can put this in a loop, but I decided to leave it for clarity.
my $thr1 = threads->new(\&process3, 0,$endline, "c001n05");
my $thr2 = threads->new(\&process3, 0,$endline, "c001n06" );
my $thr3 = threads->new(\&process3, 0,$endline, "c001n07" );
my $thr4 = threads->new(\&process3, 0,$endline, "c001n08" );
my $thr5 = threads->new(\&process3, 0,$endline, "c001n09" );
my $thr6 = threads->new(\&process3, 0,$endline, "c001n10" );
my $thr7 = threads->new(\&process3, 0,$endline, "c001n11" );
my $thr8 = threads->new(\&process3, 0,$endline, "c001n12" );
my $thr9 = threads->new(\&process3, 0,$endline, "c001n13" );
my $thr10 = threads->new(\&process3, 0,$endline, "c001n14" );
my $thr11 = threads->new(\&process3, 0,$endline, "c001n15" );
my $thr12 = threads->new(\&process3, 0,$endline, "c001n16" );
$thr1->join();
$thr2->join();
$thr3->join();
$thr4->join();
$thr5->join();
$thr6->join();
$thr7->join();
$thr8->join();
$thr9->join();
$thr10->join();
$thr11->join();
$thr12->join();
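For reference, the loop mentioned in the comment above might look like the following sketch. `run_on_nodes` is a hypothetical helper, and the node names are generated to match the twelve used above; in the real program the worker would be `\&process3`, called as `run_on_nodes(\&process3, 0, $endline, @nodes)`.

```perl
use strict;
use warnings;
use threads;

# Hypothetical helper: start one thread per node running
# $worker->($start, $end, $node), then join them all in node order.
sub run_on_nodes {
    my ($worker, $start, $end, @nodes) = @_;
    # map evaluates its block in list context, so each thread is created in
    # list context and join() hands back the worker's return list.
    my @threads = map { threads->create($worker, $start, $end, $_) } @nodes;
    return map { $_->join() } @threads;
}

# The twelve nodes from the post: c001n05 .. c001n16.
my @nodes = map { sprintf 'c001n%02d', $_ } 5 .. 16;
```

Adding or removing nodes then only means changing the list, not the thread bookkeeping.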

The problem is that this gets really slow with a large input file: searching for 1,000,000 lines can take up to 45 minutes. I'd like to see this improve, even a little bit. If anyone has any advice, please share. Thank you!
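A rough estimate from these numbers: each thread sends one filename and waits for the reply before sending the next, and every node's thread walks the whole list, so the wall-clock time is roughly the number of lines times one request/reply cycle. Assuming the 45 minutes is spread evenly over the 1,000,000 lines:

```latex
t_{\text{line}} \approx \frac{45\ \text{min} \times 60\ \text{s/min}}{1\,000\,000\ \text{lines}}
              = \frac{2700\ \text{s}}{10^{6}}
              = 2.7\ \text{ms per line}
```

That is about 2.7 ms per filename per node, which suggests the per-line round trip (ssh, the network hop, and the remote glob) rather than raw bandwidth is what dominates.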

In reply to Searching a distributed filesystem by LostShootingStar