# create n numbers of threads
# create ftp connection/login
while (1) {
    wait msg Q();
    1. dequeue element
    2. download file from ftp server.
    3. delete file.
}

# Master Thread
# create ftp connection/login
while (1) {
    if 'files exist on ftp server' {
        enqueue files to download
    }
}

To sum up, the master thread gets the list of files to download, and the workers just download the files.

I am worried about multiple threads colliding (race conditions) in the Net::FTP module.

Re^5: perl -Dusethreads compilation
by Corion (Patriarch) on Apr 05, 2010 at 15:32 UTC

    So you don't really want a semaphore (people rarely need that); you just want something like Thread::Queue to distribute the jobs among your worker threads. Alternatively, if threads are unavailable to you, see Parallel::ForkManager, as your child/worker threads don't really need to communicate with the master thread.
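
The queue pattern described above can be tried without any FTP server. The following is only a minimal sketch: the worker count, the file names, and the process_file helper are made up for illustration and stand in for the real download step.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $N     = 3;                     # number of workers (assumed value)
my $queue = Thread::Queue->new;    # jobs: master -> workers
my $done  = Thread::Queue->new;    # results: workers -> master

# Hypothetical stand-in for the real download; it only builds a string.
sub process_file {
    my ($file) = @_;
    return "fetched $file";
}

sub worker {
    # dequeue blocks until an item arrives; undef is the stop marker
    while ( defined( my $file = $queue->dequeue ) ) {
        $done->enqueue( process_file($file) );
    }
    return;
}

my @workers = map { threads->create( \&worker ) } 1 .. $N;

my @files = map { "file$_.dat" } 1 .. 6;    # made-up file names
$queue->enqueue(@files);
$queue->enqueue( (undef) x $N );            # one stop marker per worker

$_->join for @workers;

# All workers have finished, so every result is already queued.
my @results = map { $done->dequeue } 1 .. scalar @files;
@results = sort @results;
print "$_\n" for @results;
```

Each job is handed to exactly one worker, so no semaphore or explicit locking is needed around the queue itself.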

    Also, you might think about whether multiple threads will really improve the runtime of a process that is basically limited by the available bandwidth and not by the available computation power.

      you might think about whether multiple threads will really improve the runtime...

      In case the bandwidth bottleneck is on the servers' side, or the servers are responding slowly (think of sites like this one), parallelizing the downloads could certainly help.

      (update: rephrased to use plural "servers" to make it clearer what I meant)

        I'm not sure I understand this. If the server is already pumping out 100% of its upstream bandwidth, how will adding more threads to hammer the server improve throughput? You might get a bigger share of the total bandwidth (and a ban), if the server allocates all connections equally.
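
The "bigger share" point can be put into numbers. This is a rough sketch that assumes the server splits its upstream capacity equally across all open connections; the client_share helper and the figures (100 Mbit/s, 9 connections from other clients) are invented for illustration.

```perl
use strict;
use warnings;

# Bandwidth one client receives when the server divides its upstream
# capacity equally across all open connections (an assumed policy).
sub client_share {
    my ( $capacity, $other_connections, $my_connections ) = @_;
    return $capacity * $my_connections
        / ( $other_connections + $my_connections );
}

# Illustrative numbers only: 100 Mbit/s upstream, 9 connections from others.
printf "%d connection(s): %.1f Mbit/s\n", $_, client_share( 100, 9, $_ )
    for 1, 4;
```

Under these assumptions, going from one connection to four raises this client's share from 10 to about 30.8 Mbit/s, while the server's total output stays at 100 Mbit/s.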

Re^5: perl -Dusethreads compilation
by BrowserUk (Patriarch) on Apr 05, 2010 at 16:01 UTC
    I am worried about multiple threads colliding (race conditions) in the Net::FTP module.

    Just create a new instance of Net::FTP in each thread. There should (untested) be no conflicts.

    This untested pseudo-code might get you started:

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;
    use Net::FTP;

    my $N = 4;    # number of worker threads; adjust to taste
    my $Q = Thread::Queue->new;

    # Open a fresh connection; every thread gets its own Net::FTP instance.
    sub connect_ftp {
        my $ftp = Net::FTP->new( "some.host.name", Debug => 0 )
            or die "Cannot connect to some.host.name: $@";
        $ftp->login( "anonymous", '-anonymous@' )
            or die "Cannot login ", $ftp->message;
        $ftp->cwd("/pub")
            or die "Cannot change working directory ", $ftp->message;
        return $ftp;
    }

    sub downloader {
        my $ftp = connect_ftp();
        # undef marks "no more work"; defined() keeps a file named "0" alive
        while ( defined( my $file = $Q->dequeue ) ) {
            if ( $ftp->get($file) ) {
                $ftp->delete($file);    # delete only after a successful get
            }
            else {
                warn "Cannot get $file: ", $ftp->message;
            }
        }
        $ftp->quit;
    }

    my @threads = map { threads->create( \&downloader ) } 1 .. $N;

    # Master: list the remote files and hand them to the workers.
    my $ftp = connect_ftp();
    $Q->enqueue( $ftp->ls );
    $ftp->quit;

    $Q->enqueue( (undef) x $N );    # one end marker per worker
    $_->join for @threads;

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I do have a separate instance/connection per thread. But my question is: should I still be concerned about any clashes? Oh, and all the downloads come from the same server.

        Try it. See what happens.