~~David~~ has asked for the wisdom of the Perl Monks concerning the following question:

I am designing an application that queries a database for image file locations and then downloads the images to a local directory by FTP. Both operations are somewhat time-consuming due to the volume of queries and images. I was going to use threading to run them simultaneously. I was hoping for some suggestions about my plan, and whether there is a better way, before I start to work on this in detail:

My initial thought was to have one subroutine that just pushes the results of the database query onto a shared array... and the FTP download will just shift off the image locations and download the files. When it runs out of files in the array, it will wait for a couple of seconds to determine if the query is complete.
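
The plan above could be sketched with the core Thread::Queue module instead of a hand-rolled shared array. This is only a minimal sketch, assuming a threads-enabled perl and a Thread::Queue recent enough to provide end(); download_image and the image paths are hypothetical placeholders, not part of the original design. Because dequeue() blocks until work arrives, the downloader does not need to sleep and poll.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new;

# Hypothetical stand-in for the FTP transfer; replace with a real download.
sub download_image {
    my ($location) = @_;
    return "fetched $location";
}

# Consumer thread: dequeue() blocks until an item is available and
# returns undef once the queue has been ended and drained.
my $downloader = threads->create(sub {
    my @done;
    while (defined(my $location = $queue->dequeue)) {
        push @done, download_image($location);
    }
    return \@done;
});

# Producer: stand-in for the database query loop (hypothetical paths).
$queue->enqueue($_) for qw(/img/a.jpg /img/b.jpg /img/c.jpg);
$queue->end;    # tell the consumer that no more work is coming

my $results = $downloader->join;
print "$_\n" for @$results;
```

Calling end() replaces the "wait a couple of seconds and check" step: the consumer leaves its loop as soon as the queue is both ended and empty.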

Is this conceptually the right way to tackle this problem? Or am I heading down the wrong path entirely?

Thanks for any suggestions
~~David~~

Re: Threading - Conceptual Question About Use Of Array
by pc88mxer (Vicar) on Jul 02, 2008 at 16:34 UTC
      Thanks!
      ~~David~~
Re: Threading - Conceptual Question About Use Of Array
by zentara (Cardinal) on Jul 02, 2008 at 20:48 UTC
    Threads are only worth using if you need to communicate between them in real time; otherwise fork-and-exec is the better solution. Why? Threads have various drawbacks: memory usage can grow over time, a thread that dies takes down the whole batch, and there are other inefficiencies. In a long-running threaded program, reusing the threads and the downloader packages will help. You might want to look at Reusable threads demo.

    A forked solution is good, because all memory is released back to the system when the forked process ends. See LWP::Parallel::UserAgent and Parallel::ForkManager. You can also search for scripts using these on groups.google.com.
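
A forked version might look roughly like the following. It is a sketch only, assuming Parallel::ForkManager is installed from CPAN; fetch_image, the image paths, and the limit of two children are hypothetical choices made for illustration.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical stand-in for the FTP transfer; returns true on success.
sub fetch_image {
    my ($location) = @_;
    return 1;
}

# Hypothetical image locations; in practice these come from the database query.
my @locations = qw(/img/a.jpg /img/b.jpg /img/c.jpg);

my $pm = Parallel::ForkManager->new(2);    # at most 2 children at a time

# Runs in the parent each time a child is reaped.
my %exit_status;
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident) = @_;
    $exit_status{$ident} = $exit_code;
});

for my $location (@locations) {
    $pm->start($location) and next;    # parent: go on to the next location
    # Child process from here on.
    my $ok = fetch_image($location);
    $pm->finish($ok ? 0 : 1);          # child exits; its memory is released
}
$pm->wait_all_children;

print "$_ => exit $exit_status{$_}\n" for sort keys %exit_status;
```

Each download runs in its own process, so whatever memory it used goes back to the system when finish() is called.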

    Anyway, before you decide that threads have solved your problem, watch the program's memory usage as it runs. If it slowly grows, you have a problem.


    I'm not really a human, but I play one on earth CandyGram for Mongo
Re: Threading - Conceptual Question About Use Of Array
by jethro (Monsignor) on Jul 02, 2008 at 17:13 UTC
    It seems you don't need any communication from the ftp downloader back to the query/main program, and more than one ftp download can be started simultaneously without any dire consequences, except perhaps when two downloads fetch the same file. If that is a problem, downloader.pl could use a lockfile, or download to a randomly generated name and rename the file once the download is complete. So the easiest solution is to start a separate program that does the ftp download and simply gets its data over the command line:

    while ($query = getnewquery()) {    # database query
        ...
        $filename = ...
        # start the download in the background;
        # system() returns 0 on success, so compare against 0
        system("downloader.pl $filename &") == 0
            or print "failure to start downloader.pl\n";
    }
    The '&' starts the downloader.pl program in the background, so it is executed in parallel with the main program. Depending on the ftp download, downloader.pl might be replaced with the actual ftp command.
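
If the file names can contain spaces or shell metacharacters, one alternative to the '&' form is to fork and exec the downloader directly, so that no shell ever parses the name. The following is a minimal sketch under that assumption; spawn_background is a hypothetical helper, and the demo launches a trivial perl one-liner as a stand-in for downloader.pl.

```perl
use strict;
use warnings;
use POSIX ();

# Start a command in the background without going through a shell.
# Returns the child's PID, or nothing if fork() failed.
sub spawn_background {
    my (@command) = @_;
    my $pid = fork();
    return unless defined $pid;    # fork failed
    if ($pid == 0) {
        # Child: replace this process with the command. The block form
        # of exec never invokes a shell, even for a single argument.
        exec { $command[0] } @command or POSIX::_exit(127);
    }
    return $pid;                   # parent continues immediately
}

# Demo: a trivial child standing in for downloader.pl.
my $pid = spawn_background($^X, '-e', 'exit 0');
die "could not fork: $!" unless defined $pid;

waitpid($pid, 0);                  # reap the child so no zombie remains
my $status = $? >> 8;
print "child $pid exited with status $status\n";
```

In the loop above this would be called as spawn_background('downloader.pl', $filename), with a waitpid for each child (or a SIGCHLD handler) once the query loop has finished.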