Smile-n-Nod has asked for the wisdom of the Perl Monks concerning the following question:

I have a Perl program with (typically) 32 threads, each of which uses a command-line utility (via a system() call) to download large files from one or more servers. These threads also write periodically to STDOUT, which I've redirected to a (log) file on my local drive.

When my program downloads these files onto my local hard-drive, everything works fine. However, when my program downloads the files and saves them onto a data-drive that is physically located on another machine (across a network), everything bogs down and becomes deadlocked (I think). If I use fewer threads (say, 4 or 8 on my 4-processor machine), my program usually works fine.

Obviously I'm overloading the network, but are there any strategies that I can use to minimize the problems I'm having? Thanks.


Replies are listed 'Best First'.
Re: Using multiple threads that save files across a network
by moritz (Cardinal) on Jun 03, 2011 at 20:58 UTC
    Obviously I'm overloading the network, but are there any strategies that I can use to minimize the problems I'm having?

    Sure. Get a faster network connection to the network file system server. Or don't store the files on a network file system. Or use fewer threads concurrently. Or configure your network stack to prioritize connections to the file server, and throttle the other connections.

    Or what kinds of strategies did you mean?
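    The "use fewer threads concurrently" suggestion doesn't require dropping to 4 worker threads; you can keep all 32 and just cap how many run the download at once. A minimal sketch with Thread::Semaphore (the fetch-tool command and file names here are placeholders, not anything from the original post):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use threads;
    use Thread::Semaphore;

    # Allow at most 4 simultaneous transfers, however many threads exist.
    my $slots = Thread::Semaphore->new(4);

    sub download {
        my ($name) = @_;
        $slots->down;                       # block until a transfer slot is free
        # system('fetch-tool', $name);      # placeholder for the real utility
        $slots->up;                         # release the slot for the next thread
        return $name;
    }

    my @threads = map { threads->create( \&download, "file$_" ) } 1 .. 8;
    my @done    = map { $_->join } @threads;
    print scalar(@done), " downloads finished\n";
    ```

    Threads that can't get a slot simply block in down(), so the network sees only 4 concurrent transfers while the rest of the program structure stays unchanged.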

Re: Using multiple threads that save files across a network
by zek152 (Pilgrim) on Jun 03, 2011 at 21:02 UTC

    First of all, I suspect that the issue is not with Perl but with the system call you are using. You did not tell us what utility you are using, so all further advice will be speculation.

    My guess is that there are n connections being made to the destination machine (n being the number of threads) and that the utility is not meant to be run from multiple threads connecting to the same machine at once. This could result in timeouts.

    Possible Fix #1: Set up a scheme where you cache the files locally and make only one connection with the destination machine (if you can reliably make more connections, then send two files at once). The idea is that you can keep downloading the files in multiple threads but transfer them across the network in fewer.

    Possible Fix #2: Eliminate the middle man. Have the destination machine grab the large files itself.

Re: Using multiple threads that save files across a network
by BrowserUk (Patriarch) on Jun 03, 2011 at 21:25 UTC
    to download large files from one or more servers.

    Are the source servers on the same LAN? Or from the internet or a WAN?

    Obviously I'm overloading the network,

    There is nothing you can do in Perl to influence this problem. Possibilities:

    1. Run fewer threads and accept it will take as long as it takes to ship the total volume of data across the LAN to its final destination.
    2. Run the script at the destination server.
    3. Move the storage to the local machine.
    4. Put second LAN cards in the machine running the script and the destination machine and set up a dedicated connection between them.

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Using multiple threads that save files across a network
by Anonymous Monk on Jun 03, 2011 at 23:07 UTC
    I suspect that you will find that one thread, or maybe two, will accomplish more work in less wall-time than 32.