shijumic has asked for the wisdom of the Perl Monks concerning the following question:

I am rewriting an existing script that transfers files from a server to a local system over FTP. The current script is parallelized using fork.

I need to rewrite it in a new framework. Does multiprocessing really improve transfer performance? My understanding is that the performance of a script like this depends only on I/O, so I am planning to write it as single-threaded.

What factors affect the performance of a transfer script like this?

Replies are listed 'Best First'.
Re: Performance of FTP Transfer
by dHarry (Abbot) on Jul 22, 2009 at 14:42 UTC
    My understanding is that the performance of a script like this depends only on I/O

    Is the network not a bottleneck for you? (How is your local system "connected" to the server?) I doubt very much that a threaded approach will improve the overall performance of FTP-ing a number of files. I recall there are tools around for finding out where the FTP bottlenecks are.

Re: Performance of FTP Transfer
by jrsimmon (Hermit) on Jul 22, 2009 at 14:42 UTC

    That depends on the number and size of the files you are transferring. For example, if you have 10 files that require 10 minutes each to transfer, then 10 threads running simultaneously will indeed provide a performance boost. If, however, you have 10 files that take 1 second each to transfer, then the connection setup cost is likely greater than the transfer cost and multi-threading makes no sense.

    There are some other considerations, such as socket and memory usage, if the number of files is large, though these are easily managed by limiting the number of concurrent workers, as in the sketch below.
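
    A minimal sketch of the capped-worker approach, assuming Net::FTP and Parallel::ForkManager are available; the host, credentials, and file list are placeholders:

        use strict;
        use warnings;
        use Net::FTP;
        use Parallel::ForkManager;

        my $host  = 'ftp.example.com';             # placeholder host
        my @files = @ARGV;                         # files to fetch
        my $pm    = Parallel::ForkManager->new(5); # cap at 5 concurrent children

        for my $file (@files) {
            $pm->start and next;   # fork; the parent moves on to the next file

            # Each child opens its own connection -- an FTP handle
            # cannot be shared across forks.
            my $ftp = Net::FTP->new($host, Timeout => 60)
                or die "Cannot connect to $host: $@";
            $ftp->login('anonymous', 'anon@example.com')
                or die "Login failed: ", $ftp->message;
            $ftp->binary;
            $ftp->get($file) or warn "get $file failed: ", $ftp->message;
            $ftp->quit;

            $pm->finish;           # child exits
        }
        $pm->wait_all_children;    # reap every child before exiting

    The cap of 5 is arbitrary; tuning it against your network and disk is exactly the kind of experiment discussed further down.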

      If your connection is saturated while transferring one file for 10 minutes, how would you transfer more by adding more threads? If your network is the bottleneck, adding more threads will not help. If your disk is the bottleneck, more threads will not help. If you are transferring from multiple remote hosts, and each one can only fill 1/3 of your network pipe, then 3 threads may help.

      If you try to add more traffic to an already saturated resource, the effort of managing that saturation will actually decrease your performance (see thrashing (computer science), or ask an old networking person what happened to 10Mb Ethernet operating at about 30% utilization).

      --MidLifeXis

      The tomes, scrolls etc are dusty because they reside in a dusty old house, not because they're unused. --hangon in this post

        If the network connection is indeed saturated, then there is little (though not nothing) to be gained by multi-threading the transfer of large files. It is not very common, though, for the network to be a bottleneck.
      Actually, the file sizes are small, but there are a large number of files.

        In this case, because of the round-trip times of connection setup and teardown, you may get higher performance from more threads with the smaller files. One thread could be transferring a file while another is setting up or tearing down a connection. It all depends on which resources are your limiting factor.

        Dan J. Bernstein (http://www.qmail.org) used this concept in building qmail. While not everyone agrees with his implementation of SMTP, the ability of that server to fully utilize a host's bandwidth for small messages is hard to argue with.

        Update: I am not saying that you will get better performance with multiple threads. It depends on a lot of different factors, both local and remote.
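
        To make the setup cost concrete, here is a minimal sketch (assuming Net::FTP; the host and credentials are placeholders) that pays the connect/login round trips once and reuses the same control connection for a whole batch of small files. Running a few of these children in parallel lets one child's transfer overlap another's setup or teardown:

            use strict;
            use warnings;
            use Net::FTP;

            my $host        = 'ftp.example.com';   # placeholder host
            my @small_files = @ARGV;               # many small files

            # Log in once, then stream every file through the
            # same connection instead of reconnecting per file.
            my $ftp = Net::FTP->new($host, Timeout => 60)
                or die "Cannot connect to $host: $@";
            $ftp->login('anonymous', 'anon@example.com') or die $ftp->message;
            $ftp->binary;
            for my $file (@small_files) {
                $ftp->get($file) or warn "get $file failed: ", $ftp->message;
            }
            $ftp->quit;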

        --MidLifeXis

        The tomes, scrolls etc are dusty because they reside in a dusty old house, not because they're unused. --hangon in this post

        Your node got me wondering at what point forking more workers fails to improve performance. This is admittedly very environment-specific, but I found it interesting nonetheless. I put together the following simple script as a test:
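
        A sketch along these lines (not the original script verbatim): it forks N children, deals the remote files out round-robin, and times the whole run. Net::FTP and Time::HiRes are assumed; the host and remote directory are placeholders.

            use strict;
            use warnings;
            use Net::FTP;
            use Time::HiRes qw(time);

            my $workers = shift || 1;           # number of children to fork
            my $host    = 'ftp.example.com';    # placeholder host
            my $dir     = '/pub/pdfs';          # placeholder remote directory

            # Fetch the file list once, up front.
            my $lister = Net::FTP->new($host, Timeout => 60) or die "connect: $@";
            $lister->login('anonymous', 'anon@example.com') or die $lister->message;
            $lister->cwd($dir) or die $lister->message;
            my @files = $lister->ls;
            $lister->quit;

            # Deal the files round-robin, one batch per worker.
            my @batches;
            push @{ $batches[ $_ % $workers ] }, $files[$_] for 0 .. $#files;

            my $start = time;
            my @pids;
            for my $batch (@batches) {
                next unless $batch;             # more workers than files
                my $pid = fork;
                die "fork failed: $!" unless defined $pid;
                if ($pid) { push @pids, $pid; next }    # parent keeps forking

                # Child: one connection, one batch.
                my $ftp = Net::FTP->new($host, Timeout => 60) or die "connect: $@";
                $ftp->login('anonymous', 'anon@example.com') or die $ftp->message;
                $ftp->cwd($dir) or die $ftp->message;
                $ftp->binary;
                for my $file (@$batch) {
                    $ftp->get($file) or warn "get $file failed: ", $ftp->message;
                }
                $ftp->quit;
                exit 0;
            }
            waitpid $_, 0 for @pids;            # wait for all children
            printf "%d worker(s): %.2f seconds\n", $workers, time - $start;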

        Environment:

        • Client running on w2k3 and connecting to an HP-UX box
        • Directory to be transferred contains 340 .pdf files ranging in size from ~10k to ~250k.

        You can see that the initial forks provided quite significant performance boosts. However, the improvement began to drop quickly once I was launching more than 7 children. Eventually there was no improvement whatsoever (though no cost, either).

        Things to keep in mind:

        • These are Win32 forks (which Perl emulates with threads); a *nix system may have very different results
        • This script was written specifically for this testcase -- not meant to be very robust (but I did try to make it very readable)