Re: Performance of FTP Transfer
by dHarry (Abbot) on Jul 22, 2009 at 14:42 UTC
My understanding is that the performance of a script like this depends only on I/O.
Is the network not a bottleneck for you? (How is your local system "connected" to the server?) I doubt very much that a threaded approach will improve the overall performance of FTP-ing a number of files. I recall there are tools around for finding out where the FTP bottlenecks are.
Re: Performance of FTP Transfer
by jrsimmon (Hermit) on Jul 22, 2009 at 14:42 UTC
That depends on the number and size of the files you are transferring. For example, if you have 10 files that take 10 minutes each to transfer, then 10 threads running simultaneously will indeed provide a performance boost. If, however, you have 10 files that take 1 second each to transfer, then the connection setup cost is likely greater than the transfer cost and multi-threading makes no sense.
There are some other considerations, such as socket and memory usage, if the number of files is large, but these are easily managed by limiting the number of threads.
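For example, a minimal sketch of that kind of cap, assuming Net::FTP with one connection per worker thread (the host, login, and file names here are placeholders):

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;
    use Net::FTP;

    my $MAX_WORKERS = 5;
    my @files = map { "file$_.dat" } 1 .. 100;     # placeholder file names

    my $queue = Thread::Queue->new;
    $queue->enqueue(@files);
    $queue->enqueue(undef) for 1 .. $MAX_WORKERS;  # one stop marker per worker

    my @workers = map {
        threads->create(sub {
            # One connection per worker keeps socket usage bounded.
            my $ftp = Net::FTP->new('ftp.example.com', Timeout => 60)
                or die "connect failed: $@";
            $ftp->login('user', 'password') or die "login failed";
            $ftp->binary;
            while (defined(my $file = $queue->dequeue)) {
                $ftp->get($file) or warn "get $file: ", $ftp->message;
            }
            $ftp->quit;
        });
    } 1 .. $MAX_WORKERS;

    $_->join for @workers;

No matter how many files are queued, only $MAX_WORKERS connections ever exist at once, so socket and memory usage stay bounded.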
If your connection is saturated while transferring one file for 10 minutes, how would you transfer more by adding more threads? If your network is the bottleneck, adding more threads will not help. If your disk is the bottleneck, more threads will not help. If you are transferring from multiple remote hosts, and each one can only fill 1/3 of your network pipe, then 3 threads may help. If you try to add more traffic to an already saturated resource, the effort of managing that saturation will actually decrease your performance (see thrashing (computer science), or ask an old networking person what happened to 10 Mbit Ethernet operating at about 30% utilization).
--MidLifeXis
The tomes, scrolls etc are dusty because they reside in a dusty old house, not because they're unused. --hangon in this post
If the network connection is indeed saturated, then there is little (though not nothing) to be gained by multi-threading the transfer of large files. It is not very common, though, for the network to be a bottleneck.
Actually, the file sizes are small, but there are a large number of files.
In this case, because of the round-trip times of connection setup and teardown, you may get higher performance from more threads with the smaller files. One thread can be transferring a file while another is setting up or tearing down a connection. It all depends on which resources are your limiting factor.
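One way to see which cost dominates before reaching for threads is to time the two phases separately. A rough sketch with Net::FTP and Time::HiRes (host and login are placeholders; file names come from the command line):

    use strict;
    use warnings;
    use Net::FTP;
    use Time::HiRes qw(time);

    my ($setup, $xfer) = (0, 0);
    for my $file (@ARGV) {
        my $t0  = time;
        my $ftp = Net::FTP->new('ftp.example.com', Timeout => 60)
            or die "connect failed: $@";
        $ftp->login('user', 'password') or die "login failed";
        $ftp->binary;
        my $t1 = time;                         # connection is ready
        $ftp->get($file) or warn "get $file: ", $ftp->message;
        my $t2 = time;                         # file is on disk
        $ftp->quit;
        my $t3 = time;
        $setup += ($t1 - $t0) + ($t3 - $t2);   # setup plus teardown
        $xfer  += $t2 - $t1;
    }
    printf "setup/teardown: %.2fs   transfer: %.2fs\n", $setup, $xfer;

If setup/teardown dominates, overlapping connections across threads has room to help; if transfer time dominates, the pipe is probably your limit.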
Dan J. Bernstein (http://www.qmail.org) used this concept in building qmail. While not everyone agrees with his implementation of SMTP, the ability of that server to fully utilize the bandwidth of a host for small messages is hard to argue with.
Update: I am not saying that you will get better performance with multiple threads. It depends on a lot of different factors, both local and remote.
--MidLifeXis
The tomes, scrolls etc are dusty because they reside in a dusty old house, not because they're unused. --hangon in this post
Your node got me wondering at what point forking more workers fails to improve performance. This is admittedly very environment-specific, but I found it interesting nonetheless. I put together the following simple script as a test:
Environment:
- Client running on w2k3 and connecting to an HP-UX box
- Directory to be transferred contains 340 .pdf files ranging in size from ~10k to ~250k
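The script isn't reproduced here, but a minimal sketch of this kind of harness, assuming Net::FTP and plain fork (the host, login, remote directory, and the name ftpbench.pl are all made up; on Win32, fork is emulated with threads):

    use strict;
    use warnings;
    use Net::FTP;
    use Time::HiRes qw(time);

    my $WORKERS = shift || 1;           # e.g. perl ftpbench.pl 7
    my $start   = time;

    my @pids;
    for my $w (0 .. $WORKERS - 1) {
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                # child
            my $ftp = Net::FTP->new('hpux.example.com', Timeout => 60)
                or die "connect failed: $@";
            $ftp->login('user', 'password') or die "login failed";
            $ftp->binary;
            $ftp->cwd('/pub/docs') or die "cwd failed";
            my @files = sort { $a cmp $b } $ftp->ls;   # same list in every child
            # Each child takes every Nth file so the work is split evenly.
            for my $i (grep { $_ % $WORKERS == $w } 0 .. $#files) {
                $ftp->get($files[$i])
                    or warn "get $files[$i]: ", $ftp->message;
            }
            $ftp->quit;
            exit 0;
        }
        push @pids, $pid;
    }
    waitpid $_, 0 for @pids;
    printf "%d worker(s): %.1fs\n", $WORKERS, time - $start;

Run it with increasing worker counts (perl ftpbench.pl 1, then 2, and so on) and compare the wall-clock times.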
You can see that the initial forks provided quite significant performance boosts. However, the improvement began to drop quickly once I was launching more than 7 children. Eventually there was no improvement whatsoever (though no cost, either).
Things to keep in mind:
- These are Win32 forks. A *nix system may have very different results.
- This script was written specifically for this test case -- not meant to be very robust (but I did try to make it very readable).