in reply to Copy Files from network

Need to copy files from network

"network"? LAN? WAN? The internet?

Are the 5 destinations local or remote?

Choosing a good solution very much depends on the type of network involved, the size of the files involved, and whether those files will always be different or not.

For many cases, using the system's standard utilities, or obtaining an additional utility, will be a better (faster, simpler) solution than writing one in Perl.

I have a code that ...

You might get a better response if you showed us that "code". Is it Perl?

If this has to be done in Perl, there are several possibilities for making the best use of your CPU and bandwidth.

  1. Copy each file once, and write it 5 times.

    If the source is remote and the destinations local, then you can probably copy the first (local) copy of each file to the other 4 locations in the time it takes for the next remote file to download.

  2. Depending where in the chain the bottlenecks are, it might be advantageous to overlap the fetching of 2 or more remote files.

    This could be done with asynchronous system commands, forks or threads.
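
As a very rough illustration of both points, a fork-based sketch might look like the following. None of this is from the original script: the paths, file list and limit of 4 children are invented, and on Win32 fork is emulated.

#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(copy);

my $remote  = '\\\\srcserver\\share';    # remote source (UNC path)
my $staging = 'C:\\staging';             # local staging folder
my @dests   = map { "\\\\dstserver$_\\share" } 1 .. 5;

open my $fh, '<', 'filelist.txt' or die "filelist.txt: $!";
chomp( my @files = <$fh> );
close $fh;

my $MAX_KIDS = 4;    # how many copy chains to run at once
my $running  = 0;

for my $file (@files) {
    # throttle: wait for a child to finish once we hit the limit
    if ( $running >= $MAX_KIDS ) { wait; $running--; }

    my $pid = fork;
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {
        # child: fetch the remote file once, then fan it out 5 times
        copy( "$remote\\$file", "$staging\\$file" )
            or die "fetch $file: $!";
        for my $dest (@dests) {
            copy( "$staging\\$file", "$dest\\$file" )
                or die "fan-out $file to $dest: $!";
        }
        exit 0;
    }
    $running++;
}
wait while $running-- > 0;    # reap the remaining children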


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail        "Time is a poor substitute for thought"--theorbtwo
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

Re^2: Copy Files from network
by gpurusho (Acolyte) on Nov 16, 2004 at 23:11 UTC
    Network: LAN

    I am using a Perl script for this

    I cannot show the entire flow of the code as it's part of a huge script. But here is the logic.
    1. read file names ( file names always change) from a file into an array
    2. copy from a LAN folder to local folder
    (Currently I use the system copy command to copy the files.)
    3. Copy from Local folder into 5 other LAN folders.
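
    In outline (a simplified sketch, not the real script; all the paths below are made up), the copying part looks something like this:

    use strict;
    use warnings;

    # 1. read the (changing) file names from a list file into an array
    open my $fh, '<', 'C:\\lists\\files.txt' or die $!;
    chomp( my @files = <$fh> );
    close $fh;

    my $lan_src = '\\\\lanserver\\share\\incoming';    # source folder on the LAN
    my $local   = 'C:\\work';                          # local working folder
    my @lan_dst = map { "\\\\lanserver\\share\\out$_" } 1 .. 5;

    for my $file (@files) {
        # 2. copy from the LAN folder to the local folder
        system qq[copy $lan_src\\$file $local\\$file];

        # 3. copy from the local folder into the 5 other LAN folders
        system qq[copy $local\\$file $_\\$file] for @lan_dst;
    }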

    I do not know much about forks / threads and how they work. I feel that if I can copy more than one file at a time, it will greatly reduce the script's execution time.

      I suggest you take a look at Parallel::ForkManager. The second example in its documentation uses LWP; in your case you just need to use

      system qq[copy \\\\srcserver\\path\\to\\source.fil \\\\dstserver\\path\\to\\destination.fil];
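
      Sketched out with Parallel::ForkManager (the module's actual interface, but the file list and paths here are invented), that might look like:

      use strict;
      use warnings;
      use Parallel::ForkManager;

      my $pm = Parallel::ForkManager->new(4);    # up to 4 copies in flight

      # one [source, destination] pair per copy to perform
      my @jobs = (
          [ '\\\\srcserver\\path\\to\\source.fil',
            '\\\\dstserver\\path\\to\\destination.fil' ],
          # ... one entry per file and destination ...
      );

      for my $job (@jobs) {
          $pm->start and next;          # parent: move straight on to the next job
          my ( $src, $dst ) = @$job;
          system qq[copy $src $dst];    # child: do one copy, then exit
          $pm->finish;
      }
      $pm->wait_all_children;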

Re^2: Copy Files from network
by Luca Benini (Scribe) on Nov 17, 2004 at 14:01 UTC
    What is your physical limit? Good code can't improve your bandwidth. If, as I believe, the problem is bandwidth, there are mainly 2 ways to attack it:

    1) Use compressed data for transmission.
    2) Use a p2p structure. Let S be the main server and A, B, C, D, E the destinations:
       Round 1: S sends to A
       Round 2: A sends to C; S sends to B
       Round 3: C sends to D; B sends to E

    Consider also rsync....
      Based on my experiences with various kinds of I/O, including copying files over networks, I am strongly inclined to believe that there is considerable latency to take advantage of. (There is considerable latency just copying data around on disk!) So some naive parallelism would be a considerable win.

      Try it. Benchmark it with different numbers of copying programs. Set the actual number of copiers that you'll use to a bit below whatever gives you the best throughput (so that you aren't too much of a hog if other people try to use the network).

      Of course if you can use rsync rather than a homegrown solution, by all means do!
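
      A crude way to run the benchmark suggested above, assuming a hypothetical copy_all_files($n) routine that copies the whole batch using $n parallel copiers:

      use strict;
      use warnings;
      use Time::HiRes qw(gettimeofday tv_interval);

      # copy_all_files($n) is hypothetical: whatever routine copies the
      # whole batch using $n parallel copiers.
      for my $n ( 1, 2, 4, 8 ) {
          my $t0 = [gettimeofday];
          copy_all_files($n);
          printf "%2d copiers: %.1f seconds\n", $n, tv_interval($t0);
      }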

      Good code can't improve your bandwidth,...

      I sit on the end of a 56k connection. Most of the time my best connect speed is around 44k. If I'm downloading large files from a remote server, using a threaded downloader with 3 or 4 threads, I will often (appear to?) get greater throughput than with a single threaded but very efficient downloader like wget.

      And that's despite the same pipe, the same connection speed, the same file from the same server, even the same session. If I let wget run for a while to get its throughput measurement, interrupt it, and switch to a threaded downloader, the threaded downloader sometimes shows a few percent higher throughput.

      Now it could just be that the two calculate their throughput in different ways, but I do not think that is the case, as both are consistent with the Task Manager networking bandwidth monitor. That is to say, during the same connection, with a download using one immediately followed by a download using the other, the system throughput monitor also shows the threaded downloader getting greater throughput than wget.

      My supposition is that with the narrow pipe, most servers are capable of driving the pipe as fast as it can go, so there is always data available to be read by the app locally. However, sometimes the app has to send ACKs, write to disk, or update the progress bar when it could be reading the next packet.

      In the multi-threaded app, there is always another thread available to read its next packet when the first thread is waiting for a disk write to complete or is updating the screen. In the single-threaded wget, these other events detract from the throughput.

      That's my theory--the theory which is mine. Dinosaurs are thin at one end, thick in the middle, and thin again at the other end (Wrong show!)

