To be honest, I'm not sure if this belongs in Meditations or in Seekers of Perl Wisdom. I opted for here because it's more a set of observations I recently made concerning the File::Copy module.

I work with a system that automatically transfers files around for various business units. These transfers are primarily handled by a third-party system, mostly moving files between Unix/Solaris and Intel/W2K platforms. I work mostly with the Unix side of the equation and have set most everything up to run out of Perl.

Recently, we were asked to look at the speed of transferring 2+ gigs of data to a separate Unix box. We are really only moving the files into a local directory that is NFS mounted on the remote box. We tried three separate methods: first, a straight cp of the files into the destination directory from the command line; second, a Perl program that looped through the files using the copy function from File::Copy; and third, the third-party software doing the copy (invoked from the same Perl program). The first method took about 10 minutes to copy all the files, and the third-party program took about the same amount of time. However, the Perl program using File::Copy took closer to 4 minutes (give or take a few seconds here and there).
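The Perl version was nothing fancy; it was basically a loop along these lines (directory names changed for posting, the real ones are business-specific):

use strict;
use warnings;
use File::Copy;

# Source directory is local; the destination is the NFS-mounted directory.
my $src_dir = '/data/outbound';
my $dst_dir = '/mnt/remote_nfs/inbound';

opendir my $dh, $src_dir or die "Cannot open $src_dir: $!";
for my $file (grep { -f "$src_dir/$_" } readdir $dh) {
    copy("$src_dir/$file", "$dst_dir/$file")
        or die "Copy of $file failed: $!";
}
closedir $dh;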

What really intrigues me is why copying the files with File::Copy is so much faster. I can accept that some latency is introduced when I run the third-party transfer program from a Perl program, since I am forking out to the system to run it, but that might account for maybe a minute in total; even allowing for that, it is still quite a bit slower than File::Copy. Anyone have any thoughts on this?


"Ex libris un peut de tout"

Re: Curious Observation with File::Copy;
by Corion (Patriarch) on Jun 27, 2003 at 13:21 UTC

    A quick look at the source tells me that the most probable difference is the buffer size used for reading and writing. File::Copy uses a buffer size of 2*1024*1024 bytes, while the programs written in C most likely allocate a statically fixed buffer whose size is not known.

    It seems that, if your numbers and testing are correct, your hard disk / network stack / NFS combination handles large blocks better than small blocks (of, say, 64k).
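    If you want to test that theory directly, copy() also takes an optional third argument that sets the buffer size, so you could force the Perl side down to a small block and see whether its advantage disappears (the paths here are made up):

    use strict;
    use warnings;
    use File::Copy;

    my $src = '/local/data/bigfile.dat';
    my $dst = '/mnt/remote_nfs/bigfile.dat';

    # Default: File::Copy picks its own buffer size (up to 2MB for large files).
    copy($src, $dst) or die "copy failed: $!";

    # Same copy, forced down to a 64k buffer for comparison.
    copy($src, "$dst.64k", 64 * 1024) or die "copy failed: $!";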

    perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
    $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
    ($c = $d->accept())->get_request(); $c->send_response( new #in the
    HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
Re: Curious Observation with File::Copy;
by Abigail-II (Bishop) on Jun 27, 2003 at 13:20 UTC
    Was your timing fair? Did you start all the programs with clear buffers on both sides? The best way to make sure no method benefits from warm buffers is to reboot both machines before each test. Furthermore, the network could have been the bottleneck - did you make sure you performed the tests on a quiet network?

    I find it hard to believe cp would be 2.5 times slower than File::Copy.
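    For what it's worth, wrapping each method in the same timing harness takes some of the guesswork out of the numbers; something along these lines (the paths are placeholders, and each run should still start from cold caches as described above):

    use strict;
    use warnings;
    use Time::HiRes qw(gettimeofday tv_interval);
    use File::Copy;

    # Run a piece of work and report elapsed wall-clock time.
    sub timed {
        my ($label, $code) = @_;
        my $t0 = [gettimeofday];
        $code->();
        printf "%-12s %.2f seconds\n", $label, tv_interval($t0);
    }

    my @files = glob('/local/outbound/*');
    my $dest  = '/mnt/remote_nfs/inbound';

    timed('cp',         sub { system('cp', @files, $dest) == 0 or warn "cp: $?" });
    timed('File::Copy', sub { copy($_, $dest) or warn "copy $_: $!" for @files });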

    Abigail

      That's a good point. However, since this is a production server, I cannot reboot it (although it was rebooted yesterday). I don't think the network is going to be much of an issue, because the directories I'm copying between are local to the server on which I'm testing. I'm making a rough guess on the times, because I had the program that transferred the files record the start and stop times into a file, so there might be a +/- of 1-2 minutes; but we also sat and marked time with a stopwatch, because we didn't believe there could be that much discrepancy in the times. I just ran the tests again this morning while the system was relatively quiet, and there is not a lot going on on that server anyway.


      "Ex libris un peut de tout"
        You are doing performance testing on a production server? I find that mind-boggling. Anyway, why don't you perform the tests in a test environment, or even your development environment? I'm not quite sure what you mean by the directories being local to the server - you were copying stuff from one machine to another using NFS, weren't you?

        Abigail

Re: Curious Observation with File::Copy;
by tilly (Archbishop) on Jun 27, 2003 at 13:55 UTC
    While you are benchmarking various methods, I would suggest also benchmarking rsync. If the files to be transferred do not change all at once, this should be much faster than copying them all of the time.
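    From Perl, that can be as simple as shelling out to rsync; a minimal sketch, with host, paths and flags as placeholders to adapt:

    use strict;
    use warnings;

    # -a preserves permissions/ownership/times, -z compresses in transit.
    my $src  = '/data/outbound/';
    my $dest = 'remotebox:/data/inbound/';

    system('rsync', '-az', $src, $dest) == 0
        or die "rsync failed with exit status ", $? >> 8;
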
Re: Curious Observation with File::Copy;
by mojotoad (Monsignor) on Jun 27, 2003 at 15:57 UTC
    NFS has no error checking... if data integrity is crucial, you might want to consider something like rsync, as tilly pointed out, or Net::FTP, scp, etc.
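    For the record, a minimal Net::FTP version of such a transfer might look something like this (host, credentials and paths are placeholders):

    use strict;
    use warnings;
    use Net::FTP;

    my $ftp = Net::FTP->new('remotebox', Timeout => 120)
        or die "Cannot connect: $@";
    $ftp->login('user', 'password') or die "Login failed: ", $ftp->message;
    $ftp->binary;
    $ftp->cwd('/data/inbound') or die "cwd failed: ", $ftp->message;

    for my $file (glob '/data/outbound/*') {
        $ftp->put($file) or die "put $file failed: ", $ftp->message;
    }
    $ftp->quit;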

    Matt

    P.S. Why not just correct the word 'hear' rather than putting the correction in an update?

Re: Curious Observation with File::Copy;
by naChoZ (Curate) on Jun 27, 2003 at 20:48 UTC
    Another method worth a benchmark is tunneling tar through ssh.

    This will pull the contents of the specified remote directory into the current directory.

    ssh user@foo "cd /dir/to/copy ; tar cf - ." | tar xvfBp -
    I used to run multiple instances of this to get maximum throughput, and it worked really well. I can't remember exact numbers, but I copied around 60+GB with 10 instances in pretty short order. It performed substantially better than scp. I'm not exactly sure why, but I figured it had something to do with the fact that I was copying mailboxes: the 60+GB was made up of hundreds of thousands of individual files averaging a couple of K each.
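    If it helps, one way to drive several instances of that pipeline is a small forking wrapper in Perl, roughly like this (user, host, and directory names are placeholders):

    use strict;
    use warnings;

    # One child per chunk of the tree; each local directory is assumed to exist.
    my @chunks = qw(a b c d);
    my @pids;

    for my $chunk (@chunks) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            chdir $chunk or die "chdir $chunk: $!";
            exec qq{ssh user\@foo "cd /dir/to/copy/$chunk ; tar cf - ." | tar xvfBp -};
            die "exec failed: $!";
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;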

    ~~
    naChoZ

      There's really not a lot of difference between that and using scp -r.

        False. Otherwise I would've just used scp. scp was significantly slower. Besides, tar is quite a bit more flexible anyway.

        ~~
        naChoZ