temporal has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I have a script that pulls a large queue of files from one of my servers to another. These files range in size from a few hundred KB up to a bit over 50 GB.

As my title suggests, I'm doing this copying using the File::Copy module's copy sub out to a network path (NFS). On a couple of occasions this has failed on the larger files (30 GB+). By 'failed' I mean copy returned 0. Unfortunately this was a quick-and-dirty script with minimal logging, so I didn't capture the actual error from the copy sub. I have since added better logging, but have not reproduced the problem.
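
For what it's worth, a minimal sketch of the kind of error capture that would have caught it (the paths here are placeholders, not the actual script's; when copy returns 0, the OS error lands in $!):

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);

# Placeholder paths; in the real script $dst would be the NFS/UNC path.
my $dir = tempdir(CLEANUP => 1);
my ($src, $dst) = ("$dir/src.dat", "$dir/dst.dat");

open my $fh, '>', $src or die "open $src: $!";
print {$fh} "payload\n";
close $fh;

if (copy($src, $dst)) {
    print "copied $src -> $dst\n";
}
else {
    # copy() returns 0 on failure and leaves the OS error in $!
    warn "copy failed for $src -> $dst: $!";
}
```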

The script uses Parallel::ForkManager to fork off a bunch of processes that handle the individual downloads. Of course the larger files transfer for longer than the others, meaning they're on the network longer and more vulnerable to disconnects and other network issues. I have checked the logs for connectivity problems with no luck, and a network problem on either of these servers would definitely have been obvious to one of my admins, since both feed maximum-uptime apps.
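
The fork-per-file pattern looks roughly like this (the file list and do_copy() are stand-ins, not the actual script; the point is that run_on_finish lets the parent see and log each child's exit status):

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my @files = ('a.dat', 'b.dat', 'c.dat');   # stand-in queue
my $pm    = Parallel::ForkManager->new(4); # at most 4 concurrent copies
my $done  = 0;

# The parent sees each child's exit status here, so failures get logged
# even though the copy itself ran in a separate process.
$pm->run_on_finish(sub {
    my ($pid, $exit, $ident) = @_;
    $done++;
    warn "copy of $ident failed (exit $exit)\n" if $exit;
});

for my $file (@files) {
    $pm->start($file) and next;    # parent: queue the next file
    my $ok = do_copy($file);       # child: do the transfer
    $pm->finish($ok ? 0 : 1);      # child exits; status goes to the parent
}
$pm->wait_all_children;

sub do_copy { return 1 }           # placeholder for the real copy
```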

This is on Windows systems (ActiveState), so I took a quick peek at File::Copy's source and it looks like it just calls out to the OS-specific file copy routines. So no real issue there. I'm using the default copy settings: not changing the buffer size, etc.

Anyway, I just thought I'd post and get your collective monkly opinion before shelving this and waiting for the logs to catch something.


Replies are listed 'Best First'.
Re: File::Copy on Large-ish Files over Network
by BrowserUk (Patriarch) on Apr 23, 2012 at 14:46 UTC

    I'd suggest using robocopy for copying large files across unreliable networks.
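
    One way to shell out to it from the existing Perl script (the paths are placeholders, and the flags are from robocopy's documented behaviour — /Z resumes a partial copy after a network drop, /R and /W control retry count and wait between retries):

```perl
use strict;
use warnings;

# Hypothetical invocation; adjust paths, /R (retries), and /W (seconds
# between retries) to taste.
my @cmd = ('robocopy', 'C:\queue', '\\\\fileserver\share', 'bigfile.dat',
           '/Z', '/R:5', '/W:10');

# Guarded so this sketch only runs where robocopy exists.
my $rc = 0;
$rc = system(@cmd) >> 8 if $^O eq 'MSWin32';

# robocopy exit codes below 8 mean success (possibly with extras copied);
# 8 and above indicate at least one failure.
warn "robocopy reported a failure (exit $rc)\n" if $rc >= 8;
```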


Re: File::Copy on Large-ish Files over Network
by JavaFan (Canon) on Apr 23, 2012 at 14:20 UTC
    I would suggest using rsync. It doesn't copy the entire file if it can determine that large parts are identical; it can use ssh underneath (which can do compression); and it can pipeline the transfer of multiple files.
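
    A local-to-local sketch of the relevant flags, wrapped in Perl (in a real run the destination would be host:path so ssh handles the transport; -a preserves attributes, -z compresses over the wire, --partial keeps a partly transferred file so a restart resumes rather than starting over):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Throwaway source and destination directories for the demonstration.
my ($src, $dst) = (tempdir(CLEANUP => 1), tempdir(CLEANUP => 1));
open my $fh, '>', "$src/file.dat" or die "open: $!";
print {$fh} "data\n";
close $fh;

# Trailing slash on $src means "copy the directory's contents".
my $rc = system('rsync', '-az', '--partial', "$src/", "$dst/") >> 8;
print $rc == 0 ? "rsync OK\n" : "rsync failed (exit $rc)\n";
```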
Re: File::Copy on Large-ish Files over Network
by flexvault (Monsignor) on Apr 23, 2012 at 14:50 UTC

    temporal,

    I too have had problems with 'File::Copy'. But in my case it was not the size of the files, it was the stress on the system. I used 'File::Copy' on a mail server and never seemed to have a problem while the server was handling fewer than 8 emails per second, but when it went over 10 emails per second I would get mis-matched "qf..." and "df..." files. So we were losing files!

    My solution was to use Perl to move or copy files smaller than 2GB, and the system 'move/copy' for files larger than 2GB. In my environment, using 'system' or 'qx' for the files larger than 2GB was actually faster.

    For your situation, have you looked at 'rsync'? The advantage is that if the sizes don't match you can restart 'rsync' until they do. And without ever going into the actual code, 'rsync' does seem much faster than 'move/copy/ftp' over networks. I have used 'rsync' on Windows, but I don't know whether you have access to it in your environment.

    Good Luck

    "Well done is better than well said." - Benjamin Franklin

Re: File::Copy on Large-ish Files over Network
by temporal (Pilgrim) on Apr 23, 2012 at 16:38 UTC

    Thanks for the recommendations, guys.

    JavaFan and flexvault, unfortunately I don't have access to rsync on these servers. I do use it religiously on my linux boxes, though. Definitely a great tool.

    BrowserUk thanks for the tip! I was not aware that Windows had a built-in utility like robocopy. I'll give it a shot.