Re: Copying a large file (6Gigs) across the network and deleting it at source location
by Abigail-II (Bishop) on Feb 04, 2004 at 23:18 UTC
Copying a file of 6Gb means you have to write 6Gb of data. That's going to take a long time. Instead of copying and removing, people tend to 'move' a file instead. That's fast when it's on the same filesystem, and on most modern OSes, it falls back to copy and delete if the data has to be moved to a different filesystem.
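If the job does have to stay inside a Perl script, File::Copy's move() gives you the same behaviour - a cheap rename on the same filesystem, copy-then-delete otherwise. A minimal sketch, with hypothetical paths:

use strict;
use warnings;
use File::Copy qw(move);

# Hypothetical paths; move() renames when source and destination are on the
# same filesystem, and falls back to copy-then-delete when they are not.
my $src  = 'M:/Directory/backup.BAK';
my $dest = 'I:/Directory0/Directory1/Directory2/backup.BAK';

move( $src, $dest ) or die "Can't move $src to $dest: $!";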
But I don't really understand your question. You can't really speed up the process - at least not by using different statements in your program (you might be able to tune your OS so that copying huge files goes faster). I don't know why you are considering a timer, and I've no idea what you mean by "copying until EOF to delete the file once it finished copying".
I would do the thing you're doing from the command line,
and skip the Perl part:
find M:/Directory -name '*.BAK' \
-exec mv {} 'I:/(Directory0)/(Directory1)/(Directory2)/{}' \;
Abigail
Re: Copying a large file (6Gigs) across the network and deleting it at source location
by allolex (Curate) on Feb 04, 2004 at 23:15 UTC
Sorry for stating this so flatly, but you should really do a checksum on the original and compare it to your copy before unlink()ing the original. Have a look at Digest::MD5, which seems to be very popular. You could also call the *NIX command 'cksum' (which probably has been ported to Windows, or at least has an equivalent) and get very similar results.
Also, when handling errors, try 'die' instead of 'print', so Perl will return the right error level: unlink( "$_" ) or die "Couldn't delete file: $_\n"; If you add an 'or die' to the copy operation, it will only attempt to delete the original if the copy is successful.
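A minimal sketch of that copy-verify-delete sequence (the file names are hypothetical), using File::Copy together with Digest::MD5:

use strict;
use warnings;
use File::Copy qw(copy);
use Digest::MD5;

# Hypothetical source and destination paths.
my $src  = 'M:/Directory/backup.BAK';
my $dest = 'I:/Directory0/Directory1/Directory2/backup.BAK';

sub md5_of {
    my ($file) = @_;
    open my $fh, '<:raw', $file or die "Can't read $file: $!";
    return Digest::MD5->new->addfile($fh)->hexdigest;
}

copy( $src, $dest ) or die "Couldn't copy $src: $!";

# Only remove the original once source and copy checksum identically.
md5_of($src) eq md5_of($dest)
    or die "Checksum mismatch - leaving $src in place\n";
unlink $src or die "Couldn't delete file: $src: $!";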
Doing a checksum will effectively double the transfer time, because the files have to be read back from the remote location - and a network filesystem copy is fairly reliable anyway, since the OS already does some error recovery.
Re: Copying a large file (6Gigs) across the network and deleting it at source location
by Roger (Parson) on Feb 05, 2004 at 00:40 UTC
Just an idea - how about using secure copy (scp) to copy the files across the network recursively instead?
scp -CBvrp srcpath user@host:destpath
     |||||
     ||||+-- Preserves time stamps
     |||+--- Recursively copies entire directories
     ||+---- Verbose mode, useful for logging
     |+----- Batch mode
     +------ Enables compression across the network
and then...
del /S *.BAK # recursively delete BAK files
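If you want the delete step to depend on the copy actually succeeding, a minimal Perl sketch along those lines (paths, user, and host are hypothetical, and passwordless ssh keys are assumed):

use strict;
use warnings;
use File::Find;

# Hypothetical paths and host; dies if scp reports a failure.
system( 'scp', '-CBvrp', 'M:/Directory', 'user@host:/backup/' ) == 0
    or die "scp failed: $?";

# Remove the .BAK files only after scp has reported success.
find(
    sub {
        return unless /\.BAK$/i;
        unlink $_ or warn "Couldn't delete $File::Find::name: $!\n";
    },
    'M:/Directory'
);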
This assumes that the extra CPU time needed to encrypt/decrypt and compress/decompress 6Gb will still let the transfer happen at the same rate or faster. It may or may not, but if his network can't transfer a raw 6Gb any faster than he has stated, I'd assume the CPUs (and memory) plugged into such a network may be an even tighter bottleneck.
Re: Copying a large file (6Gigs) across the network and deleting it at source location
by dws (Chancellor) on Feb 05, 2004 at 05:10 UTC
"It takes about one hour and half to copy across the network."
You don't mention how far apart the source and destination are, or whether they're attended by people (as opposed to running in a dark colo somewhere). If the servers are close, there's an option we often forget: use removable drives. It takes considerably less than an hour and a half to copy 6Gb of data at IDE (or SCSI) speeds, remove the drive, and walk it across the room to the backup box.
Or, if the source and destination are far apart and "latency" isn't critical, shipping a removable drive via FedEx can still yield reasonable bandwidth.
It might be an option to consider.
Re: Copying a large file (6Gigs) across the network and deleting it at source location
by ctilmes (Vicar) on Feb 05, 2004 at 11:59 UTC
rsync has a "--remove-source-files" option to delete the source files once they have been copied (note that "--delete-after" does something different: it removes extraneous files on the receiving side after the transfer).
Depending on the nature of your file (does every single byte change every time you need to copy it?), rsync can also improve efficiency by only transferring the parts that changed. It can also compress the data over ssh, similar to the scp approach already mentioned, if that helps your transfer.
(Or it could be much less efficient...in which case don't use it.)
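A minimal sketch of driving that from Perl (paths and host are hypothetical, and --remove-source-files needs a reasonably recent rsync):

use strict;
use warnings;

# Hypothetical source and destination; -z compresses, -e ssh tunnels the
# transfer, and --remove-source-files deletes each source file once it has
# been transferred successfully.
my @cmd = ( 'rsync', '-avz', '-e', 'ssh', '--remove-source-files',
            'M:/Directory/', 'user@host:/backup/' );
system(@cmd) == 0 or die "rsync failed: $?";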
Re: Copying a large file (6Gigs) across the network and deleting it at source location
by zentara (Cardinal) on Feb 05, 2004 at 15:32 UTC
On a file that big, I would be tempted to split the file on the remote machine into, say, 60 pieces of 100 meg each (or even 600 10-meg files). Take md5sums of all the pieces, and send the list to your local machine. Then download them one at a time (or even a few in parallel if your bandwidth permits), and as each one arrives, if its md5sum matches, delete that piece from the remote machine. After all the files have arrived and are verified, cat them back together. Do a lot of testing of this method first. :-) But it would give you some protection against the network connection hanging and leaving you with a partial file. It may also speed up your transfer, with parallel file transfers.
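A minimal sketch of the splitting-and-checksumming half of that idea (file name and chunk size are hypothetical); the pieces and the MD5 list would then be transferred, verified with Digest::MD5 on the local side, and reassembled:

use strict;
use warnings;
use Digest::MD5;

my $src   = 'huge.BAK';          # hypothetical 6Gb source file
my $chunk = 100 * 1024 * 1024;   # 100 meg pieces

open my $in,   '<:raw', $src       or die "Can't read $src: $!";
open my $sums, '>',     "$src.md5" or die "Can't write MD5 list: $!";

my $n = 0;
while ( read( $in, my $buf, $chunk ) ) {
    my $piece = sprintf '%s.%03d', $src, $n++;
    open my $out, '>:raw', $piece or die "Can't write $piece: $!";
    print {$out} $buf;
    close $out or die "Close failed on $piece: $!";
    # Record the checksum of this piece so the receiver can verify it.
    print {$sums} Digest::MD5->new->add($buf)->hexdigest, "  $piece\n";
}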
Re: Copying a large file (6Gigs) across the network and deleting it at source location
by rchiav (Deacon) on Feb 05, 2004 at 15:51 UTC
robocopy is well suited for this. I've used it for a lot of large (read: 20-50 gig) data migrations and it's worked fairly well. It has a switch to retry on errors, and it can recover from network glitches. You can also copy security info. It's in the NT resource kit, and I believe it comes with XP.
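A minimal sketch of driving that from Perl (paths and retry settings are hypothetical); /MOV deletes the source files after a successful copy, /Z makes the copy restartable, and /R and /W control retries on errors:

use strict;
use warnings;

# Hypothetical paths; robocopy exit codes of 8 or higher indicate failure.
my @cmd = ( 'robocopy', 'M:\\Directory',
            'I:\\Directory0\\Directory1\\Directory2', '*.BAK',
            '/MOV', '/Z', '/R:5', '/W:30' );
my $rc = system(@cmd) >> 8;
die "robocopy reported a failure (exit code $rc)\n" if $rc >= 8;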