heidi has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, is there a one-liner in Perl to compare two lists and print only the differences? Actually, I want to copy files from one folder to another. Folder A has 600,000 files; folder B has 400,000 files. I want to copy only the remaining 200,000 files from A to B. The Unix cp command doesn't work, as it gives an "argument too long error". So I thought I would write a script for this. Any suggestions, please? Thank you.

Replies are listed 'Best First'.
Re: diff function
by ELISHEVA (Prior) on Apr 01, 2009 at 07:13 UTC

    You might want to take a look at the unix command rsync. If you don't have it on your system already and you are on Debian, run the command apt-get install rsync to download and install it. It has a ton of options, and can copy files both local to local and local to remote. It can be used either for synchronizing directories or for plain old copies.
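
For the original question (copy only what B is missing), a minimal sketch with rsync might look like the following. The function name copy_missing_rsync and the example paths are made up for illustration; -a and --ignore-existing are standard rsync options. Nothing is passed on the command line per file, so the argument-length limit does not come into play.

```shell
# Sketch only: copy into DEST whatever SRC has that DEST lacks.
# copy_missing_rsync is a hypothetical helper name, not part of rsync.
copy_missing_rsync() {
    src=$1
    dest=$2
    # -a                 archive mode: recurse, keep times and permissions
    # --ignore-existing  skip any file that already exists in the destination
    # The trailing slash on the source means "the contents of src", so the
    # files land directly in dest rather than in a subdirectory of it.
    rsync -a --ignore-existing "$src"/ "$dest"/
}

# Example call (placeholder paths):
#   copy_missing_rsync /path/to/A /path/to/B
```

Adding -n (dry run) together with -v first should show what would be copied without touching anything.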

    I'm assuming you are on *nix, but for readers who have a similar problem on MSWin, rsync is also part of the bundle of tools available with cygwin. I think cygwin users need to run setup.exe and select it; I don't think it's in the base set of tools that is installed automatically. I think there are pure MSWin versions as well - search the internet if you happen to be on MSWin.

    If rsync interests you, you might want to take a look at some of the Perl oriented rsync modules:

    • File::Rsync - Perl wrapper around the system command.

    • Rsync::Config - module for generating rsync configuration files. You don't need to use configuration files with rsync, but they can help if you have a lot of non-standard option settings or are setting up a synchronization action that will run repeatedly.

    • File::RsyncP - a pure perl implementation of the client side of a local-to-remote/remote-to-local rsync.

    Best, beth

    Update: fixed link to man page.

Re: diff function
by ig (Vicar) on Apr 01, 2009 at 07:46 UTC

    Another option is find and cpio, as follows:

    find /source/directory -print | cpio -pdmv /destination/directory

    You can use arguments to find to select the files and directories you want to copy - by default it will copy the whole directory tree. By default cpio will not overwrite an existing file whose modification time is newer than or the same as the source file's, so effectively it will only copy the new data. I often use '.' (current directory) as the directory in the find command.
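
As a rough sketch of the '.' variant with a find argument that selects only regular files: the function name copy_tree_cpio is hypothetical, and it assumes the destination is given as an absolute path because of the cd.

```shell
# Sketch only: pass-through copy of regular files from SRC into DEST.
# copy_tree_cpio is a hypothetical helper name.
copy_tree_cpio() {
    src=$1
    dest=$2    # assumed to be an absolute path, because of the cd below
    # cd first so find prints names relative to SRC ('.'), and cpio
    # recreates the same relative names under DEST.
    #   -p  pass-through (copy) mode
    #   -d  create leading directories as needed
    #   -m  keep the original modification times
    # Without -u, cpio leaves alone any destination file that is
    # newer than or the same age as the source file.
    ( cd "$src" && find . -type f -print | cpio -pdm "$dest" )
}

# Example call (placeholder paths):
#   copy_tree_cpio /source/directory /destination/directory
```

File names containing newlines would confuse this pipeline, since the list is passed one name per line.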

Re: diff function
by Anonymous Monk on Apr 01, 2009 at 03:55 UTC
    So, i thought i wil write a script for this action. Any suggestions plz? thank you.
    Don't. Overcome the "argument too long error" by giving the unix cp command only two arguments: the source file and the destination directory.
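
One way to read that advice, as a minimal sketch: loop in the shell and call cp once per missing file. The function name copy_missing_loop is made up, and the sketch assumes a flat folder with no dotfiles to copy.

```shell
# Sketch only: one cp call per file, two arguments each time.
# copy_missing_loop is a hypothetical helper name.
copy_missing_loop() {
    src=$1
    dest=$2
    # The glob is expanded inside the shell itself, so the limit on the
    # size of a new process's argument list is never hit.
    for f in "$src"/*; do
        [ -f "$f" ] || continue      # skip subdirectories and an unmatched glob
        name=${f##*/}                # file name without the directory part
        # Copy only when the destination does not have this name yet
        [ -e "$dest/$name" ] || cp "$f" "$dest/"
    done
}

# Example call (placeholder paths):
#   copy_missing_loop /path/to/A /path/to/B
```

Starting one cp process per file is slow for 200,000 files, but it is simple and copes with spaces in file names.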

      Use xargs(1) -- along with find(1), or anything else that generates the file list -- to control the number of arguments given to cp(1), by way of its "replacement string" option(s).
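
A sketch of how that could fit together, with comm(1) doing the "compare two lists" part of the original question. The function name copy_missing_xargs is hypothetical. The sketch assumes a flat folder, no newlines in file names, and a find and xargs that support -maxdepth and -0 (GNU and BSD versions do).

```shell
# Sketch only: diff two name lists with comm, then feed the result to xargs.
# copy_missing_xargs is a hypothetical helper name.
copy_missing_xargs() {
    src=$1
    dest=$2
    tmp=$(mktemp -d) || return 1
    # One sorted list of names per folder. LC_ALL=C makes sort and comm
    # agree on the ordering.
    ( cd "$src"  && find . -maxdepth 1 -type f | LC_ALL=C sort ) > "$tmp/src.list"
    ( cd "$dest" && find . -maxdepth 1 -type f | LC_ALL=C sort ) > "$tmp/dest.list"
    # comm -23 keeps the lines that appear only in the first list.
    # tr turns newlines into NULs so xargs -0 takes each name verbatim.
    # -I {} is the replacement-string option: one cp per name, with
    # exactly two arguments each time.
    LC_ALL=C comm -23 "$tmp/src.list" "$tmp/dest.list" |
        tr '\n' '\0' |
        xargs -0 -I {} cp "$src/{}" "$dest/"
    rm -rf "$tmp"
}

# Example call (placeholder paths):
#   copy_missing_xargs /path/to/A /path/to/B
```

Replacing the xargs stage with cat is an easy way to inspect the list of differences before copying anything.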

      There is also a program called unison which will sync up one directory to another, in both directions if you like, while keeping any existing files. It is currently being maintained but is not under active development.