mantra2006 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks
I am using a Perl script to archive the files in a folder that match
certain criteria: if a file is seven days old, it is archived.
I am using the following commands to archive the files and then
compress the archive:

    my $ret = system("tar -cvf $tar_file @srcfiles");
    if ( -e $tar_file ) {
        system("compress -v $tar_file");
    }

@srcfiles is the array of filenames that match the criteria.
The problem I am facing is that if the folder has 300000 files, the program
is slow and the tar process takes more than
half a day to complete. Is there a better way to tackle this problem? Thanks in advance for your suggestions.
Thanks & Regards
Sridhar

Replies are listed 'Best First'.
Re: Archiving Files
by Joost (Canon) on Dec 11, 2006 at 19:51 UTC
    You seem to be confusing files and directories. At least none of the systems I've worked on support 300000 arguments to a single command:
        perl -e '@files = (0 .. 300_000); system("echo @files") and die $!'
        Argument list too long at -e line 1.
    Could it be that @srcfiles actually contains directories? And maybe contains them more than once? (Note: tar works recursively.) In any case, half a day might be slow or it might be fast, depending on the size of the files. Also, if you've got large tarballs, piping the output of tar directly into gzip or compress instead of using a temporary file might be quicker:
        my @escaped = map { quotemeta } @directories;
        # "f -" sends the archive to stdout so it can be piped into gzip
        system("tar cvf - @escaped | gzip > tarfile.tar.gz") and die $!;
    Note that you should also be careful about spaces and meta characters in @directories, hence the quotemeta.
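To make the quotemeta point concrete, here is a small sketch of what it does to a few awkward names. The file names are made up for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical names: one with a space, one with a shell metacharacter,
# and one that needs no escaping at all.
my @names   = ('my file.txt', 'a&b', 'plain');
my @escaped = map { quotemeta } @names;

# quotemeta puts a backslash in front of every ASCII non-word character,
# so the shell sees each name as a single literal argument.
print "$_\n" for @escaped;
# prints:
#   my\ file\.txt
#   a\&b
#   plain
```

One caveat: a name containing a newline is still not safe this way, because the shell treats backslash-newline as a line continuation. Passing the names to system() as a list, which bypasses the shell, avoids quoting altogether, although it does not help with the argument-length limit.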
Re: Archiving Files
by swampyankee (Parson) on Dec 11, 2006 at 20:05 UTC

    I'm not sure that this is a Perl problem; tarring and compressing a large number of files -- especially large files -- may well take a long time, regardless of how clever the Perl code is. How long do the tar and compress operations take to complete? You may get better performance if you can use one of the tar-related modules (see, for example, Archive::Tar), or by springing for a commercial backup utility.
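For reference, a minimal sketch of the Archive::Tar route, using its create_archive and list_archive class methods on two throwaway files. The file names and the temporary directory are assumptions made for the example. The module's own documentation notes that it is heavier on memory than the tar binary, so it may well not suit 300000 files; measure before relying on it.

```perl
use strict;
use warnings;
use Archive::Tar;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Work in a scratch directory so the example leaves nothing behind.
my $orig_dir = getcwd();
my $work_dir = tempdir(CLEANUP => 1);
chdir $work_dir or die "Can't chdir to '$work_dir': $!\n";

# Two small sample files standing in for the real @srcfiles.
for my $name (qw(a.log b.log)) {
    open(my $fh, '>', $name) or die "Can't write '$name': $!\n";
    print {$fh} "sample data for $name\n";
    close($fh) or die "Can't close '$name': $!\n";
}

# Second argument 0 means "no compression"; the rest is the file list.
Archive::Tar->create_archive('sample.tar', 0, 'a.log', 'b.log')
    or die "Archive::Tar could not create sample.tar\n";

# Read the member names back to confirm what was stored.
my @members = Archive::Tar->list_archive('sample.tar');
print "$_\n" for @members;

# Leave the scratch directory before it is cleaned up.
chdir $orig_dir or die "Can't chdir back to '$orig_dir': $!\n";
```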

    emc

    At that time [1909] the chief engineer was almost always the chief test pilot as well. That had the fortunate result of eliminating poor engineering early in aviation.

    —Igor Sikorsky, reported in AOPA Pilot magazine February 2003.
Re: Archiving Files
by johngg (Canon) on Dec 11, 2006 at 20:17 UTC
    the program is slow and tar process is taking more than half day to complete
    Have you determined which part of your program is slow, the actual writing of the tar archive or finding the files first? How do you populate your @srcfiles array? In a directory with 300,000 files it could be that just finding the files over a week old is taking a long time.
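To check whether the file search is the slow part, it can be timed separately from tar. Below is a sketch, assuming the selection is "plain files whose modification time is more than N days ago"; files_older_than is a hypothetical helper, not code from the original post.

```perl
use strict;
use warnings;

# Return the names of plain files in $dir that were last modified more
# than $days days ago. One lstat per entry, no recursion, no sorting.
sub files_older_than {
    my ($dir, $days) = @_;
    my $cutoff = time - $days * 24 * 60 * 60;
    opendir(my $dh, $dir) or die "Can't open '$dir': $!\n";
    my @old;
    while (defined(my $name = readdir $dh)) {
        my @st = lstat("$dir/$name") or next;   # entry vanished: skip it
        next unless -f _;                       # plain files only
        push @old, $name if $st[9] < $cutoff;   # $st[9] is the mtime
    }
    closedir $dh;
    return @old;
}

# Time the scan on its own (the current directory is just a stand-in).
my $start    = time;
my @srcfiles = files_older_than('.', 7);
printf "found %d files in %d seconds\n", scalar @srcfiles, time - $start;
```

If this step alone takes hours on the real folder, the time is going into the directory scan rather than into tar.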

    I know that this may not be possible, but could the system that writes these files be easily altered to write them into sub-directories? These could be named for a time period, maybe for a day or even for an hour if the file volumes are huge, e.g. /dest/dir/2006-12-11 or /dest/dir/2006-12-11.09. Doing this would make administering these files much easier.
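A sketch of how such directory names could be generated, assuming local time and the two formats shown above; dated_dir is a hypothetical helper name.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build a per-day directory name under $base for the given epoch time,
# or a per-hour name when $hourly is true.
sub dated_dir {
    my ($base, $epoch, $hourly) = @_;
    my $format = $hourly ? '%Y-%m-%d.%H' : '%Y-%m-%d';
    return "$base/" . strftime($format, localtime $epoch);
}

# The writer would create the directory on first use and write into it.
print dated_dir('/dest/dir', time),    "\n";   # daily directory
print dated_dir('/dest/dir', time, 1), "\n";   # hourly directory
```

With this layout, archiving a week-old batch becomes a matter of tarring one whole directory instead of selecting files one by one.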

    I hope these thoughts are of use.

    Cheers,

    JohnGG

Re: Archiving Files
by graff (Chancellor) on Dec 12, 2006 at 02:04 UTC
    1. Does your version of tar support the "-z / --gzip" (compress) option? If so, you could create the ".tar.gz" file in a single step.
    2. Does your version of tar support the "-T / --files-from" option? If so, it would be better to store the items in your @srcfiles array to a simple list file (one file name per line), and pass the name of the list file to the tar command with "-T".
    3. Do you really need to use the "-v / --verbose" option when you run tar? That just causes tar to print the names of all the included files to its stdout, but you already have the list of file names, so you don't need this.
    Putting all those together, your run time might be shorter with the following:
        open( my $list, '>', "$tar_file.input_list" ) or die "Can't write '$tar_file.input_list': $!";
        print $list "$_\n" for @srcfiles;
        close $list;
        system( "tar czf $tar_file.gz -T $tar_file.input_list" );
    As suggested above, if your version does not support the "z" option for creating a gzipped tar file, you can simply pipe tar's output to a separate gzip command:
    system( "tar cf - -T $tar_file.input_list | gzip > $tar_file.gz" );
    (I don't think I've ever seen a version of tar that does not support the "-T listfile" option.)

    Apart from that, if you are dealing with lots of data, it's going to take time. Using tar and gzip from the shell command line should give you similar results, and perl has nothing to do with it -- unless, as mentioned in another reply, your list of @srcfiles contains a lot of duplicated entries.
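If duplicated entries are a possibility, they can be dropped while the list file is written. A sketch follows; unique_names and write_list are hypothetical helpers, and the tar call at the end is left as a comment because $tar_file and @srcfiles come from the original script.

```perl
use strict;
use warnings;

# Drop repeated names while keeping the original order.
sub unique_names {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Write one name per line to $list_file; return how many were written.
sub write_list {
    my ($list_file, @names) = @_;
    open(my $out, '>', $list_file) or die "Can't write '$list_file': $!\n";
    print {$out} "$_\n" for @names;
    close($out) or die "Can't close '$list_file': $!\n";
    return scalar @names;
}

# Usage in the original script might look like this:
#
#   write_list("$tar_file.input_list", unique_names(@srcfiles));
#   system('tar', 'czf', "$tar_file.gz", '-T', "$tar_file.input_list") == 0
#       or die "tar exited with status $?\n";
```

The list form of system() skips the shell, so names containing spaces need no escaping on the command line. A name containing a newline would still break the one-name-per-line list file.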

Re: Archiving Files
by shmem (Chancellor) on Dec 12, 2006 at 17:45 UTC

    300_000 files is too long an argument list. If all the 300_000 files live within one folder, consider tarring the folder, not the files:

        chdir $folder or die "Can't chdir to '$folder': $!\n";
        my $ret = system("tar -cvzf $tar_file .")    # tar and zip on the fly
            and die "Couldn't create tarfile '$tar_file': $!\n";    # 'and', not 'or'

    If the files live in different folders (e.g. the list was constructed with File::Find or similar) you could split the list into e.g. 1024 item chunks, create the tar file with the first chunk, then update the tar file with the u flag to tar:

        my @ary = splice(@srcfiles, 0, 1024);
        my $ret = system("tar -cvf $tar_file @ary")    # tar only, no update on 'z'
            and die "Couldn't create tarfile '$tar_file': $!\n";
        while (@ary = splice(@srcfiles, 0, 1024)) {
            $ret = system("tar -uvf $tar_file @ary")
                and die "Couldn't update '$tar_file': $!\n";
        }
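The same chunking idea can be sketched with the list form of system(), which bypasses the shell so file names need no escaping. The helper names chunks and tar_in_chunks are hypothetical, and the -c and -u flags follow the code above.

```perl
use strict;
use warnings;

# Split a list into array references holding at most $size items each.
sub chunks {
    my ($size, @items) = @_;
    my @out;
    push @out, [ splice(@items, 0, $size) ] while @items;
    return @out;
}

# Create the archive from the first chunk, then update it with the rest.
sub tar_in_chunks {
    my ($tar_file, $size, @files) = @_;
    my $mode = '-cf';
    for my $chunk (chunks($size, @files)) {
        system('tar', $mode, $tar_file, @$chunk) == 0
            or die "tar $mode failed on '$tar_file' (status $?)\n";
        $mode = '-uf';    # every later chunk goes into the existing archive
    }
    return;
}

# Usage in the original script might look like this:
#
#   tar_in_chunks($tar_file, 1024, @srcfiles);
```

Since -u compares each file against what is already in the archive, it may be worth testing -r (plain append) when the list is known to contain no duplicates.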

    Note also that on some systems performance drops drastically in big directories (over 100_000 files) when reading file attributes with the stat(2) or lstat(2) system calls (which tar must do to store them), so you would be better off breaking your big folder into smaller ones.

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}