gri6507 has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks,

To my shocking surprise, two days ago I found out that the infamous Linux DrakTools backup utility was not actually backing everything up. I had set it up to back up all of my data, and, even though the report said every file was backed up, the resulting tar.gz ball did not contain the file I wanted to restore. (There were MANY such files, and even entire directories, missing.) This gave me great concern, so I searched around for a comparable backup utility but didn't find one to my liking. Since this type of job is perfect for Perl, I adapted a few scripts I found here to suit my needs. The result is below.

Unfortunately, I have two problems:
1. Apparently, Archive::Tar keeps the entire tar image in memory, which causes my script to die with an "Out of memory!" error once my 1G of RAM and 2G of swap are completely filled. I searched, but could not find a way to flush the in-memory archive to disk. Did I miss something? Is there a better way to create a tar archive without using all that memory?
2. I wanted to find out why I was getting the "Out of memory!" error message, so I thought that

eval { Carp::confess("init") }; $SIG{__DIE__} = sub { Carp::confess };

should do the equivalent of a stack trace dump before exiting. Apparently it is not doing that. What am I missing?
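
For reference, here is a small, self-contained test showing how I expected that idiom to behave (the inner/outer subs are throwaway examples):

use strict;
use warnings;
use Carp ();

# Prime Carp inside an eval so it is fully compiled before any real error.
eval { Carp::confess("init") };

# From here on, every die() should be upgraded to a full stack trace.
$SIG{__DIE__} = sub { Carp::confess(@_) };

sub inner { die "something went wrong\n" }
sub outer { inner() }

outer();    # prints the message plus the whole call chain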

Thanks in advance,
Fellow monk

=head1 NAME

backup.pl - Yet Another Script for making backups

=head1 SYNOPSIS

backup.pl --bakdir=s --backup=s [--ignorefile=s] [--debug=i]

Options:
  --bakdir     - where to look for and store backup files
  --backup     - what directory to backup
  --ignorefile - file which lists what subdirs not to backup
  --debug      - print debug information

=cut

use strict;
use warnings;
use English;
use Getopt::Long;
use Pod::Usage;
use Archive::Tar;
use File::Path;
use File::Find;
use File::Glob ':glob';
use Carp ();

# Prime Carp, then promote every die() to a full stack trace.
eval { Carp::confess("init") };
$SIG{__DIE__} = sub { Carp::confess };

my $debug      = 0;
my $bakdir     = '';
my $backup     = '';
my $ignorefile = '';
GetOptions(
    'debug=i'      => \$debug,
    'bakdir=s'     => \$bakdir,
    'backup=s'     => \$backup,
    'ignorefile=s' => \$ignorefile,
);

eval { mkpath($bakdir) };
if ($@) {
    warn "Unable to find or create directory $bakdir\n$@\n";
    pod2usage(1);
    exit;
}

# Process the ignore file: each line may be a glob pattern.
my @ignorelist;
if ($ignorefile ne '') {
    open(IGN, $ignorefile) || die "Unable to open $ignorefile for reading: $!\n";
    while (<IGN>) {
        chomp;
        my @globlist = bsd_glob($_, GLOB_TILDE | GLOB_ERR);
        if (GLOB_ERROR) {
            warn "ignorefile entry '$_' produced an error!\n";
        }
        foreach (@globlist) {
            push @ignorelist, $_;
        }
    }
    close(IGN);
}

# Create a tar.gz archive with a unique, timestamp-based filename.
my @t = reverse((localtime)[0..5]);
$t[0] += 1900;
$t[1]++;
my $newbackup = $bakdir . '/' . sprintf("%4u-%02u-%02u-%02u-%02u-%02u", @t) . '.tar.gz';

my $tar = Archive::Tar->new();
find({ wanted => \&add_to_archive, follow => 1, follow_skip => 2, no_chdir => 1 }, $backup);
$tar->write($newbackup, 9);

sub add_to_archive {
    my $ignore = 0;
    foreach my $ignoreentry (@ignorelist) {
        if ($File::Find::name eq $ignoreentry) {
            print "IGNORED == match!!! " if ($debug >= 1);
            $ignore++;
        }
        if ($File::Find::name =~ /^$ignoreentry\//) {
            print "IGNORED dir match!!! " if ($debug >= 1);
            $ignore++;
            $File::Find::prune = 1;
        }
    }
    print "Added " if (!$ignore && ($debug >= 1));
    print "$File::Find::name\n" if ($debug >= 1);
    return if $ignore;
    $tar->add_files($File::Find::name);
}

Re: backup script runs out of memory
by imp (Priest) on Aug 08, 2006 at 20:22 UTC
    It looks like Archive::Tar::Streamed should solve your memory issue. I haven't used it, so YMMV.
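
    Something along these lines might work, going by the module's synopsis (untested, so treat it as a sketch). Note the output is a plain .tar; you'd compress it separately, e.g. pipe it through gzip, to get a .tar.gz:

    use strict;
    use warnings;
    use IO::File;
    use File::Find;
    use Archive::Tar::Streamed;    # streams entries to a filehandle

    my $fh  = IO::File->new('backup.tar', 'w') or die "open: $!";
    my $tar = Archive::Tar::Streamed->new($fh);

    # Write each file out as it is found, instead of accumulating the
    # whole archive in memory the way Archive::Tar->add_files does.
    find({
        no_chdir => 1,
        wanted   => sub { $tar->add($File::Find::name) if -f $File::Find::name },
    }, '/home/me');    # hypothetical directory to back up

    $fh->close;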

    As for the $SIG{__DIE__} handler, I'm not sure perl will be able to call that handler if the error encountered is 'Out of Memory'. Likely it just 'splodes, but that's pure speculation.

Re: backup script runs out of memory
by rir (Vicar) on Aug 08, 2006 at 20:43 UTC
    Depending on your perl's build options, the variable $^M may give you enough memory to die sensibly. See perlvar.
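
    Per perlvar, you pre-allocate the pool at startup, e.g.:

    # Reserve a 64K emergency pool that perl can fall back on in order
    # to die() with a proper message when a real allocation fails.
    # Only effective if perl was built with -DPERL_EMERGENCY_SBRK and
    # uses perl's own malloc.
    $^M = 'a' x 65536;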

    Be well,
    rir

Re: backup script runs out of memory
by graff (Chancellor) on Aug 09, 2006 at 05:24 UTC
    Wouldn't it be easier just to use "tar"? (The GNU version is the accepted standard, and it is available for all common platforms.)

    If a perl wrapper makes it more comfortable for you, here's a version of your script that is functionally equivalent(*) to the OP's, but is a lot simpler (and will run a lot faster, without using much memory at all):

    #!/usr/bin/perl -w

    =head1 NAME

    backup.pl - Yet Another Script for making backups

    =head1 SYNOPSIS

    backup.pl --bakdir=s --backup=s [--ignorefile=s]

    Options:
      --bakdir     - where to look for and store backup files
      --backup     - what directory to backup
      --ignorefile - file which lists what subdirs not to backup

    =cut

    use strict;
    use warnings;
    use English;
    use Getopt::Long;
    use Pod::Usage;
    use File::Path;

    my $bakdir     = '';
    my $backup     = '';
    my $ignorefile = '';
    GetOptions(
        'bakdir=s'     => \$bakdir,
        'backup=s'     => \$backup,
        'ignorefile=s' => \$ignorefile,
    );
    $bakdir ||= ".";
    $backup ||= ".";

    if ( $bakdir eq $backup ) {
        warn "We should not create a backup of $backup in $bakdir\n";
        pod2usage(1);
        exit;
    }

    eval { mkpath($bakdir) };
    if ($@) {
        warn "Unable to find or create directory $bakdir\n$@\n";
        pod2usage(1);
        exit;
    }

    # Create a tar.gz archive with a unique, timestamp-based filename.
    my @t = reverse((localtime)[0..5]);
    $t[0] += 1900;
    $t[1]++;
    my $t = sprintf("%4u-%02u-%02u-%02u-%02u-%02u", @t);
    my $newbackup = "$bakdir/$t.tar.gz";

    # Let GNU tar do the heavy lifting; -X reads exclude patterns from a file.
    my @cmd = qw/tar cz/;
    push @cmd, '-X', $ignorefile
        if ( length($ignorefile) and -f $ignorefile and -r _ );
    push @cmd, '-f', $newbackup, $backup;
    exec @cmd;
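
    With the options filled in, the script ends up exec'ing a single command along these lines (the paths are made up for illustration):

        tar cz -X ignore-list.txt -f /mnt/backups/2006-08-09-03-15-00.tar.gz /home/me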

    (* footnote: it's not exactly equivalent -- I felt compelled to add a couple checks on the ARGV option values; more could still be done in this regard...)

    For some reason, pod2usage did not work for me as expected, but with proper command-line args, this version does accomplish everything that the OP set out to do.

    (updated code to add one more check on the "ignorefile" arg, and to remove the "debug" option, since it's not really needed here)

      Thanks. I should have looked at the man page for tar to see whether it could use the ignore file directly.

      I do, however, have a question about this line
      if ( length( $ignorefile ) and -f $ignorefile and -r _ );
      I understand that the first two check whether the argument is specified and whether that file exists. But what is the third check doing? I thought -r tests whether a file is readable, but then what is the _?

        But what then is the _?

        Look up "perldoc -f -X"; the underscore means "get this information from the same stat data structure that we used the last time", which saves a few ops. "-r _" assures that the file we just checked with "-f" is readable by the current user.
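
        For example:

        my $file = '/etc/passwd';     # any path
        if ( -f $file and -r _ ) {    # -f does the stat(); -r _ reuses its results
            print "$file is a plain file and readable\n";
        }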

Re: backup script runs out of memory
by Anonymous Monk on Aug 09, 2006 at 08:18 UTC
    no English;
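
    (Presumably a jab at the "use English;" lines in the scripts above: importing English without arguments aliases the regex match variables, which historically slowed down every pattern match in the program. If you want the long variable names without that cost, the English docs recommend:)

    use English qw( -no_match_vars );    # readable names without $PREMATCH/$MATCH/$POSTMATCH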