paulehr has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow monks, long time listener first time caller. I am having a problem with Archive::Tar on a Windows XP machine. I am using ActiveState Perl 5.8.8 and Archive::Tar 1.23. What I am trying to accomplish is to extract one file from each tarball in a directory. This is what I have so far:
    #!/usr/bin/perl -w
    use strict;
    use Archive::Tar;

    my $wdir = "U:\\EMM loads";
    my $tarballs;
    my $toc = "toc";

    opendir(PWD, $wdir) or die "can't open $wdir: $!";
    while (defined($tarballs = readdir(PWD))) {
        if ($tarballs =~ /\.tar$/) {
            my $tar = Archive::Tar->new("$wdir\\$tarballs");
            $tar->extract_file($toc,$wdir);
        }
    }
    closedir(PWD);
The problem I am having is that when it tries to read the tarball it bombs with a checksum error. I have tried this with several tarballs and they all do the same thing. I can open them fine using WinZip and on a Solaris machine. Has anyone run into this before, or am I missing something? Thanks for the help!

Replies are listed 'Best First'.
Re: help with Archive::tar
by jasonk (Parson) on Mar 29, 2006 at 03:50 UTC

    It's hard to tell without more information, but I would guess that the tarballs contain a directory structure. If unzipping them with one of those other clients gives you a directory which contains the 'toc' file, then you need to specify that path when extracting. Try using $tar->list_files to see if there is a directory structure to consider.
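    As a rough illustration of that check, here is a minimal sketch that builds a throwaway archive in memory and prints what list_files reports. The member names (SAMPLE-COMPLETE/...) and the helper subs build_sample_tar and find_toc_entries are made up for the example; with a real tarball you would pass its path to Archive::Tar->new instead.

```perl
use strict;
use warnings;
use Archive::Tar;

# Build a small in-memory archive that mimics a tarball whose files
# live inside a top-level directory (all names here are hypothetical).
sub build_sample_tar {
    my $tar = Archive::Tar->new;
    $tar->add_data( 'SAMPLE-COMPLETE/toc',      "table of contents\n" );
    $tar->add_data( 'SAMPLE-COMPLETE/data.txt', "payload\n" );
    return $tar;
}

# Return the names of all members whose last path component is "toc".
sub find_toc_entries {
    my ($tar) = @_;
    return grep { m{(?:^|/)toc$} } $tar->list_files;
}

my $tar = build_sample_tar();

# Every member is listed with its directory prefix, if it has one.
print "$_\n" for $tar->list_files;

# This is the name that would have to be passed to extract_file.
print "toc is stored as: $_\n" for find_toc_entries($tar);
```

    If the name printed for the toc entry has a directory in front of it, a bare 'toc' will not match any member of the archive.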


    We're not surrounded, we're in a target-rich environment!
Re: help with Archive::tar
by graff (Chancellor) on Mar 29, 2006 at 06:50 UTC
    I expect that it does not matter, but I noticed that the man page for Archive::Tar does say to use "the full (unix) path (including file name)" as the second arg to the "extract_file" method. I suppose it might be surprising if this really made a difference, but you should be aware that in perl generally, you can use the unix-style forward-slash instead of dos-style back-slashes in path specs.

    Anyway, I think the first reply is more likely to have nailed it. When you do "tar tf whatever.tar" on the solaris box, do you see one or more directory names and slashes in front of "toc"?

    I don't know why this would trigger a "checksum error"; I assume that winzip succeeded on the exact same file where your perl script failed (and that "success" means the "toc" file looked fine after being extracted), so that would rule out a file corruption problem.

    Looking at the module's source code, I do see the obligatory "binmode" applied to the input file handle, so it's not a dos text-mode-io problem either; also, I see that the checksum error involves trying to validate a particular file, but the commentary that surrounds this part of the code seems odd. If I were to run into this problem (and if I felt I had the time), I'd step through with "perl -d", to see where it's getting the reference checksum, and why that isn't matching the data.

      Thanks for the replies so far.

      Still kind of new to Perl and always looking for insight when I hit a brick wall :)

      I expect that it does not matter, but I noticed that the man page for Archive::Tar does say to use "the full (unix) path (including file name)" as the second arg to the "extract_file" method. I suppose it might be surprising if this really made a difference, but you should be aware that in perl generally, you can use the unix-style forward-slash instead of dos-style back-slashes in path specs.

      I could give this a shot and see what happens. Also, I didn't know you can use unix-style forward slashes when you are working on a Windows machine like that.

      So for example, instead of using something like U:\\EMM Loads (escaping the backslash) I can do \U\EMM Loads?

      Anyway, I think the first reply is more likely to have nailed it. When you do "tar tf whatever.tar" on the solaris box, do you see one or more directory names and slashes in front of "toc"?

      When I untar one of these on a Solaris box, the tarball would be named something like 81334567-JX4578.tar, and when I untar it on the Solaris box it would look like 81334567-COMPLETE/toc, 81334567-COMPLETE/<next file>, etc. It looks the same if I just extract the whole tarball in WinZip.

        the tarball would be named something like 81334567-JX4578.tar and when i untar it on the solaris box it would look like 81334567-COMPLETE/toc, 81334567-COMPLETE/<next file>, etc.

        That means this particular tarball contains a directory called "81334567-COMPLETE", and the file you want to extract is inside that directory. You need to pass "81334567-COMPLETE/toc" as the first arg in the "extract_file" call.
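        For a single archive, a minimal sketch of that call might look like the following. It assumes the member name is looked up with list_files and that the second argument to extract_file is a full destination path including the file name. The sub extract_toc, the SAMPLE-COMPLETE directory and sample.tar are hypothetical names used only for the demo.

```perl
use strict;
use warnings;
use Archive::Tar;
use File::Temp qw(tempdir);

# Extract the member named "toc" (in any directory) from $tarfile into
# $destdir, naming the output after the tarball. Returns the path that
# was written, or nothing if the archive has no such member.
sub extract_toc {
    my ( $tarfile, $destdir ) = @_;
    my $tar = Archive::Tar->new($tarfile)
        or die "cannot read $tarfile\n";

    my ($member) = grep { m{(?:^|/)toc$} } $tar->list_files;
    return unless defined $member;

    ( my $base = $tarfile ) =~ s{.*[/\\]}{};   # strip leading directories
    $base =~ s/\.tar$//i;
    my $out = "$destdir/$base.toc";

    # First arg: the member's full name inside the archive.
    # Second arg: the full destination path, including the file name.
    $tar->extract_file( $member, $out )
        or die "extract failed: " . $tar->error . "\n";
    return $out;
}

# Demo with a throwaway archive written to a temporary directory.
my $dir     = tempdir( CLEANUP => 1 );
my $tarfile = "$dir/sample.tar";
my $demo    = Archive::Tar->new;
$demo->add_data( 'SAMPLE-COMPLETE/toc', "table of contents\n" );
$demo->write($tarfile);

my $written = extract_toc( $tarfile, $dir );
print "wrote $written\n";
```

        Naming the output after the tarball keeps the toc files from several archives from overwriting each other in one directory.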

        To do that in a programmatic way -- without having to hard-code the directory name for each tar file -- you could try something like this:

        #!/usr/bin/perl
        use strict;
        use Archive::Tar;

        my $wdir = "U:/EMM loads";

        # load an array of tar file names:
        opendir( DIR, $wdir ) or die "$wdir: $!";
        my @tarballs = grep /\.tar$/, readdir DIR;
        closedir DIR;

        # for each tar file, yank out the "toc" file:
        for my $tarball ( @tarballs ) {
            my $tar = Archive::Tar->new( "$wdir/$tarball" )
                or die "$tarball: cannot read tar file\n";
            my ($toc) = grep { $_->name =~ m{/toc$} } $tar->get_files;
            next unless $toc;    # skip tarballs that have no toc file

            # make up a local name for this toc file, based on the tarball name:
            ( my $tocname = $tarball ) =~ s/tar$/toc/;

            # write the toc file data to the local toc file:
            open( TOC, ">", $tocname ) or die "$tocname: $!";
            binmode TOC;         # keep Windows from translating line endings
            print TOC $toc->get_content;
            close TOC;
        }
        Actually, the directory separator for unix is the other slash: / not \. So the path would be U:/EMM Loads.
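        To make the quoting rules concrete, here is a small sketch comparing the different spellings. It only builds strings, so the directory does not need to exist. The file name example.tar is made up. Note that inside double quotes \U and \E are case-modification escapes, which is why the "\U\EMM Loads" spelling does something unexpected.

```perl
use strict;
use warnings;
use File::Spec;

# Three spellings of the same Windows path.
my $escaped = "U:\\EMM loads";   # double quotes: the backslash must be doubled
my $single  = 'U:\EMM loads';    # single quotes: a lone backslash is literal
my $forward = "U:/EMM loads";    # forward slash: no escaping needed

# In a double-quoted string, \U starts upper-casing and \E ends it,
# so neither one puts a backslash into the resulting string.
my $surprise = "\U\EMM Loads";

print "escaped : $escaped\n";
print "single  : $single\n";
print "forward : $forward\n";
print "surprise: $surprise\n";

# File::Spec joins a directory and a file name with the separator of
# the platform the script is running on.
my $joined = File::Spec->catfile( $forward, 'example.tar' );
print "joined  : $joined\n";
```

        The first two variables hold identical strings. The forward-slash form is the easiest to type and is accepted by Perl's file functions on Windows.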

        Here is the example that I could find from my tar program for extracting:

        sub untar {
            foreach my $otar ( glob('*') ) {
                next unless $otar =~ /\.tar$/i;

                # use a fresh object for each tarball
                my $tar = Archive::Tar->new;
                unless ( $tar->read( $otar, 1 ) ) {
                    print "Error reading $otar: ", $tar->error, "\n";
                    next;
                }

                foreach my $file ( $tar->get_files ) {
                    my $name = $file->name;
                    # extract() returns a true value on success
                    if ( $tar->extract($name) ) {
                        print "$name extracted\n";
                    }
                    else {
                        print "Error extracting $name from $otar: ", $tar->error, "\n";
                    }
                }
            }
        }