murrayn has asked for the wisdom of the Perl Monks concerning the following question:

I have a script which uses Archive::Tar to make a transportable copy of a directory tree. A colleague installed it at a client site and ran out of memory building the archive object before writing it to disk. The Archive::Tar FAQ on CPAN offers some options, but none of them fits this case.

My script must be OS-agnostic and capable of transferring the directory across platforms in any direction (Windows to Linux, AIX to Windows, etc.).

I've looked at Archive::Zip but it's not clear to me where the archive object is built - is it in memory or on disk somewhere BEFORE the $zip->writeToFileNamed method is called? I don't want to rewrite things only to run out of memory all over again!

Replies are listed 'Best First'.
Re: Memory limits on Tar modules
by kevbot (Vicar) on Mar 22, 2016 at 07:43 UTC
Re: Memory limits on Tar modules
by afoken (Chancellor) on Mar 22, 2016 at 20:07 UTC
    I've looked at Archive::Zip but it's not clear to me where the archive object is built - is it in memory or on disk somewhere BEFORE the $zip->writeToFileNamed method is called?

    http://search.cpan.org/~phred/Archive-Zip-1.56/lib/Archive/Zip/FAQ.pod#Can%27t_Read/modify/write_same_Zip_file states:

    Archive::Zip doesn't (and can't, generally) read file contents into memory, the original Zip file is required to stay around until the writing of the new file is completed.

    So my guess is that Archive::Zip only keeps metadata in memory.
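One way to check that guess is to build an archive from a directory tree and watch memory use while it runs. The following is only a minimal sketch under stated assumptions: `zip_tree` is a hypothetical helper name, and the comments about when file data is read reflect the documentation quoted in this thread rather than a measured result. It uses the documented `addFile`, `writeToFileNamed` and `numberOfMembers` methods and stores member names with forward slashes so the archive stays portable.

```perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use File::Find ();
use File::Spec;

# zip_tree() is a hypothetical helper name, not part of Archive::Zip.
# It registers every plain file under $src_dir and writes $zip_path.
sub zip_tree {
    my ($src_dir, $zip_path) = @_;
    my $zip = Archive::Zip->new();

    File::Find::find({
        no_chdir => 1,
        wanted   => sub {
            return unless -f $File::Find::name;

            # Store names relative to the tree root, with forward slashes,
            # so the archive unpacks the same way on any platform.
            my $rel  = File::Spec->abs2rel($File::Find::name, $src_dir);
            my $name = join '/', File::Spec->splitdir($rel);

            # Per the quoted documentation, this should only record the
            # path and metadata; the file data is expected to stay on disk.
            $zip->addFile($File::Find::name, $name)
                or die "Cannot add $File::Find::name\n";
        },
    }, $src_dir);

    # The file contents should be read and compressed during this call.
    my $status = $zip->writeToFileNamed($zip_path);
    die "Writing $zip_path failed (status $status)\n"
        unless $status == AZ_OK;

    return scalar $zip->numberOfMembers();
}
```

Calling `zip_tree('some/dir', 'some.zip')` on a tree with a few very large files, while watching the process in `top` or Task Manager, should show whether memory grows with the size of the data or only with the number of files.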

    In the same document, there is another thing that might be relevant for you:

    Q: Why doesn't Archive::Zip deal with file ownership, ACLs, etc.?

    A: There is no standard way to represent these in the Zip file format. If you want to send me code to properly handle the various extra fields that have been used to represent these through the years, I'll look at it.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Memory limits on Tar modules
by murrayn (Sexton) on Mar 23, 2016 at 01:29 UTC

    Thanks for the suggestions.

    This is the comment from the Overview section of the Archive::Zip documentation on CPAN that is not clear to me. Specifically, does it mean that files on disk which have NOT YET been written to an archive are merely pointed to when I perform a $tar->addFile(xxxxx)?

    Archive members hold information about the individual members, but not (usually) the actual member data. When the zip is written to a (different) file, the member data is compressed or copied as needed. It is possible to make archive members whose data is held in a string in memory,...

    I hope my "big data" client will test this for me. I will report back.

      This is the comment from the Overview section of the Archive::Zip documentation on CPAN that is not clear to me. Specifically, does it mean that files on disk which have NOT YET been written to an archive are merely pointed to when I perform a $tar->addFile(xxxxx)?

      I think so. My guess is that $tar->addFile(...) just stores some header information and the filename in the $tar object (or some inner object), and that the final $tar->writeToFileNamed('some.zip') is what actually reads and compresses all the files.

      You can look at the source of Archive::Zip right on the CPAN website. Finding out what happens inside writeToFileNamed should be quite easy. Archive::Zip does not use XS; it's a pure-Perl module. A nasty little detail is that Archive::Zip->new() does not return an Archive::Zip object, but an Archive::Zip::Archive object. So you want to search for writeToFileNamed() in Archive/Zip/Archive.pm. writeToFileNamed() wraps writeToFileHandle(). From there, each member of the archive (which is a different object) writes itself to the file handle, using its _writeToFileHandle() method.

      My guess from here is that member objects are instances of Archive::Zip::Member or Archive::Zip::FileMember. (I would have to search for what addFile() does.) The Archive::Zip::FileMember class inherits most of its code from Archive::Zip::Member. Following the code there leads to _writeData(), which seems to read from the original file using readChunk() and write to the archive.
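The chunked reading described above is what keeps memory use bounded: only one buffer's worth of a file is held at a time, however large the file is. The following is only a sketch of that general technique in plain Perl, not Archive::Zip's actual code. The name `copy_in_chunks` and the 32 KB default chunk size are assumptions made for illustration.

```perl
use strict;
use warnings;

# copy_in_chunks() is a hypothetical helper for illustration only;
# it is not taken from Archive::Zip.
# It copies $src_path to the already-open handle $out_fh, holding at
# most $chunk_size bytes of file data in memory at any moment.
sub copy_in_chunks {
    my ($src_path, $out_fh, $chunk_size) = @_;
    $chunk_size ||= 32 * 1024;    # assumed default, chosen arbitrarily

    open my $in_fh, '<', $src_path
        or die "Cannot open $src_path: $!\n";
    binmode $in_fh;               # required for binary data on Windows

    my $total = 0;
    while (1) {
        my $got = read($in_fh, my $buf, $chunk_size);
        die "Read error on $src_path: $!\n" unless defined $got;
        last if $got == 0;        # end of file

        print {$out_fh} $buf or die "Write error: $!\n";
        $total += $got;
    }
    close $in_fh;

    return $total;                # number of bytes copied
}
```

A real archiver would compress each chunk and update a checksum before writing it, but the memory behaviour is the same: the cost per file is one buffer plus the member's metadata.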

      Alexander
