in reply to Memory limits on Tar modules

Thanks for the suggestions.

This is the passage from the Overview section of the Archive::Zip CPAN documentation that I find unclear. Specifically, does it mean that files on disk which have NOT YET been written to an archive are merely pointed to when I call $tar->addFile(xxxxx)?

Archive members hold information about the individual members, but not (usually) the actual member data. When the zip is written to a (different) file, the member data is compressed or copied as needed. It is possible to make archive members whose data is held in a string in memory,...
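
A minimal sketch of the two kinds of member that passage describes, assuming Archive::Zip is installed. The scratch directory and the names big.dat, note.txt and out.zip are made up for illustration. It prints the classes involved and does not by itself prove when the file data is read.

```perl
use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES );
use File::Temp qw( tempdir );
use File::Spec;

# Scratch directory and input file: hypothetical stand-ins for real data.
my $dir   = tempdir( CLEANUP => 1 );
my $input = File::Spec->catfile( $dir, 'big.dat' );
open my $fh, '>', $input or die "Cannot create $input: $!";
print {$fh} 'x' x 1_000_000;
close $fh or die "Cannot close $input: $!";

my $zip = Archive::Zip->new();

# Member backed by a file on disk.
my $file_member = $zip->addFile( $input, 'big.dat' );

# Member whose data is held in a string in memory.
my $string_member = $zip->addString( 'held in memory', 'note.txt' );

# Show which classes are actually in play.
printf "archive object: %s\n", ref $zip;
printf "file member:    %s\n", ref $file_member;
printf "string member:  %s\n", ref $string_member;

# Write the archive to a different file and check the status code.
my $output = File::Spec->catfile( $dir, 'out.zip' );
my $status = $zip->writeToFileNamed($output);
die "Writing $output failed (status $status)" unless $status == AZ_OK;
```

Watching the memory use of the process while this runs against a genuinely large input would be one way to test the question above.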

I hope my "big data" client will test this for me. I will report back.

Re^2: Memory limits on Tar modules
by afoken (Chancellor) on Mar 23, 2016 at 21:42 UTC
    This is the passage from the Overview section of the Archive::Zip CPAN documentation that I find unclear. Specifically, does it mean that files on disk which have NOT YET been written to an archive are merely pointed to when I call $tar->addFile(xxxxx)?

    I think so. My guess is that $tar->addFile(...) just stores some header information and the filename in the $tar object (or some inner object), and that the final $tar->writeToFileNamed('some.zip') actually reads and compresses all the files.

    You can look at the source of Archive::Zip right on the CPAN website. Finding out what happens inside writeToFileNamed() should be quite easy. Archive::Zip does not use XS, it's a plain Perl module. A nasty little detail is that Archive::Zip->new() does not return an Archive::Zip object, but an Archive::Zip::Archive object. So you want to search for writeToFileNamed() in Archive/Zip/Archive.pm. writeToFileNamed() wraps writeToFileHandle(). From there, each member of the archive (which is a different object) writes itself to the file handle, using its _writeToFileHandle() method.

    My guess from here is that member objects are instances of Archive::Zip::Member or Archive::Zip::FileMember. (I would have to search for what addFile() does.) The Archive::Zip::FileMember class inherits most of its code from Archive::Zip::Member. Following the code there leads to _writeData(), which seems to read from the original file using readChunk() and write to the archive.
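
    To illustrate the pattern I am guessing at, here is a rough pure-Perl sketch. It is NOT Archive::Zip's code: the LazyMember class, its write_to_handle() method and the chunk size are all invented for this example. The object stores only a path and copies the file in fixed-size chunks at write time, so memory use stays bounded by the chunk size.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical stand-in for a file-backed archive member: it stores
# only the path, never the file contents.
package LazyMember;

sub new {
    my ( $class, $path ) = @_;
    return bless { path => $path }, $class;
}

# Copy the file to $out in fixed-size chunks, so at most $chunk_size
# bytes of file data are held in memory at any one time.
sub write_to_handle {
    my ( $self, $out, $chunk_size ) = @_;
    $chunk_size ||= 32 * 1024;
    open my $in, '<:raw', $self->{path}
        or die "Cannot open $self->{path}: $!";
    my $total = 0;
    while (1) {
        my $read = read( $in, my $buffer, $chunk_size );
        die "Read failed: $!" unless defined $read;
        last if $read == 0;    # end of file
        print {$out} $buffer or die "Write failed: $!";
        $total += $read;
    }
    close $in or die "Cannot close $self->{path}: $!";
    return $total;
}

package main;

# Create a source file to stand in for a large input.
my ( $src_fh, $src_path ) = tempfile( UNLINK => 1 );
binmode $src_fh;
print {$src_fh} 'x' x 100_000;
close $src_fh or die "Cannot close $src_path: $!";

# Creating the member reads nothing from disk.
my $member = LazyMember->new($src_path);

# The data is only read here, 4096 bytes at a time.
my ( $dst_fh, $dst_path ) = tempfile( UNLINK => 1 );
binmode $dst_fh;
my $copied = $member->write_to_handle( $dst_fh, 4096 );
close $dst_fh or die "Cannot close $dst_path: $!";

print "copied $copied bytes\n";
```

    If Archive::Zip really works along these lines, adding many large files should cost little memory until the archive is written, and even then only about one chunk per read.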

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)