They said it couldn't be done... see Industrial strength archiving

But I have managed to wrap Archive::Tar to produce a version that operates on a tar file on-disk rather than in-memory.

You supply it with a file handle that you have already opened. If you use IO::Zlib tying, this will do compression-on-the-fly as well.
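As a rough illustration of the IO::Zlib tying mentioned above, the sketch below ties a handle to a gzip file, writes through it, and reads the data back. The file name and handle names are made up for the example. The hand-off to Archive::Tar::Streamed is only indicated in a comment, because the module's constructor is not shown in this post.

```perl
use strict;
use warnings;
use IO::Zlib;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo.tar.gz";    # hypothetical file name

# Tie a handle so that everything printed to it is gzip-compressed.
tie *OUT, 'IO::Zlib', $path, 'wb' or die "Cannot tie $path for writing\n";

# Assumption: an already-open handle like \*OUT is what the post means by
# "a file handle that you have already opened". Here we just print to it.
print OUT "hello, tar\n";
close OUT;
untie *OUT;

# The file on disk starts with the gzip magic bytes, not the plain text.
open my $raw, '<', $path or die "Cannot open $path: $!\n";
binmode $raw;
read $raw, my $magic, 2;
close $raw;

# Reading through a tied handle decompresses on the fly.
tie *IN, 'IO::Zlib', $path, 'rb' or die "Cannot tie $path for reading\n";
my $line = <IN>;
close IN;
untie *IN;
```

The point of the tie is that the code doing the printing never needs to know the handle is compressed.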

It's on CPAN as Archive::Tar::Streamed. Comments... feedback... bugs...

--
I'm Not Just Another Perl Hacker

Re: RFC: Archive::Tar::Streamed
by Aristotle (Chancellor) on Jan 03, 2005 at 00:17 UTC

    Nice tricks. :-) It's not truly streaming in that it still keeps each archive member file in memory though. Nevertheless, in the common case that's going to be much better than keeping the entire archive in memory.

    It might be more efficient to keep the same Archive::Tar object around, calling clear() on it when you're done with a stream chunk, instead of instantiating a new one at every step.
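    A minimal sketch of that reuse pattern, using only plain Archive::Tar calls (add_data, list_files, clear). The chunk contents are invented, and how each chunk gets serialised is left as a comment because that part belongs to the wrapper module:

```perl
use strict;
use warnings;
use Archive::Tar;

my $tar = Archive::Tar->new;    # one object for the whole run

# Hypothetical "stream chunks": name => contents
my @chunks = (
    [ 'a.txt' => "first chunk\n" ],
    [ 'b.txt' => "second chunk\n" ],
);

my @seen;
for my $chunk (@chunks) {
    my ($name, $data) = @$chunk;
    $tar->add_data($name, $data) or die $tar->error;

    # Record what the object holds for this chunk.
    push @seen, [ $tar->list_files ];

    # ... the wrapper would write this chunk out here ...

    $tar->clear;                # empty the object instead of building a new one
}

my @left = $tar->list_files;    # nothing remains after the last clear()
```

    Each pass sees only the member added in that pass, so memory use stays bounded by the largest chunk.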

    Makeshifts last the longest.

      Thanks for the comments.
      It's not truly streaming in that it still keeps each archive member file in memory though.
      That is not true for the output side. When writing a tar archive, I am passing a file name (or a list of file names) to A::T, but I suppose A::T could be slurping each file.

      On the input side, by presenting an Archive::Tar::File object (which contains the data for a file or directory, slurped) I am giving full flexibility to the application to decide what to do with it. It may be possible to provide a header only, and method calls to fetch the file straight to disk (I'll think about it, but my app doesn't need this right now).
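      To make the "header only, then fetch straight to disk" idea concrete, here is a hand-rolled sketch. It is not how Archive::Tar::Streamed works. It assumes short member names (no ustar prefix field), the names next_header and copy_body are hypothetical, and an in-memory scalar stands in for the output file:

```perl
use strict;
use warnings;
use Archive::Tar;

my $BLOCK = 512;

# Read one 512-byte header; return (name, size), or nothing at end of archive.
sub next_header {
    my ($in) = @_;
    my $n = read($in, my $head, $BLOCK);
    return unless defined $n && $n == $BLOCK;
    return if $head eq "\0" x $BLOCK;               # end-of-archive block
    my ($name, $size) = unpack 'Z100 x24 Z12', $head;
    return ($name, oct $size);
}

# Copy a member body from $in to $out in small pieces, then skip the padding.
sub copy_body {
    my ($in, $out, $size) = @_;
    my $left = $size;
    while ($left > 0) {
        my $want = $left < 4096 ? $left : 4096;
        my $got  = read($in, my $buf, $want);
        die "short read\n" unless $got;
        print {$out} $buf;
        $left -= $got;
    }
    my $pad = ($BLOCK - $size % $BLOCK) % $BLOCK;
    read($in, my $skip, $pad) if $pad;
}

# Build a small test archive in memory; write() with no arguments returns it.
my $src = Archive::Tar->new;
$src->add_data('big.bin',  'x' x 10_000);
$src->add_data('note.txt', "hi\n");
my $bytes = $src->write;

open my $in, '<', \$bytes or die "Cannot open in-memory archive: $!\n";
binmode $in;

my %got;
while (my ($name, $size) = next_header($in)) {
    open my $out, '>', \my $sink or die "Cannot open sink: $!\n";  # stand-in for a real file
    copy_body($in, $out, $size);
    close $out;
    $got{$name} = length $sink;
}
```

      With this shape the application sees the header first and decides where the body goes, and no more than 4096 bytes of a member are held at once.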

      It might be more efficient to keep the same Archive::Tar object around, calling clear() on it when you're done with a stream chunk, instead of instantiating a new one at every step.
      I will certainly bear this in mind for the next version :).

      --
      I'm Not Just Another Perl Hacker

        but I suppose A::T could be slurping each file.

        I think it does actually. Have a look yourself though, my review was somewhat superficial.

        ---
        demerphq

Re: RFC: Archive::Tar::Streamed
by ctilmes (Vicar) on Jan 04, 2005 at 11:48 UTC
    I had a similar need, but my files don't even exist on disk. I was building a web server that returns tars of files that came from other web servers. I never "cleanly" integrated my approach with Archive::Tar or packaged it up nicely, but here is the gist of it:

    I take a callback from the caller, and call it to do any output. I pass the callback into HTTP::Request to retrieve the file from the remote site and pass it through to the web client.

    for each file {
        HEAD the file on the remote server to get the file size
        use Archive::Tar::File's internal _format_tar_entry to write the tar header
        use HTTP::Request to retrieve the file from the remote server, passing it the callback for output
        write TAR_PAD to fill out the BLOCK
    }
    write TAR_END x 2
    Any ideas on a clean way to build a generic module that I could use are welcome...
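    One possible shape for such a generic routine, sketched under assumptions: the header is packed by hand in the ustar layout rather than through the private _format_tar_entry, names are limited to 100 bytes, and each member supplies a fetch code ref that stands in for the HTTP retrieval. tar_header and stream_tar are hypothetical names, not part of any module.

```perl
use strict;
use warnings;

my $BLOCK = 512;

# Build a 512-byte ustar header for a regular file.
sub tar_header {
    my ($name, $size, $mtime) = @_;
    die "name too long for this sketch\n" if length($name) > 100;
    my $header = pack 'a100 a8 a8 a8 a12 a12 a8 a1 a100 a6 a2 a32 a32 a8 a8 a155 x12',
        $name,
        sprintf('%07o', 0644),      # mode
        sprintf('%07o', 0),         # uid
        sprintf('%07o', 0),         # gid
        sprintf('%011o', $size),
        sprintf('%011o', $mtime),
        ' ' x 8,                    # checksum field counts as spaces while summing
        '0',                        # typeflag: regular file
        '',                         # linkname
        'ustar', '00',              # magic (NUL-padded to 6 bytes) and version
        '', '', '', '', '';         # uname, gname, devmajor, devminor, prefix
    my $sum = unpack '%32C*', $header;
    substr($header, 148, 8) = sprintf("%06o\0 ", $sum);
    return $header;
}

# $emit is the caller's output callback; each member is [ name, size, fetch ].
# fetch is called with a sink code ref and must push exactly size bytes into it.
sub stream_tar {
    my ($emit, @members) = @_;
    for my $m (@members) {
        my ($name, $size, $fetch) = @$m;
        $emit->( tar_header($name, $size, time) );
        my $sent = 0;
        $fetch->( sub { $sent += length $_[0]; $emit->($_[0]) } );
        die "size mismatch for $name\n" unless $sent == $size;
        my $pad = ($BLOCK - $size % $BLOCK) % $BLOCK;
        $emit->("\0" x $pad) if $pad;
    }
    $emit->("\0" x $BLOCK) for 1 .. 2;   # two zero blocks end the archive
}

# Usage: collect the stream in a string; a web server would print it instead.
my $archive = '';
stream_tar(
    sub { $archive .= $_[0] },
    [ 'hello.txt', 12,  sub { my $sink = shift; $sink->('hello '); $sink->("world\n") } ],
    [ 'block.bin', 512, sub { $_[0]->('x' x 512) } ],
);
```

    The size has to be known before the body is sent, which is why the pseudocode above issues a HEAD request first.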