in reply to File list from gz file without reading everything into memory

There is Archive::Tar::Streamed, which promises not to hold the whole archive in memory. Combined with Tie::Gzip or PerlIO::gzip in a second Perl script, with the two connected by a pipe, it might even work on Windows.

Re^2: File list from gz file without reading everything into memory
by XooR (Beadle) on Nov 19, 2009 at 09:27 UTC

    From the Archive::Tar documentation, on the Archive::Tar->iter class method:

    Returns an iterator function that reads the tar file without loading it all in memory. Each time the function is called it will return the next file in the tarball. The files are returned as Archive::Tar::File objects. The iterator function returns the empty list once it has exhausted the files contained.
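A minimal sketch of how the iterator quoted above might be used to list the members of a gzipped tarball. The helper name `list_tar_names` is made up for illustration. As far as I understand it, each entry's contents are still held in memory while that entry is current, so this helps with archives of many files but not with a single huge member.

```perl
use strict;
use warnings;
use Archive::Tar;

# Hypothetical helper: return the member names of a (gzipped) tarball,
# fetching one Archive::Tar::File object at a time.
sub list_tar_names {
    my ($archive) = @_;

    # The second argument is true because the archive is compressed.
    my $next = Archive::Tar->iter($archive, 1)
        or die "Cannot open $archive: " . Archive::Tar->error;

    my @names;
    while (my $entry = $next->()) {
        push @names, $entry->full_path;
    }
    return @names;
}
```

Usage would be something like `print "$_\n" for list_tar_names("filename.tar.gz");`.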

    From FAQ:

    Isn't Archive::Tar heavier on memory than /bin/tar?

    ... If you just want to extract, use the extract_archive class method instead. It will optimize and write to disk immediately. ...

    Maybe this answers your question.

      So here is the code that I tried:
      use strict;
      use Archive::Tar;
      my $filename = "filename.tar.gz";
      Archive::Tar->extract_archive($filename);
      As near as I can tell, it just tried to load the whole archive into memory so that it could then write it out, but it ran out of memory before it got that far. I think what the FAQ comment means is that if your tar file contains multiple files, it will not read them all into memory at once. Instead it loads a single file into memory, writes it out, then reads the next file into memory, writes it out, and so on.

      What I am looking to do is something like: read a binary chunk, write it to a file, release the chunk from memory, and repeat until done. As far as I can tell, this module does not do that. If anyone knows better, I would really appreciate hearing about it.
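A rough sketch of the chunk-by-chunk approach described above, not using Archive::Tar at all. It decompresses with IO::Uncompress::Gunzip and walks the 512-byte tar headers by hand, so only one chunk is in memory at a time. The names `read_exact` and `extract_streaming` and the 64 KB default chunk size are assumptions for illustration. It only handles regular files with plain or ustar headers; GNU long names, pax headers, links, and members of 8 GB or more are not handled.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Spec;

# Read exactly $want decompressed bytes (fewer only at end of stream).
sub read_exact {
    my ($z, $want) = @_;
    my $data = '';
    while (length($data) < $want) {
        my $chunk = '';
        my $n = $z->read($chunk, $want - length($data));
        die "gunzip error: $GunzipError" if $n < 0;
        last if $n == 0;
        $data .= $chunk;
    }
    return $data;
}

# Extract regular files from a .tar.gz into $dest_dir, one chunk at a time.
# Returns the names of the files written.
sub extract_streaming {
    my ($archive, $dest_dir, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;

    my $z = IO::Uncompress::Gunzip->new($archive)
        or die "Cannot open $archive: $GunzipError";

    my @names;
    while (1) {
        my $header = read_exact($z, 512);
        # A short or all-NUL block marks the end of the archive.
        last if length($header) < 512 || $header !~ /[^\0]/;

        # Header fields: name(0), size(124), typeflag(156), magic(257), prefix(345)
        my ($name, $size_field, $type, $magic, $prefix) =
            unpack 'Z100 x24 a12 x20 a1 x100 a6 x82 Z155', $header;
        $name = "$prefix/$name" if $magic eq "ustar\0" && length $prefix;

        $size_field =~ tr/0-7//cd;    # keep only the octal digits
        my $size    = length $size_field ? oct $size_field : 0;
        my $padding = (512 - $size % 512) % 512;

        # Only write regular files, and refuse absolute or ".." paths.
        my $is_file = $type eq '0' || $type eq "\0";
        my $unsafe  = $name =~ m{^/|(?:^|/)\.\.(?:/|$)};
        my $out;
        if ($is_file && !$unsafe) {
            my $path = File::Spec->catfile($dest_dir, $name);
            make_path(dirname($path));
            open $out, '>:raw', $path or die "Cannot write $path: $!";
            push @names, $name;
        }

        # Copy (or skip) the member data in bounded chunks.
        my $left = $size;
        while ($left > 0) {
            my $want = $left < $chunk_size ? $left : $chunk_size;
            my $data = read_exact($z, $want);
            die "Truncated archive: $archive" if length($data) < $want;
            print {$out} $data if $out;
            $left -= $want;
        }
        if ($out) {
            close $out or die "Cannot close output file: $!";
        }
        read_exact($z, $padding) if $padding;
    }
    $z->close;
    return @names;
}
```

Calling `extract_streaming("filename.tar.gz", "out")` would write the members under `out/`. Dropping the `open` and `print` and keeping only the `push` would give a plain file listing with the same memory behaviour.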

      Otherwise I am checking out really thin untar programs, like the ones mentioned in the comments to this post, that I can package with my program, and then just making a system call to them to handle the untar.
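For the external-program route, a small sketch of reading the file list from a separate tar process over a pipe. It assumes a `tar` binary that understands `-tzf` is on the PATH (or that its path is passed in); the helper name `list_with_external_tar` is invented here. The list form of a piped `open` may not work on older Windows Perls, so treat this as a starting point only.

```perl
use strict;
use warnings;

# Hypothetical helper: list archive members by running an external tar.
# $tar_bin defaults to "tar"; pass the path of a bundled binary instead.
sub list_with_external_tar {
    my ($archive, $tar_bin) = @_;
    $tar_bin ||= 'tar';

    # List form of open avoids the shell, so odd file names are safe.
    open my $pipe, '-|', $tar_bin, '-tzf', $archive
        or die "Cannot run $tar_bin: $!";

    my @names;
    while (my $line = <$pipe>) {
        chomp $line;
        push @names, $line;
    }
    close $pipe or die "$tar_bin exited with status $?";
    return @names;
}
```

Since tar does the decompression and parsing in its own process, the Perl side only ever holds one line of output at a time.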

      If anyone has any other ideas, I would love to hear them. I am stuck with this file format, since it is what the machine outputs.