in reply to File list from gz file without reading everything into memory

You're probably not going to have much luck, because tar.gz archives don't have a separate table of contents; the files are simply streamed one after another, each with a header prepended (just as if they were being written to, erm, tape; go figure . . .). On top of that, the entire archive is compressed, so to get to a file in the middle you have to read along and decompress until you reach the file you're interested in.
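To illustrate the read-along-and-decompress approach, here is a minimal sketch in Python (not the language of the original question), using the standard `tarfile` module in its forward-only stream mode. The function name `list_tar_gz_members` is made up for this example. The whole archive still has to be decompressed, but only one block at a time, and only the small header records are kept rather than the file contents.

```python
import tarfile

def list_tar_gz_members(fileobj):
    """Yield member names from a gzip-compressed tar stream.

    `fileobj` is any binary file-like object positioned at the start of
    the archive (hypothetical helper, for illustration only).
    """
    # Mode "r|gz" reads the input as a forward-only stream: no seeking,
    # and each member's data is skipped as it is decompressed.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            yield member.name
```

It could be used along the lines of `with open("archive.tar.gz", "rb") as fh: print(list(list_tar_gz_members(fh)))`, where the file name is a placeholder.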

The easiest thing (if you can wrangle it and want to avoid unnecessary overhead) would be to get whoever's providing the file to switch to another format (e.g. zip) that has a separate index, so individual parts can easily be extracted piecemeal.
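For comparison, a rough sketch of what the zip route looks like, again in Python with the standard `zipfile` module. The helper names `list_zip_members` and `read_zip_member` are invented for this example. Listing the names only consults the archive's index, and a single entry can be read without touching the others.

```python
import zipfile

def list_zip_members(path_or_file):
    """Return the entry names recorded in the zip archive's index."""
    with zipfile.ZipFile(path_or_file) as archive:
        return archive.namelist()

def read_zip_member(path_or_file, name):
    """Return the decompressed bytes of a single entry, located via the index."""
    with zipfile.ZipFile(path_or_file) as archive:
        return archive.read(name)
```

Note that `read_zip_member` loads the whole chosen entry into memory; for very large entries, `ZipFile.open` returns a file-like object that can be read in chunks instead.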

The cake is a lie.

Re^2: File list from gz file without reading everything into memory
by drblove27 (Sexton) on Nov 18, 2009 at 20:29 UTC
    Dang... Will go back to the source and see if there is something better to do than that...

    If the .tar.gz file is just a single compressed file, is there a memory-efficient way to extract it, almost like streaming it into the output file? Or is that, in essence, what the code is doing already?

    Thanks again for your reply. I will check back with the generator of these files to see if I can try another compression approach...
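
For what it's worth, streaming a single file out of a .tar.gz is possible in principle. The following is only a sketch in Python, assuming the archive holds one regular file; `extract_first_file` is a hypothetical name, and it says nothing about what the code from the original question actually does. Data is decompressed and copied in fixed-size chunks, so memory use stays roughly constant regardless of file size.

```python
import shutil
import tarfile

def extract_first_file(src, dst, chunk_size=64 * 1024):
    """Copy the first regular file in a tar.gz stream to `dst`, chunk by chunk.

    `src` and `dst` are binary file-like objects. Returns the member's name,
    or None if the archive contains no regular file.
    """
    # "r|gz" is forward-only stream mode; each member must be read
    # before the iteration moves on to the next one.
    with tarfile.open(fileobj=src, mode="r|gz") as archive:
        for member in archive:
            if member.isfile():
                shutil.copyfileobj(archive.extractfile(member), dst, chunk_size)
                return member.name
    return None
```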