guzziee has asked for the wisdom of the Perl Monks concerning the following question:

Hi there, I'm currently parsing some XML files that are zipped inside a main zip file. The XML parser can process zip files with XML files in them, so currently I unzip the main zip file to disk, then unzip each individual zip file to that same directory to get the XML files, then zip the XML files into a single zip file, and finally process that in the parser. Is it possible to unzip the double-zipped files into a stream or memory and skip the unzipping-to-disk part? I'm able to stream single-zipped files but am having trouble with the double-zipped ones. Thanks!

Replies are listed 'Best First'.
Re: Double zipped files
by talexb (Chancellor) on Apr 22, 2016 at 15:36 UTC

    I've used Archive::Zip before with good success.

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

      I agree with talexb. I recently used Archive::Zip to pull files out of a ZIP file into memory. I don't see why you can't unzip a (zip) file into memory, then hand that in-memory file back to Archive::Zip to unzip it further.

      Cheers,

      Brent

      -- Yeah, I'm a Delt.
Re: Double zipped files
by choroba (Cardinal) on Apr 22, 2016 at 15:38 UTC
    I fear the following creates the output files, but maybe it can still help you (at least, you don't have to create the files yourself): Unpack::Custom::Recursive on GitHub.

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Double zipped files
by graff (Chancellor) on Apr 22, 2016 at 22:19 UTC
    In addition to Archive::Zip, there's also IO::Uncompress::Unzip, which is a core module in the standard Perl distro. You can open a zip file, read each data file it contains into memory (i.e. store the file content in a scalar variable), and if any data file happens to be named with a ".zip" extension, there's a way to treat the scalar variable (holding the embedded zip file content) as a file handle, so you can use the module to unzip that as well.
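The approach described above can be sketched roughly as follows. This is only an illustration, not graff's own code: the helper name `slurp_nested_zip` and the "outer.zip/inner.xml" key format are made up for this example. It relies on the documented ability of IO::Uncompress::Unzip to read from a scalar reference, and it holds every uncompressed member in memory, so it only suits archives that fit comfortably in RAM.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw($UnzipError);

# Hypothetical helper: returns a hash ref mapping member paths
# (e.g. "inner.zip/doc.xml") to their uncompressed content.
# $source may be a filename, a filehandle, or a reference to a
# scalar that holds zip data.
sub slurp_nested_zip {
    my ($source, $prefix) = @_;
    $prefix //= '';
    my %files;

    my $unzip = IO::Uncompress::Unzip->new($source)
        or die "Cannot open zip: $UnzipError\n";

    my $status;
    for ($status = 1; $status > 0; $status = $unzip->nextStream()) {
        my $name = $unzip->getHeaderInfo()->{Name};

        # Read the whole member into a scalar.
        my $content = '';
        my $buffer;
        while (($status = $unzip->read($buffer)) > 0) {
            $content .= $buffer;
        }
        die "Error reading $prefix$name: $UnzipError\n" if $status < 0;

        next if $name =~ m{/$};    # skip directory entries

        if ($name =~ /\.zip$/i) {
            # The member is itself a zip: unzip it straight from memory.
            my $inner = slurp_nested_zip(\$content, "$prefix$name/");
            @files{ keys %$inner } = values %$inner;
        }
        else {
            $files{"$prefix$name"} = $content;
        }
    }
    die "Error processing zip: $UnzipError\n" if $status < 0;

    return \%files;
}

# Example use: list every file found in the archive named on the command line.
if (@ARGV) {
    my $files = slurp_nested_zip($ARGV[0]);
    printf "%s (%d bytes)\n", $_, length $files->{$_} for sort keys %$files;
}
```

Each value in the returned hash is a plain string, so it could be handed to any parser that accepts XML as a string, or wrapped in an in-memory filehandle with `open my $fh, '<', \$content`.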
Re: Double zipped files
by pmqs (Friar) on Apr 23, 2016 at 15:52 UTC

    Here is some code I wrote a while back that will do a recursive walk through a zip file that contains another zip and so on, to any depth. This code just prints the content, but the $unzip object at the line that says "Deal with the payload here" is just a filehandle, so if the xml parser accepts a filehandle it can be passed to it.

    This code also has the advantage that it doesn't involve storing either the complete compressed or uncompressed data in memory. It streams all data as needed.

    #!/usr/bin/perl

    use warnings;
    use strict;

    use IO::Uncompress::Unzip qw($UnzipError);

    # Recursively walk a zip stream; nested .zip members are unzipped in place.
    sub walk
    {
        my $name   = shift;
        my $fh     = shift;
        my $indent = shift // 0;
        $indent += 2;

        my $unzip = IO::Uncompress::Unzip->new($fh)
            or die "Cannot open zip $name: $UnzipError\n";

        my $status;
        for ($status = 1; $status > 0; $status = $unzip->nextStream())
        {
            my $member = $unzip->getHeaderInfo()->{Name};
            warn " " x $indent . "Processing member $member\n";

            if ($member =~ /\.zip$/)
            {
                # $unzip is itself a filehandle, so it can feed the nested walk
                walk($member, $unzip, $indent);
            }
            else
            {
                # Deal with the payload here
                my $buff;
                while (($status = $unzip->read($buff)) > 0)
                {
                    # Do something here
                    print $buff;
                }
            }
            last if $status < 0;
        }

        die "Error processing $name: $!\n"
            if $status < 0;
    }

    my $file = shift;
    open my $fh, '<', $file
        or die "Cannot open $file: $!\n";
    binmode $fh;
    warn "Processing zip file $file\n";
    walk($file, $fh);