Takamoto has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks. How can I unzip a docx file and read the content of one member into memory?

use Archive::Zip;

my $InputFileReadable = 'text.docx';
my $zip = Archive::Zip->new();
$zip->read( $InputFileReadable ) == AZ_OK
    or die "Unable to open Office file\n";
print my $wfh = $zip->extractMember( 'word/document.xml' );

$wfh does not contain anything. The following writes the XML to a file correctly.

my $zip = Archive::Zip->new();
$zip->read( $InputFile ) == AZ_OK
    or die "Unable to open Office file\n";
my $wfh = $zip->extractMember( 'word/document.xml', "word/document.xml" );

Re: Archive::Zip into memory
by Corion (Patriarch) on Feb 12, 2023 at 17:16 UTC

    You can get to the content by using the ->fh method of the entry:

    sub _open_sxc_fh($self, $fh, $member, %options) {
        my $zip = Archive::Zip->new();
        my $status = $zip->readFromFileHandle($fh);
        $status == AZ_OK
            or croak "Read error from zip";
        my $content = $zip->memberNamed($member);
        if( ! defined $content ) {
            if( $options{ optional }) {
                return;
            } else {
                croak "Want to read '$member' but it doesn't exist!";
            }
        }
        $content->rewindData();
        my $stream = $content->fh;
        1 if eof($stream); # reset eof state of $stream?! Is that a bug? Where?
        binmode $stream => ':gzip(none)'; # inflate raw deflate data (PerlIO::gzip layer)
        return $stream
    }

    The above subroutine gives you a filehandle from which you can read the information. If you want the data directly instead of a filehandle, use the following:

    my $fh = _open_sxc_fh($self, $zipfh, 'readme.txt');
    local $/;              # undefine the record separator to read everything at once
    my $content = <$fh>;   # sluuurp

Re: Archive::Zip into memory
by marto (Cardinal) on Feb 12, 2023 at 17:15 UTC

    This isn't a direct answer to your question; if I get time this week I'll have a look. I haven't used Archive::Zip for this, but I do something similar at work and can confirm that the following extracts the XML into a variable:

    use IO::Uncompress::Unzip qw(unzip $UnzipError);

    # lots of unrelated code...

    my $word = 'test.docx'; # In the actual script I was walking a NetApp filer.
    my $settingsXML;
    unzip $word => \$settingsXML, Name => 'word/settings.xml'
        or die "unzip failed: $UnzipError\n";

    I used this to detect various problematic settings in a vast archive of documents. IO::Uncompress::Unzip is a core module btw.
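
For anyone who wants to try the IO::Uncompress::Unzip approach without a real document at hand, here is a minimal sketch that uses only core modules. It builds a two-member stand-in archive in memory and then pulls one member out by name. The member contents and variable names are made up for illustration; it is not a real .docx.

```perl
use strict;
use warnings;
use IO::Compress::Zip qw($ZipError);
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Build a small stand-in archive in memory (hypothetical content, not a real .docx).
my $archive;
my $z = IO::Compress::Zip->new( \$archive, Name => 'word/document.xml' )
    or die "zip failed: $ZipError\n";
$z->print('<w:document/>');
$z->newStream( Name => 'word/settings.xml' );   # start a second member
$z->print('<w:settings/>');
$z->close;

# Extract only the named member into a scalar, without touching the disk.
my $xml_out;
unzip \$archive => \$xml_out, Name => 'word/settings.xml'
    or die "unzip failed: $UnzipError\n";

print "$xml_out\n";
```

With a real file, the first argument to unzip would be the file name (as in the snippet above) rather than a reference to an in-memory buffer.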

Re: Archive::Zip into memory
by alexander_lunev (Pilgrim) on Feb 12, 2023 at 18:10 UTC

    Hello! Isn't the contents method what you're looking for?

    my $zip = Archive::Zip->new();
    unless ( $zip->read( "template.ods" ) == AZ_OK ) {
        die 'ZIP read error';
    }
    my $contents = $zip->contents( "content.xml" );
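
Applied to the original docx question, the contents approach might look like the sketch below. It assumes Archive::Zip is installed, and it first writes a tiny stand-in archive to a temporary file so that it runs on its own. The stand-in content and the temporary file are illustrative assumptions, not part of the original question.

```perl
use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES );
use File::Temp qw( tempfile );

# Create a stand-in "docx" with a single member (hypothetical content).
my $zip_out = Archive::Zip->new();
$zip_out->addString( '<w:document/>', 'word/document.xml' );
my ( undef, $path ) = tempfile( SUFFIX => '.docx', UNLINK => 1 );
$zip_out->writeToFileNamed($path) == AZ_OK
    or die "Unable to write stand-in archive\n";

# Read the archive back and fetch one member's content as a string.
my $zip = Archive::Zip->new();
$zip->read($path) == AZ_OK
    or die "Unable to open Office file\n";
my $xml = $zip->contents('word/document.xml');

print "$xml\n";
```

In scalar context contents returns the uncompressed member data, so nothing is written to disk for the member itself.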