drewhead has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to pull some data from a zip file on a remote HTTP server. To this end I thought I could simply UserAgent it over, print it to a File::Temp object and let Archive::Zip's readFromFileHandle suck the data into an object from which I can pick out what I want. However, I seem to be having trouble getting the Archive::Zip methods to play nice with a File::Temp filehandle.

ActiveState 5.12.4

use File::Temp;
use LWP;
use Archive::Zip qw( :ERROR_CODES :CONSTANTS );
use Data::Dumper;

my $url  = 'http://keep.drewhead.org/test.zip';
my $file = '../test.zip';    # a local copy of $url

######################################################################
# This works
{
    my $zip     = Archive::Zip->new();
    my $zip_err = $zip->read( $file );
    unless ($zip_err == AZ_OK ) {
        die "Archive::Zip read error on filehandle: $zip_err\n";
    }
    foreach my $member ($zip->members()) {
        my $fileName = $member->fileName();
        print "fileName = $fileName\n";
        my $content = $member->contents();
        print "contents $fileName = ".Dumper($content);
    }
}

######################################################################
# This doesn't
{
    my $ua = LWP::UserAgent->new;
    $ua->timeout(10);
    $ua->env_proxy;
    my $response = $ua->get($url);
    my $tmp      = File::Temp->new();
    if ($response->is_success) {
        print $tmp $response->decoded_content;
        print "Wrote ".$tmp->filename()."\n";
    } else {
        die "error ".$response->status_line;
    }
    unless (-B $tmp) { die "Did not get a filehandle\n"; }
    my $zip     = Archive::Zip->new();
    my $zip_err = $zip->readFromFileHandle( $tmp );
    unless ($zip_err == AZ_OK ) {
        die "Archive::Zip read error on filehandle: $zip_err\n";
    }
    foreach my $member ($zip->members()) {
        my $fileName = $member->fileName();
        print "fileName = $fileName\n";
        my $content = $member->contents();
        print "contents $fileName = ".Dumper($content);
    }
}

######################################################################
# This works but I'm replacing File::Temp functionality
{
    my $ua = LWP::UserAgent->new;
    $ua->timeout(10);
    $ua->env_proxy;
    my $response = $ua->get($url);
    my $tempfile = 'C:\Documents and Settings\ddowling\Local Settings\Temp\drewtmp1';
    open(my $fh, ">$tempfile") || die "Unable to open $tempfile:$!\n";
    binmode($fh);
    print $fh $response->decoded_content;
    close($fh);
    my $zip     = Archive::Zip->new();
    my $zip_err = $zip->read( $tempfile );
    unless ($zip_err == AZ_OK ) {
        die "Archive::Zip read error on filehandle: $zip_err\n";
    }
    foreach my $member ($zip->members()) {
        my $fileName = $member->fileName();
        print "fileName = $fileName\n";
        my $content = $member->contents();
        print "contents $fileName = ".Dumper($content);
    }
}

The first block there does what I expect the second block to do: print out the file names and their contents. However the second block complains:

Wrote C:\DOCUME~1\ddowling\LOCALS~1\Temp\mhtbVVHEN5
fileName = test2.txt
fileName = test1.txt
IO error: Can't open C:\DOCUME~1\ddowling\LOCALS~1\Temp\mhtbVVHEN5 : Invalid argument at C:/Perl/lib/Archive/Zip/FileMember.pm line 40
    Archive::Zip::FileMember::_openFile('Archive::Zip::ZipFileMember=HASH(0x1c40c14)') called at C:/Perl/lib/Archive/Zip/FileMember.pm line 30
    Archive::Zip::FileMember::fh('Archive::Zip::ZipFileMember=HASH(0x1c40c14)') called at C:/Perl/lib/Archive/Zip/ZipFileMember.pm line 384
    Archive::Zip::ZipFileMember::rewindData('Archive::Zip::ZipFileMember=HASH(0x1c40c14)') called at C:/Perl/lib/Archive/Zip/Member.pm line 960
    Archive::Zip::Member::contents('Archive::Zip::ZipFileMember=HASH(0x1c40c14)') called at C:/Documents and Settings/ddowling/Desktop/eoc_cat_reason/example/zipseek.pl line 52
IO error: Can't open C:\DOCUME~1\ddowling\LOCALS~1\Temp\mhtbVVHEN5 : Invalid argument at C:/Perl/lib/Archive/Zip/FileMember.pm line 40
    Archive::Zip::FileMember::_openFile('Archive::Zip::ZipFileMember=HASH(0x1c41184)') called at C:/Perl/lib/Archive/Zip/FileMember.pm line 30
    Archive::Zip::FileMember::fh('Archive::Zip::ZipFileMember=HASH(0x1c41184)') called at C:/Perl/lib/Archive/Zip/ZipFileMember.pm line 384
    Archive::Zip::ZipFileMember::rewindData('Archive::Zip::ZipFileMember=HASH(0x1c41184)') called at C:/Perl/lib/Archive/Zip/Member.pm line 960
    Archive::Zip::Member::contents('Archive::Zip::ZipFileMember=HASH(0x1c41184)') called at C:/Documents and Settings/ddowling/Desktop/eoc_cat_reason/example/zipseek.pl line 52

If I say that it appears the contents() method is stringifying the File::Temp object filehandle, am I reading this correctly? How do I make it stop doing that?

I realize I could just do what I have done in the 3rd block and eliminate File::Temp, but I'm trying to understand what's going on here. Also, I already have a module that does the URL retrieval and tmpfile storage for other non-zip processes, and I kinda want to continue using it. File::Temp is nice to use if you don't want to have to worry about manually unlinking things at the end.

Feel free to bang against $url; it's a zip on my hosting with nonsense data.

Replies are listed 'Best First'.
Re: Reading data from ZIP data in a File::Temp filehandle problem
by Corion (Patriarch) on Sep 14, 2011 at 19:15 UTC

    In your second example, you never rewind $tmp to its start position. Maybe Archive::Zip wants that?

      Thanks for the response. I added a rewind thusly:

      ######################################################################
      # This doesn't
      {
          my $ua = LWP::UserAgent->new;
          $ua->timeout(10);
          $ua->env_proxy;
          my $response = $ua->get($url);
          my $tmp      = File::Temp->new();
          if ($response->is_success) {
              binmode($tmp);
              print $tmp $response->decoded_content;
              print "Wrote ".$tmp->filename()."\n";
              $tmp->seek(0,0);    # Added rewind
          } else {
              die "error ".$response->status_line;
          }
          unless (-B $tmp) { die "Did not get a filehandle\n"; }
          my $zip     = Archive::Zip->new();
          my $zip_err = $zip->readFromFileHandle( $tmp );
          unless ($zip_err == AZ_OK ) {
              die "Archive::Zip read error on filehandle: $zip_err\n";
          }
          foreach my $member ($zip->members()) {
              my $fileName = $member->fileName();
              print "fileName = $fileName\n";
              my $content = $member->contents();
              print "contents $fileName = ".Dumper($content);
          }
      }

      which didn't appear to affect the issue.

        It seems that ->readFromFileHandle really wants the filename. The following works for me:

        use strict;
        use LWP::UserAgent;
        use File::Temp;    # loaded explicitly rather than relying on another module to pull it in
        use Archive::Zip qw( :ERROR_CODES :CONSTANTS );
        use Data::Dumper;

        my $url  = 'http://keep.drewhead.org/test.zip';
        my $file = '../test.zip';    # a local copy of $url

        # The formerly failing block, now passing the filename as well
        {
            my $ua = LWP::UserAgent->new;
            $ua->timeout(10);
            $ua->env_proxy;
            my $response = $ua->get($url);
            warn length($response->decoded_content);
            my $tmp;
            if ($response->is_success) {
                $tmp = File::Temp->new();
                binmode $tmp;    # write the zip bytes untranslated
                print {$tmp} $response->decoded_content;
                print "Wrote ".$tmp->filename()."\n";
            } else {
                die "error ".$response->status_line;
            }
            #unless (-B $tmp) { die "Did not get a filehandle\n"; }
            my $zip     = Archive::Zip->new();
            # Pass the tempfile's name along with the handle
            my $zip_err = $zip->readFromFileHandle( $tmp, $tmp->filename );
            unless ($zip_err == AZ_OK ) {
                die "Archive::Zip read error on filehandle: $zip_err\n";
            }
            foreach my $member ($zip->members()) {
                my $fileName = $member->fileName();
                print "fileName = $fileName\n";
                my $content = $member->contents();
                print "contents $fileName = ".Dumper($content);
            }
        }

        I added the binmode call before writing the tempfile, and I pass the name of the tempfile down to ->readFromFileHandle().
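
To illustrate why the binmode call matters for binary data such as a zip, here is a minimal sketch (not from the thread) that writes the same bytes through a raw layer and through a CRLF-translating layer, then compares the file sizes. The helper name written_size and the sample bytes are made up for illustration, and the :crlf layer is used here as a stand-in for Windows text mode.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper: write $data to a fresh temp file through the
# given PerlIO layer and return the resulting size on disk.
sub written_size {
    my ( $layer, $data ) = @_;
    my $tmp = File::Temp->new();
    binmode( $tmp, $layer ) or die "binmode $layer failed: $!";
    print {$tmp} $data;
    $tmp->flush;                   # make sure the bytes reach the file
    return -s $tmp->filename;      # size as seen through the filename
}

# Five bytes, two of which are newlines (arbitrary sample, not a real zip).
my $data = "PK\n\x00\n";

my $raw_size  = written_size( ':raw',  $data );
my $crlf_size = written_size( ':crlf', $data );

print "raw: $raw_size bytes, crlf: $crlf_size bytes\n";
```

If the two sizes differ, the translating layer has altered the data, which is the kind of corruption a zip reader cannot tolerate.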

Re: Reading data from ZIP data in a File::Temp filehandle problem
by Lotus1 (Vicar) on Sep 14, 2011 at 21:28 UTC

    I added my $tmp = IO::File->new( 'test1.zip', 'r' ); straight from the Archive::Zip documentation for readFromFileHandle() on CPAN to open a filehandle, and it works. I first tried with open() and got an error from Archive::Zip that the file must be seekable. The documentation also mentions that the handle must be seekable. I didn't realize there was a difference.

    use strict;
    use warnings;
    use IO::File;    # loaded explicitly for IO::File->new
    use Archive::Zip qw( :ERROR_CODES :CONSTANTS );
    use Data::Dumper;

    my $tmp     = IO::File->new( 'test1.zip', 'r' );
    my $zip     = Archive::Zip->new();
    my $zip_err = $zip->readFromFileHandle( $tmp );
    unless ($zip_err == AZ_OK ) {
        die "Archive::Zip read error on filehandle: $zip_err\n";
    }
    foreach my $member ($zip->members()) {
        my $fileName = $member->fileName();
        print "fileName = $fileName\n";
        my $content = $member->contents();
        print "contents $fileName = ".Dumper($content);
    }

      Yes, but I'm not starting with a file, I'm starting with a URL and staging the file temporarily via File::Temp. The POD documentation of File::Temp specifically states:
      Filehandles returned by these functions support the seekable methods.
      I don't understand why I would necessarily need to know the name of the file being created on the fly if I can just pass a filehandle around and make Archive::Zip read data from the filehandle. Clearly I now see that this differs from how Archive::Zip works. And I do see the documentation that states that it does not yet support streams. I was hoping not to have code that needed to be aware of a temporary file, given that such a file was already an open filehandle. It looks like readFromFileHandle() is really only a way to save an extra file open operation when the file was previously opened?
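
The seekability claim quoted from the File::Temp documentation can be checked in isolation. The following is a small sketch, assuming nothing beyond File::Temp and Fcntl; the payload is arbitrary bytes rather than a real zip, and the variable names are illustrative only.

```perl
use strict;
use warnings;
use File::Temp ();
use Fcntl qw( SEEK_SET );

my $tmp = File::Temp->new();
binmode $tmp;

# Arbitrary bytes for the round trip (not a valid zip archive).
my $payload = "PK\x03\x04 just some bytes\r\n";
print {$tmp} $payload;

# File::Temp objects inherit from IO::Handle and IO::Seekable.
my $is_seekable = $tmp->isa('IO::Seekable') ? 1 : 0;

# Position after writing, then rewind and read everything back.
my $end_pos = $tmp->tell;
$tmp->seek( 0, SEEK_SET ) or die "seek failed: $!";
my $read_back = do { local $/; <$tmp> };

print "seekable: $is_seekable, wrote $end_pos bytes, round trip ",
    ( $read_back eq $payload ? "ok" : "MISMATCH" ), "\n";
```

This only shows that the handle itself can be rewound and reread; it says nothing about whether Archive::Zip later needs the filename as well, which is the point Corion's reply addresses.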

        The way to troubleshoot this is to verify that all the parts work individually and then put them together. The next thing to verify is that readFromFileHandle will read a temporary filehandle that contains a valid zip file. I have experience with Archive::Zip, but not with temporary files. If someone could show how to copy a zip file from disk to a temporary file, then you could use that for testing.
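
One way to run that test without a network connection is sketched below. It builds a small zip on disk, copies its bytes into a File::Temp handle, and reads it back with readFromFileHandle(), passing the filename as the second argument the way Corion's reply does. The member names and contents are invented for the example, and this is only a sketch of the test setup, not a statement about how Archive::Zip behaves on every platform.

```perl
use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES );
use File::Spec ();
use File::Temp ();

# Build a small zip on disk so the example is self-contained.
my $dir = File::Temp->newdir();
my $src = File::Spec->catfile( "$dir", 'sample.zip' );
{
    my $out = Archive::Zip->new();
    $out->addString( "hello\n", 'test1.txt' );
    $out->addString( "world\n", 'test2.txt' );
    $out->writeToFileNamed($src) == AZ_OK
        or die "Could not write $src\n";
}

# Copy the zip, byte for byte, into a File::Temp handle.
my $tmp = File::Temp->new( SUFFIX => '.zip' );
binmode $tmp;
open my $in, '<', $src or die "Cannot open $src: $!\n";
binmode $in;
{
    local $/;              # slurp the whole file
    print {$tmp} <$in>;
}
close $in;
$tmp->flush;               # push buffered bytes out to the file
$tmp->seek( 0, 0 );        # rewind, as suggested earlier in the thread

# Read the archive from the handle, also giving Archive::Zip the name.
my $zip    = Archive::Zip->new();
my $status = $zip->readFromFileHandle( $tmp, $tmp->filename );
die "Archive::Zip read error: $status\n" unless $status == AZ_OK;

# contents() returns (data, status) in list context, hence scalar().
my %contents = map { $_->fileName() => scalar $_->contents() } $zip->members();

print "$_ => $contents{$_}" for sort keys %contents;
```

Dropping the second argument to readFromFileHandle() in this script should give a quick way to see whether the original error reproduces on a given setup.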