Re: Fetch URL Contents to File Handle

I need to download a very large zipfile containing thousands of records, and print the first ten records ... Is there a way of downloading into a filehandle or pipe so that I don't have to download the entire large file?

A ZIP file's central directory is at the end of the file. Although you could get fancy with range requests, it might be easier to actually download the whole file. How big is "very large"?

Update: Even though 10 MB isn't that big for a daily download, it turns out to be much easier to use the site's API.
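For anyone curious what the range-request route would involve, here is a minimal sketch (not code from this thread). It assumes the tail of the archive has already been fetched and finds the End of Central Directory record in it. The helper name `parse_eocd` is made up for illustration, and the sketch ignores Zip64 archives and the unlikely case of an archive comment that itself contains the signature bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: given the last bytes of a ZIP file, locate the
# End of Central Directory (EOCD) record and return the total entry
# count plus the size and offset of the central directory.
sub parse_eocd {
    my ($tail) = @_;

    # The EOCD record starts with "PK\x05\x06". Search backwards,
    # because an archive comment may follow the fixed 22-byte record.
    my $pos = rindex($tail, "PK\x05\x06");
    return if $pos < 0 || length($tail) - $pos < 22;

    # Layout: signature (4), disk number (2), disk holding the central
    # directory (2), entries on this disk (2), total entries (2),
    # central directory size (4), central directory offset (4),
    # comment length (2). All fields are little-endian.
    my (undef, undef, undef, undef, $total, $cd_size, $cd_offset)
        = unpack('V v v v v V V v', substr($tail, $pos, 22));

    return { entries => $total, cd_size => $cd_size, cd_offset => $cd_offset };
}

# Assumed usage with HTTP::Tiny (placeholder URL; the server must
# support range requests and answer with status 206):
#   my $res = HTTP::Tiny->new->get('https://example.com/records.zip',
#       { headers => { Range => 'bytes=-65557' } });
#   my $info = parse_eocd($res->{content});
```

The 65557-byte suffix covers the 22-byte record plus the largest possible archive comment (65535 bytes). A second range request for `cd_size` bytes starting at `cd_offset` would then be needed to fetch the directory itself, which is why downloading the whole file is often simpler.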

Re^2: Fetch URL Contents to File Handle by pmqs (Friar) on Jun 10, 2020 at 12:31 UTC
A ZIP file's central directory is at the end of the file. Although you could get fancy with range requests, ...

This is true, but it is also possible to read a zip file in streaming mode without using the central directory at the end of the file. That's what IO::Uncompress::AnyUncompress does (via IO::Uncompress::Unzip). If there is an HTTP module that exposes a filehandle interface, then IO::Uncompress::AnyUncompress can read it.
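A minimal sketch of that streaming approach (an illustration, not code posted in this thread): it reads the first lines of the first archive member from any readable filehandle and stops early. The helper name `first_lines` is hypothetical, and the piped `curl` command in the usage comment is only one assumed way to get a filehandle onto an HTTP response.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw($UnzipError);

# Hypothetical helper: return up to $max lines from the first member of
# a ZIP archive arriving on $fh. The handle does not need to be
# seekable, so a pipe or socket works as well as a plain file.
sub first_lines {
    my ($fh, $max) = @_;
    binmode $fh;    # ZIP data is binary

    my $z = IO::Uncompress::Unzip->new($fh)
        or die "Cannot read ZIP stream: $UnzipError\n";

    my @lines;
    while (@lines < $max) {
        my $line = $z->getline;
        last unless defined $line;    # end of member (or read error)
        push @lines, $line;
    }
    return @lines;
}

# Assumed usage (placeholder URL, requires curl on the PATH):
#   open my $fh, '-|', 'curl', '-s', 'https://example.com/records.zip'
#       or die "Cannot start curl: $!";
#   print first_lines($fh, 10);
```

Because the helper returns after the requested number of lines, the rest of the archive never has to be decompressed; how much of the download is actually avoided depends on when the producing side of the pipe is closed.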
Re^3: Fetch URL Contents to File Handle by haukex (Archbishop) on Jun 10, 2020 at 20:12 UTC
This is true, but it is also possible to read a zip file in streaming mode without using the central directory at the end of the file.

Yes, that's a good point, thanks! My understanding is that it's possible for files to have been deleted or replaced in the central directory but still be present in the ZIP file, but I haven't encountered such a ZIP file in the wild myself. I did write the parent node before I had looked into the ZIP file in question to discover that it only contains a single file.