in reply to Reading a zipped file (win32)

Well, somewhere along the line something is going to have to unzip the file. But if you want a transparent read of the zipped file, without any apparent intermediate uncompression, then you might create a PerlIO 'layer' using Archive::Zip that would allow you to do something like:

open FOO, '<:zip', 'zipfile.zip' or die "$!\n";
while (<FOO>) {
    # ...
}

The module to do this doesn't exist yet, but if you read the PerlIO documentation and an example implementation, it should be quite clear what you need to do.
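
Something like the following might serve as a starting point. It is only a rough, untested sketch of the idea: it uses the pure-Perl PerlIO::via interface rather than a hand-rolled XS layer, the package name PerlIO::via::ZipFirstMember is made up, and it simply exposes the first member of the archive, assuming the handle underneath is a plain seekable file.

package PerlIO::via::ZipFirstMember;

use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES );
use Archive::Zip::MemberRead;

sub PUSHED {
    my ($class, $mode, $fh) = @_;
    return bless { reader => undef }, $class;
}

sub FILL {
    my ($self, $below) = @_;

    # First call: read the zip directory from the (seekable) handle
    # below us and attach a reader to the archive's first member.
    unless ($self->{reader}) {
        my $zip = Archive::Zip->new;
        $zip->readFromFileHandle($below) == AZ_OK or return undef;
        my ($first) = $zip->memberNames            or return undef;
        $self->{reader} = Archive::Zip::MemberRead->new($zip, $first);
    }

    # Hand PerlIO one decompressed line per FILL call; getline()
    # strips the line ending, so put a "\n" back.  undef means EOF.
    my $line = $self->{reader}->getline;
    return defined $line ? "$line\n" : undef;
}

package main;

open my $fh, '<:via(ZipFirstMember)', 'zipfile.zip' or die "$!\n";
while (my $line = <$fh>) {
    # ... process each decompressed line here ...
}
close $fh;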

/J\

Re: Reading a zipped file (win32)
by spikey_wan (Scribe) on Jun 29, 2004 at 13:17 UTC
    But if I could read the zip file directly, line by line, then the amount of memory or disk space required would be minimal, wouldn't it? Far less than uncompressing the file first, surely? The log files are now so large that I gave up slurping them quite a while ago and process them line by line instead.

    Unfortunately, I am struggling with the answers I have so far. Can anyone make it a bit clearer to my thick head?

    Thanks,

    Spike.

      A quick glance at the docs for Archive::Zip reveals the existence of a low-level routine named readChunk, which I expect will get you further in your quest.

      You will, however, have to manage the chunk boundaries and line-ending logic yourself.

      Update: The chunkSize parameter refers to the source data, i.e. the compressed bytes. With compression ratios in the order of 95 %, the inflated data can expand to roughly 20 times your requested chunk size, so you will need to set this parameter sufficiently small.
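
      Something along these lines might do it (untested; the file name zipfile.zip, the 4K chunk size, reading only the first member of the archive, and splitting on "\n" are all assumptions on my part):

          use strict;
          use warnings;
          use Archive::Zip qw( :ERROR_CODES :CONSTANTS );

          my $zip = Archive::Zip->new;
          $zip->read('zipfile.zip') == AZ_OK or die "can't read zipfile.zip\n";

          # Assume the log we want is the first (or only) member.
          my ($member) = $zip->members or die "empty archive\n";

          # Ask for STORED (i.e. uncompressed) output so that readChunk()
          # inflates the data for us as it goes.
          $member->desiredCompressionMethod(COMPRESSION_STORED);
          $member->rewindData == AZ_OK or die "rewindData failed\n";

          my $buffer = '';
          my $status = AZ_OK;
          while ($status == AZ_OK) {
              my $chunk_ref;
              # 4K of *compressed* input; at ~95 % compression this can
              # inflate to something like 80K, so keep it modest.
              ($chunk_ref, $status) = $member->readChunk(4096);
              $buffer .= $$chunk_ref if defined $chunk_ref;

              # Peel complete lines off the front of the buffer; the
              # leftover partial line is finished by the next chunk.
              while ($buffer =~ s/\A(.*\n)//) {
                  my $line = $1;
                  # ... hand $line to the parser here ...
              }
          }
          $member->endRead;
          # Anything still in $buffer is a final line with no newline.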

      HTH
        Look at Archive::Zip::MemberRead. It lets you read members of zip files as if through a filehandle. The example from the docs reads through a contained file line by line.
        use Archive::Zip;
        use Archive::Zip::MemberRead;
        $zip = new Archive::Zip("file.zip");
        $fh  = new Archive::Zip::MemberRead($zip, "subdir/abc.txt");
        while (defined($line = $fh->getline())) {
            print $fh->input_line_number . "#: $line\n";
        }

      It is quite simple really: you cannot read a compressed file line by line without uncompressing it first, because the compression does not preserve the line structure of the original data. By the way you describe your problem, it looks like it would be best to scan the file line by line before compressing it, as has already been suggested.

      /J\

        OK, what the guys are doing is taking very large log files from a Unix machine. They FTP them onto their PCs and compress them using zip. They then give the log files to our engineers, who have to make sense of them. This is when they run my parser on the file. At the moment, they have to uncompress the file and then run my parser on it. Because the files are so large, they want to be able to run my parser on the zipped log file without unzipping it first.

      Compressed data isn't really usable as it stands -- that's the trade-off you make for space.

      If you can't temporarily decompress the data, process it and then put it back, you have problems bigger than Perl technique. You need more storage, or a less wasteful record format in the first place.

      If you post more details or start a different thread, maybe we can get you fixed up in a whole other way -- one that will get you the answer you need at the end of the day without skinning this particular cat.