in reply to Re: Reading a zipped file (win32)
in thread Reading a zipped file (win32)

But if I could read the zip file directly, line by line, then the amount of memory or disk space required would be minimal, wouldn't it? Far less than uncompressing the whole file, surely? The log files are now so large that I had to give up slurping them and switch to processing them line by line, quite a while ago.

Unfortunately, I am struggling with the answers I have so far. Can anyone make it a bit clearer to my thick head?

Thanks,

Spike.

Replies are listed 'Best First'.
Re^2: Reading a zipped file (win32)
by guha (Priest) on Jun 29, 2004 at 14:08 UTC

    A quick glance at the docs for Archive::Zip reveals the existence of a low-level routine named readChunk, which I expect will get you further in your quest.

    You will, however, have to manage the chunk-boundary and line-ending logic yourself.

    Update: The chunkSize parameter refers to the source data, i.e. the compressed bytes. With compression ratios on the order of 95%, the inflated data can expand to roughly 20 times your requested chunk size, so you will need to set this parameter sufficiently small.
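    A minimal sketch of that approach, assuming an archive "logs.zip" containing a member "big.log" (both names are illustrative, not from your post):

        use Archive::Zip qw( :ERROR_CODES :CONSTANTS );

        my $zip = Archive::Zip->new();
        $zip->read('logs.zip') == AZ_OK or die "cannot read logs.zip";

        my $member = $zip->memberNamed('big.log')
            or die "no member named big.log";

        # Ask for stored (uncompressed) output so readChunk() inflates for us.
        $member->desiredCompressionMethod(COMPRESSION_STORED);
        $member->rewindData() == AZ_OK or die "rewindData failed";

        my $buffer = '';
        until ($member->readIsDone()) {
            # 4096 bytes of *compressed* input per call; the inflated
            # chunk returned can be many times larger.
            my ($chunkRef, $status) = $member->readChunk(4096);
            die "readChunk failed"
                unless $status == AZ_OK || $status == AZ_STREAM_END;
            $buffer .= $$chunkRef;

            # Peel off complete lines; keep any partial line in the buffer.
            while ($buffer =~ s/^(.*?\r?\n)//) {
                my $line = $1;
                # ... process $line here ...
            }
        }
        $member->endRead();
        # $buffer may still hold a final line without a trailing newline.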

    HTH
      Look at Archive::Zip::MemberRead. It lets you read a member of a zip file through a filehandle-like interface. The example from the docs reads through a contained file line by line.
      use Archive::Zip;
      use Archive::Zip::MemberRead;

      my $zip = Archive::Zip->new("file.zip");
      my $fh  = Archive::Zip::MemberRead->new($zip, "subdir/abc.txt");
      while (defined(my $line = $fh->getline())) {
          print $fh->input_line_number . "#: $line\n";
      }
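      As far as I can tell from the module's source, MemberRead does the chunked inflation for you and holds only one buffer of uncompressed data at a time, so memory use stays small however large the log is.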
Re^2: Reading a zipped file (win32)
by gellyfish (Monsignor) on Jun 29, 2004 at 13:57 UTC

    It is quite simple really: you cannot read a compressed file by the lines of its content without uncompressing it, because the compression does not preserve the line structure of the original data. By the way you describe your problem, it looks like it would be best to scan the file line by line before compressing it, as has already been suggested.
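    That said, the uncompressing can happen as a stream, so nothing forces a full temporary file onto disk. A minimal sketch using IO::Uncompress::Unzip (a module choice of mine, not one discussed above; the file and member names are placeholders):

        use IO::Uncompress::Unzip qw( $UnzipError );

        # Open one member of the archive as a line-oriented stream.
        my $z = IO::Uncompress::Unzip->new( 'logs.zip', Name => 'big.log' )
            or die "unzip failed: $UnzipError\n";

        while ( my $line = $z->getline() ) {
            # ... parse $line; only a small window of inflated
            #     data is ever held in memory ...
        }
        $z->close();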

    /J\

      OK, what the guys are doing, is taking very large log files from a unix machine. They FTP them onto their PCs, and compress them using zip. They then give the log files to our engineers who have to make sense of them. This is when they run my parser on the file. At the moment, they have to uncompress the file, then run my parser on it. Because the files are so large, they want to be able to run my parser on the zipped log file without unzipping it first.

        Suggestion: Until someone writes the :via(Zip) IO layer, have the engineers unzip the logfiles into a directory that has the Windows "Compress contents" attribute set.

        If they do this when transferring the files onto their local drives, whether from the network or removable media, the amount of local disk space used will probably be roughly the same as if they saved the .zip file.

        The advantage is that the filesystem will do the decompression on-the-fly and you don't need to modify your parser at all.
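        For what it's worth, the attribute can also be set from the command line with Windows' compact utility, e.g. compact /c /s:D:\logs (the path is illustrative), so the engineers need not click through folder properties each time.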


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon
Re^2: Reading a zipped file (win32)
by pboin (Deacon) on Jun 29, 2004 at 13:53 UTC

    Compressed data isn't directly usable -- that's the trade-off you make for space.

    If you can't temporarily decompress the data, process it, and then put it back, you have problems bigger than Perl technique. You need more storage, or a less wasteful record format in the first place.
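    If the temporary-decompress route is workable, it can be as simple as the sketch below (the archive and member names are made up, and the parser body is elided):

        use Archive::Zip qw( :ERROR_CODES );
        use File::Temp qw( tempdir );

        # Extract the member to a throwaway directory, parse it, and let
        # File::Temp clean up the extracted copy when the program exits.
        my $dir = tempdir( CLEANUP => 1 );

        my $zip = Archive::Zip->new();
        $zip->read('logs.zip') == AZ_OK or die "cannot read logs.zip";
        $zip->extractMember( 'big.log', "$dir/big.log" ) == AZ_OK
            or die "extract failed";

        open my $log, '<', "$dir/big.log" or die $!;
        while ( my $line = <$log> ) {
            # ... run the parser on $line ...
        }
        close $log;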

    If you post more details or start a different thread, maybe we can get you fixed up in a whole other way, one that will get you the answer you need at the end of the day w/o skinning this particular cat.