spikey_wan has asked for the wisdom of the Perl Monks concerning the following question:

Hello World!

I have searched through the site, and found out how to read a zip file, but what I want to do is slightly different and possibly unusual?

I have a parser that parses large log files. Users seem to like keeping the log files zipped (win32), to save on disk space, so I have been asked if I can make my parser parse the log file without unzipping it first.

Does anyone know if this is possible?

i.e. Take a text file. Zip it up. THEN read the text file, and parse it line by line without unzipping it.

Thanks.

Spike.

Replies are listed 'Best First'.
Re: Reading a zipped file (win32)
by pbeckingham (Parson) on Jun 29, 2004 at 12:57 UTC

    Take a look at Compress::Zlib and Archive::Zip. While they do not quite serve your purpose as they stand, you could wrap them in your own code to get what you need.

Re: Reading a zipped file (win32)
by gellyfish (Monsignor) on Jun 29, 2004 at 13:05 UTC

    Well, somewhere along the line something is going to have to unzip the file. But if you want a transparent read of the zipped file, without any apparent intermediate uncompression, then you might create a PerlIO 'layer' using Archive::Zip that would allow you to do something like:

    open FOO, '<:zip', 'zipfile.zip' or die "$!\n";
    while (<FOO>) {
        # ...
    }

    The module to do this doesn't exist yet, but if you read the PerlIO documentation and an example implementation, it should be quite clear what you need to do.
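
    For readers arriving later: the ':zip' layer above is hypothetical, but IO::Uncompress::Unzip (part of the IO-Compress distribution bundled with more recent Perls) gives a similar filehandle-style, line-by-line read. The following is only a minimal sketch under that assumption. The names 'demo_log.zip' and 'app.log' are made up, and the archive is built first just so the example runs on its own.

    ```perl
    use strict;
    use warnings;
    use IO::Compress::Zip     qw(zip $ZipError);
    use IO::Uncompress::Unzip qw($UnzipError);

    # Build a small demo archive so the sketch is self-contained.
    # 'demo_log.zip' and 'app.log' are hypothetical names.
    my $zipfile = 'demo_log.zip';
    my $log     = join '', map { "line $_: something happened\n" } 1 .. 5;
    zip \$log => $zipfile, Name => 'app.log'
        or die "zip failed: $ZipError\n";

    # Open the first member of the archive as a decompressing read handle.
    my $fh = IO::Uncompress::Unzip->new($zipfile)
        or die "Cannot open $zipfile: $UnzipError\n";

    # The data is inflated piece by piece as lines are requested,
    # so the whole log is never held in memory or written to disk.
    my $count = 0;
    while (defined(my $line = $fh->getline)) {
        chomp $line;
        $count++;
        print "$count: $line\n";
    }
    $fh->close;
    unlink $zipfile;
    ```

    Something still unzips the data, as noted above, but only a small buffer at a time.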

    /J\

      But if I could read the zip file directly, line by line, then the amount of memory or disk space required would be minimal, wouldn't it? Far less than uncompressing the file, surely? The log files are now so large that I had to give up slurping them and process them line by line, quite a while ago.

      Unfortunately, I am struggling with the answers I have so far. Can anyone make it a bit clearer for my thick head?

      Thanks,

      Spike.

        A quick glance at the docs for Archive::Zip reveals the existence of a low-level member method named readChunk, which I expect will get you further in your quest.

        You will, however, have to manage the chunk-boundary and line-ending logic yourself.

        Update: The chunkSize parameter refers to the source data, i.e. the compressed bytes. With compression ratios on the order of 95%, the inflated data can expand to about 20 times the requested chunk size, so set this parameter small enough to allow for that.
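
        A rough sketch of that chunk-and-line bookkeeping, assuming Archive::Zip's documented member methods (rewindData, readChunk, readIsDone, endRead). The helper name each_zip_line, the file names, and the chunk size of 256 are all made up for illustration, and the demo archive is created first only so the code runs as written.

        ```perl
        use strict;
        use warnings;
        use Archive::Zip qw(:ERROR_CODES :CONSTANTS);

        # Hypothetical helper: call $on_line for each line of one zip member,
        # reading at most $chunk_size compressed bytes per step.
        sub each_zip_line {
            my ($zipfile, $member_name, $chunk_size, $on_line) = @_;

            my $zip = Archive::Zip->new;
            $zip->read($zipfile) == AZ_OK or die "Cannot read $zipfile\n";
            my $member = $zip->memberNamed($member_name)
                or die "No member $member_name in $zipfile\n";

            # Ask for the data uncompressed, then start a streaming read.
            $member->desiredCompressionMethod(COMPRESSION_STORED);
            $member->rewindData == AZ_OK or die "Cannot rewind $member_name\n";

            my $pending = '';    # tail of the last chunk with no newline yet
            until ($member->readIsDone) {
                my ($buf_ref, $status) = $member->readChunk($chunk_size);
                die "Read error $status\n"
                    if $status != AZ_OK && $status != AZ_STREAM_END;
                $pending .= $$buf_ref;

                # Hand over every complete line; keep any partial line for later.
                while ($pending =~ s/\A([^\n]*)\n//) {
                    (my $line = $1) =~ s/\r\z//;    # tolerate CRLF logs
                    $on_line->($line);
                }
            }
            $on_line->($pending) if length $pending;    # final line, no newline
            $member->endRead;
        }

        # Demo: build a deflated archive, then read it back line by line.
        my $zipfile = 'demo_log.zip';
        my $out     = Archive::Zip->new;
        my $text    = join '', map { "line $_: ok\r\n" } 1 .. 1000;
        my $added   = $out->addString($text, 'app.log');
        $added->desiredCompressionMethod(COMPRESSION_DEFLATED);
        $out->writeToFileNamed($zipfile) == AZ_OK or die "Cannot write $zipfile\n";

        my @lines;
        each_zip_line($zipfile, 'app.log', 256, sub { push @lines, $_[0] });
        print scalar(@lines), " lines read\n";
        unlink $zipfile;
        ```

        Memory use is bounded by one inflated chunk plus the longest line, which is why a small chunk size matters for highly compressible logs.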

        HTH

        It is quite simple really: you cannot read a compressed file by the lines of its content without uncompressing it first, because the compression does not preserve the lines of the original data. By the way you describe your problem, it looks like it would be best to scan the file line by line before compressing it, as has already been suggested.

        /J\

        Compressed data isn't really usable as it stands; that's the trade-off you make for space.

        If you can't temporarily decompress data, process it and then put it back, you have problems bigger than Perl technique. You need more storage or a less wasteful record in the first place.

        If you post more details or start a different thread, maybe we can get you fixed up in a whole other way, one that will get the answer you need at the end of the day without skinning this particular cat.

Re: Reading a zipped file (win32)
by borisz (Canon) on Jun 29, 2004 at 12:57 UTC
Re: Reading a zipped file (win32)
by gri6507 (Deacon) on Jun 29, 2004 at 12:55 UTC
    What's the problem with unzipping it? You could always zip it up after you're done, leaving the environment clean.
      If the log file is very large, and there's not much disk space left, I may run out of space when unzipping it.