Hard to come up with a concise title, but here's the problem I am trying to solve. Given a directory of zipped log files, I need to perform the following for each file. Each zip file contains a single structured server log file in which every log entry ends with \r\n.

* Decompress the file into memory (currently into a scalar)

* Treat the scalar as an input file, reading it line by line in order to do something with each line

I am using IO::Uncompress::Unzip to decompress the zip files. Here's a snippet of my code.

use IO::Uncompress::Unzip qw(unzip $UnzipError);

my $logDirectory = '/logs';

foreach my $zipFile (glob("$logDirectory/*.zip")) {
    my $output;
    print "Decompressing $zipFile to memory\n";
    unzip $zipFile => \$output;
    #print $output;
    open my $fh, '<', \$output or die $!;
    while (<$fh>) {
        #do something
    }
    close $fh or die $!;
}

The code inside the <$fh> read loop is never reached. If I print the contents of $output after unzipping the file, the correct log data is displayed. I have also tried various ways of splitting $output on '\r\n', but I believe I'm hitting Perl size limits: the unzipped data is roughly 2.1GB, and Perl dies with either a 'split loop' error or some type of panic.

I am using Perl 5.14 on a 64-bit Linux machine with 64GB of memory. The problem may seem odd, but I am trying to optimize the processing of thousands of compressed server log files. My 'old' Perl script writes the decompressed log file to disk, reads that file for processing, and moves on to the next zip. Ideally I want to keep the decompressed content in memory and process it there, writing to disk only the log entries that match my search criteria.
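
For what it's worth, one approach that might avoid the large scalar altogether is to read from the IO::Uncompress::Unzip object itself, since it can be used like a read-only filehandle and decompresses as it goes. The following is only a minimal sketch, not a diagnosis of why the loop above is skipped. The temporary directory, the sample.log member, its contents, and the "ends in 404" search criterion are all made up so the example runs on its own; IO::Compress::Zip is used only to build that sample archive.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Compress::Zip qw(zip $ZipError);
use IO::Uncompress::Unzip qw($UnzipError);

# Build a small sample archive so the sketch is self-contained.
# (Hypothetical data; a real run would point $logDirectory at the log directory.)
my $logDirectory = tempdir(CLEANUP => 1);
my $sample = "GET /index.html 200\r\nGET /missing 404\r\nPOST /login 200\r\n";
zip \$sample => "$logDirectory/sample.zip", Name => 'sample.log'
    or die "zip failed: $ZipError\n";

my @matches;
foreach my $zipFile (glob("$logDirectory/*.zip")) {
    # The object reads the first member of the archive, decompressing on demand.
    my $z = IO::Uncompress::Unzip->new($zipFile)
        or die "Cannot open $zipFile: $UnzipError\n";

    while (my $line = <$z>) {
        $line =~ s/\r?\n\z//;                       # strip the \r\n terminator
        push @matches, $line if $line =~ / 404\z/;  # hypothetical search criterion
    }
    $z->close();
}

print "$_\n" for @matches;
```

Because only one line is held at a time, memory use should stay small regardless of how large the uncompressed log is, and nothing is written to disk except whatever you choose to do with the matching entries.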


In reply to Unzip file to scalar, treat scalar as a file by bwilli27
