phantom20x has asked for the wisdom of the Perl Monks concerning the following question:

I have quite a large log file that I'm parsing, and the script I have is very rough. That aside, memory management is not something I have worked with before, and when this script runs I run out of memory. What would be the most efficient way to parse through a 400 MB file? Please excuse the dirty code:
while(<>){
    if ($_=~"Sent:"){
        $count += 1;
        push (@dates, $_);
    }
    if ($count < 1){
        if ($_=~"Per Domain Breakdown"){
            until ($_=~"From:"){push (@list, $_);}
            push (@list, "BREAK $count");
        }
    }
}
print @dates;
print @list[1];
The count was added only to test the first 1000 or so pieces of data. Even so, the data was not being pushed into @list. Any help would be great. Thanks! ~Phantom

UPDATE: What I was attempting now works with this code:
while(<>){
    if (/^[0-9]+, |Sent:/){
        push (@logs, $_);
    }
}
print $logs[0];
print $logs[1];
As FunkyMonk and TGI pointed out, I misunderstood how my program moves through the file, and the until loop was what broke it. Thanks for all the help! ~Phantom

Re: Memory Problems with Parsing
by FunkyMonk (Bishop) on Jul 30, 2007 at 08:13 UTC

    Usually, loops need something that allows them to finish, but your until loop doesn't have anything to make it end:

    until ($_=~"From:"){push (@list, $_);}
    If the condition is false the first time it is checked, it will stay false forever, because there's nothing in the loop body that will change $_.

    As an aside, it's more usual to write regular expressions that look like regular expressions, not strings. So, $_=~"From:" can be written $_ =~ /From:/, or just /From:/ as they are automagically bound to $_.
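    A minimal sketch of the three forms side by side, using two made-up log lines. A string on the right of =~ is compiled as a pattern, and a bare /From:/ matches against $_, so all three forms agree on each line.

```perl
use strict;
use warnings;

# Two made-up log lines for illustration.
my @lines = ("From: report\n", "Sent: Mon Jul 30 2007\n");

my @results;
for (@lines) {                                  # for aliases each element to $_
    my $as_string = ($_ =~ "From:") ? 1 : 0;    # string on the right of =~ is used as a pattern
    my $explicit  = ($_ =~ /From:/) ? 1 : 0;    # real regex, explicitly bound to $_
    my $implicit  = /From:/ ? 1 : 0;            # bare match binds to $_ automatically
    push @results, "$as_string,$explicit,$implicit";
}

print "$_\n" for @results;   # prints "1,1,1" then "0,0,0"
```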

    update: clarification of first para

      The way the document looks, it starts with "From...", or at least that's how each day in the log starts. Since $_=~"Sent:" worked well for grabbing the date, I figured the same approach would work here: once I reach the "Per Domain Breakdown" line, grab everything and put it into a list until reaching "From:", at which point it would start all over. Maybe I'm misunderstanding the until statement, though. Thanks for the regular expression tip; I will put it to good use!

        I think the problem is that you are misunderstanding how your program moves through your file. $_ only changes when the outer while loop finishes an iteration and reads the next line.

        Consider this code:

        while ( <DATA> ) {
            chomp;
            until( /[^\d]/ ) {
                print "$_";
            }
        }
        __DATA__
        1
        A
        B
        C

        This code will never terminate; it will just print '1' forever. The first iteration of the outer while loop never completes, so it never sees past the first line returned by the special DATA filehandle.

        Try setting a flag to indicate what you should be looking for in your while loop. Use either an if-else chain or a dispatch table to handle each case.
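        Here is a rough sketch of the flag approach with an if/elsif chain. The log layout is a guess based on this thread (each day starts with a "From:" line, "Sent:" carries the date, and the lines after "Per Domain Breakdown" run until the next "From:"); the sample data and variable names are made up. Every line is examined exactly once by the outer loop, and the flag, not an inner loop, remembers where we are.

```perl
use strict;
use warnings;

# Sample data only: the layout is guessed from the thread.
my $log = <<'END_LOG';
From: report
Sent: Mon Jul 30 2007
Per Domain Breakdown
example.com 12
example.org 7
From: report
Sent: Tue Jul 31 2007
Per Domain Breakdown
example.net 3
END_LOG

# Read from an in-memory filehandle here; for the real log, open the
# file instead (or loop over while (<>) as in the original script).
open my $in, '<', \$log or die "Cannot open in-memory log: $!";

my $in_breakdown = 0;   # flag: are we inside a breakdown section?
my $current_date;       # date text from the most recent "Sent:" line
my %breakdown;          # date => [ lines from that day's breakdown ]

while (my $line = <$in>) {
    chomp $line;
    if ($line =~ /^From:/) {
        $in_breakdown = 0;              # a new day closes any open section
    }
    elsif ($line =~ /^Sent:\s*(.*)/) {
        $current_date = $1;             # remember the date for later lines
    }
    elsif ($line =~ /Per Domain Breakdown/) {
        $in_breakdown = 1;              # collect from the next line onward
    }
    elsif ($in_breakdown && defined $current_date) {
        push @{ $breakdown{$current_date} }, $line;
    }
}
close $in;

# prints:
# Mon Jul 30 2007: example.com 12 example.org 7
# Tue Jul 31 2007: example.net 3
for my $date (sort keys %breakdown) {
    print "$date: @{ $breakdown{$date} }\n";
}
```

        The hash keeps only the breakdown lines, so memory grows with those sections rather than with the whole file; if that is still too much for a 400 MB log, print each line inside the last elsif branch instead of pushing it. A dispatch table would replace the if/elsif chain with a hash of code references keyed by line type.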


        TGI says moo

Re: Memory Problems with Parsing
by moritz (Cardinal) on Jul 30, 2007 at 09:23 UTC
    Since you are not using all elements of @list, you shouldn't store them all.

    Furthermore, you are only printing the values in @dates, which means you don't actually have to store them: you can print them right away instead of pushing them.
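    A minimal sketch of that idea for the dates, assuming the "Sent:" lines are the ones wanted, as in the original script. print_sent_lines is a made-up helper name; it writes each matching line out as soon as it is read, so nothing accumulates in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every "Sent:" line from $in to $out as soon as
# it is read. Nothing is pushed onto an array, so memory use stays roughly
# constant however large the log is.
sub print_sent_lines {
    my ($in, $out) = @_;
    my $count = 0;
    while (my $line = <$in>) {
        next unless $line =~ /Sent:/;   # skip everything that is not a date line
        print {$out} $line;             # print immediately instead of storing
        $count++;
    }
    return $count;                      # number of date lines written
}

# Usage: perl dates.pl big.log [more.log ...]
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    print_sent_lines($fh, \*STDOUT);
    close $fh;
}
```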