Re^2: trouble parsing log file...

by coreolyn (Parson)
on Nov 20, 2006 at 19:56 UTC


in reply to Re: trouble parsing log file...
in thread trouble parsing log file...

This node falls below the community's threshold of quality.

Replies are listed 'Best First'.
Re^3: trouble parsing log file...
by inman (Curate) on Nov 20, 2006 at 20:02 UTC
    Dumping a multi-gigabyte log file into an array is going to get ugly quickly. Read the file line by line instead, and let Perl's internals together with the operating system's I/O buffering take care of the rest.
      Here's what I have:
      use strict;
      use warnings;

      my $logfile = "log.txt";
      my $error   = "DOWN";
      my $warn    = "PROBLEM";

      my $redbutton    = "<img src='default_files/perlredblink.gif'>";
      my $greenbutton  = "<img src='default_files/perlgreenblink.gif'>";
      my $yellowbutton = "<img src='default_files/perlyellowblink.gif'>";

      open LOG, $logfile or die "Cannot open $logfile for read :$!";

      my $button = $greenbutton;

      while ($_ = <LOG>) {
          if ($_ =~ /$error/i) {
              $button = $redbutton;
              print "<!--Content-type: text/html-->\n\n";
              print "$button";
              last;
          }
          elsif ($_ =~ /$warn/i) {
              $button = $yellowbutton;
              print "<!--Content-type: text/html-->\n\n";
              print "$button";
              last;
          }
          else {
              print "<!--Content-type: text/html-->\n\n";
              print "$button";
              last;
          }
      }
      close LOG;
      Unfortunately, it does everything I want except go through the log line by line. How can I safely get this program to do that? Can you explain the alternative to using an array and/or a safe way to do this using an array? Thanks!
Efficient file handling (was Re^3: trouble parsing log file...)
by jarich (Curate) on Nov 23, 2006 at 14:59 UTC

    I thought I'd reply rather than --ing your post just because I disagreed.

    I cannot think of any meaning of the phrase "more efficient" which would render your statement correct.

    All the reading I've ever done on the matter says that parsing a file line by line is extremely efficient. What happens is as follows. The operating system reads a chunk of the file into memory; this is then broken up on newlines (or whatever the value of $/ is); then we iterate over each line until we run out and the process repeats. We can parse a file line by line as follows:

    while ( <FILE> )

    If we choose to stop reading the file at any point (perhaps we've found what we want) and call last, then we end up reading only as much of the file as necessary. This means it's efficient time-wise, and because we're only holding one chunk of the file in memory at a time, it's efficient memory-wise.
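
    For illustration, here is a minimal sketch of that early-exit pattern (reusing the log.txt filename and the DOWN string from the code above purely as placeholders):

    use strict;
    use warnings;

    open my $fh, '<', 'log.txt' or die "Cannot open log.txt: $!";
    while (my $line = <$fh>) {
        # Stop as soon as we find what we're after; the rest of the
        # file is never read.
        if ($line =~ /\bDOWN\b/i) {
            print "found: $line";
            last;
        }
    }
    close $fh;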

    Alternatively, my reading has said that "dumping the file to an array" and then parsing it line by line is very inefficient. This is the case whether we do it like this:

    my @logarray = <FILE>; foreach my $element (@logarray)

    or like this:

    foreach my $element (<FILE>)

    This is because the file system still gives Perl the file on a chunk by chunk basis, and Perl still splits it up on $/, but Perl has to do this for the whole file even if we're only going to look at the first 10 lines. Worse, Perl now has to store the entire file in memory, rather than just a chunk. So this is the least efficient way to handle a file in Perl.

    It is however very useful when we need random access to the whole file; for example when sorting it, or pulling out random quotes.
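
    As a quick sketch of that use case (quotes.txt is a made-up filename), slurping is handy when every line needs to be available at once, for example to pick one at random:

    use strict;
    use warnings;

    open my $fh, '<', 'quotes.txt' or die "Cannot open quotes.txt: $!";
    my @lines = <$fh>;              # whole file in memory -- fine for small files
    close $fh;

    print $lines[ rand @lines ];    # random access: one line chosen at random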

    I'd love to hear why, if you think I'm mistaken in my understanding of this matter.

      Perl did some old tricks that reached a little bit too far inside the <stdio.h> macros to be completely portable but that allowed Perl line-at-a-time I/O to be about twice as fast as C line-at-a-time I/O... on sufficiently "standard enough" systems. That was back in the days of AT&T Unix, before Linux. Last time I checked (long enough ago that I hope things have improved but not long enough ago that I've heard that they have), Perl still did line-at-a-time I/O unnecessarily inefficiently when compiled on a system that isn't "standard enough" (which is nearly every system these days).

      This meant that Perl line-at-a-time I/O was 4 times slower than it really should be on Linux (for example). It actually made re-implementing line-at-a-time I/O in pure Perl code about twice as fast as using Perl's own line-at-a-time I/O implemented in C. (Which also means that once Perl gets fixed, the pure-Perl version should end up about twice as slow as the built-in one, which is what you'd expect.)

      Yes, it makes little sense for Perl code to be faster than Perl's own C code. Unfortunately, that was certainly the case not too long ago.
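
      To make that concrete: re-implementing line-at-a-time I/O in Perl code essentially means reading large chunks with sysread and splitting them on newlines yourself. The following is only a minimal sketch of that general technique (the chunk size and the make_line_reader name are invented for this example); it is not the code being described above:

      use strict;
      use warnings;

      # Return an iterator that yields one line per call, reading the
      # underlying file in large chunks via sysread.
      sub make_line_reader {
          my ($fh, $chunk_size) = @_;
          $chunk_size ||= 64 * 1024;    # arbitrary buffer size for this sketch
          my $buf = '';
          return sub {
              # Refill the buffer until it holds a newline or we hit EOF.
              while ($buf !~ /\n/) {
                  my $read = sysread($fh, $buf, $chunk_size, length $buf);
                  return undef unless defined $read;    # read error
                  last if $read == 0;                   # end of file
              }
              return undef if $buf eq '';
              $buf =~ s/^([^\n]*\n?)//;                 # peel off one line
              return $1;
          };
      }

      open my $fh, '<', 'log.txt' or die "Cannot open log.txt: $!";
      my $next_line = make_line_reader($fh);
      while (defined(my $line = $next_line->())) {
          print $line;
      }
      close $fh;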

      The command perl -V:d_stdstdio will tell you whether Perl thinks your platform is "standard enough".
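
      Its output is a shell-style assignment, for example:

      $ perl -V:d_stdstdio
      d_stdstdio='define';

      where 'define' means Perl thinks the platform's stdio internals are usable for those tricks, and 'undef' means they are not.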

      But, yes, the speed difference between line-at-a-time I/O and "slurping" is usually small enough not to matter (even considering Perl's quirk here). The memory consumption difference can be hugely significant, of course.

      - tye        

      Hi, I tried to do this but couldn't get it to work correctly. Can you show me what I'm doing wrong?
      use strict;
      use warnings;

      my $logfile = "log.txt";
      my $error   = "DOWN";
      my $warn    = "PROBLEM";

      my $redbutton    = "<img src='default_files/perlredblink.gif'>";
      my $greenbutton  = "<img src='default_files/perlgreenblink.gif'>";
      my $yellowbutton = "<img src='default_files/perlyellowblink.gif'>";

      open LOG, $logfile or die "Cannot open $logfile for read :$!";

      my $button = $greenbutton;

      my @logfile=<LOG>;    # throw logfile into an array

      while (<LOG>) {
          if ($_ =~ /$error/i) {
              $button = $redbutton;
              print "<!--Content-type: text/html-->\n\n";
              print "$button";
              last;
          }
          elsif ($_ =~ /$warn/i) {
              $button = $yellowbutton;
              print "<!--Content-type: text/html-->\n\n";
              print "$button";
              last;
          }
          else {
              print "<!--Content-type: text/html-->\n\n";
              print "$button";
              last;
          }
      }
      close LOG;

        First and foremost, your line:

        my @logfile=<LOG>;    # throw logfile into an array

        does exactly what the comment says: it reads the whole logfile into an array (which you subsequently never use). That means that the test in the following line:

        while ( <LOG> ) {

        can never be true: the filehandle has already reached the end of the file by the time this line is reached. Just get rid of the first of these two lines.
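
        To see this concretely, here is a tiny sketch of what is happening with those two lines (log.txt stands in for your real filename):

        open LOG, 'log.txt' or die "Cannot open log.txt: $!";
        my @logfile = <LOG>;            # reads the whole file; the handle is now at EOF
        while (<LOG>) {                 # <LOG> immediately returns undef here...
            print "never reached\n";    # ...so this body never runs
        }
        close LOG;

        (If you really wanted to keep the array and still re-read the handle, you could rewind it first with seek LOG, 0, 0; but here simply deleting the slurp line is the right fix.)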

        However, once this problem is fixed, it becomes clear that your program logic is flawed. The program will only ever read one line from the logfile, because every branch of the if/elsif/else structure ends with last. So if the first line of the logfile contains, say, just "foo", it will print a green button and stop executing, even if the second (or third...) line is "SERVER DOWN".

        Lastly, as your regexes stand, $error will match eg "Downing Street", which is probably not what you want.

        I suggest that you use something like the following (simplified for the purposes of this posting to read from __DATA__ rather from a filehandle, and to output a simple string):

use strict;
use warnings;

my $error = 'DOWN';
my $warn  = 'PROBLEM';

my $redbutton    = 'RED BUTTON';
my $greenbutton  = 'GREEN BUTTON';
my $yellowbutton = 'YELLOW BUTTON';

my $button = $greenbutton;

while ( <DATA> ) {
    if ( /\b$error\b/i ) {
        $button = $redbutton;
        last;
    }
    elsif ( /\b$warn\b/i ) {
        $button = $yellowbutton;
    }
}

print $button;

__DATA__
foo
tony.blair@downingstreet.gov.uk
Watership Down
bar
