Re^2: trouble parsing log file...

by coreolyn (Parson)
on Nov 20, 2006 at 19:56 UTC


in reply to Re: trouble parsing log file...
in thread trouble parsing log file...

This node falls below the community's threshold of quality.

Replies are listed 'Best First'.
Re^3: trouble parsing log file...
by inman (Curate) on Nov 20, 2006 at 20:02 UTC
    Dumping a multi-gigabyte log file into an array is going to get ugly quickly. Read the file line by line instead, and let Perl's internals together with the operating system's I/O buffering take care of the rest.
      Here's what I have:
      use strict;
      use warnings;

      my $logfile = "log.txt";
      my $error   = "DOWN";
      my $warn    = "PROBLEM";

      my $redbutton    = "<img src='default_files/perlredblink.gif'>";
      my $greenbutton  = "<img src='default_files/perlgreenblink.gif'>";
      my $yellowbutton = "<img src='default_files/perlyellowblink.gif'>";

      open LOG, $logfile or die "Cannot open $logfile for read :$!";

      my $button = $greenbutton;

      while ($_ = <LOG>) {
          if ($_ =~ /$error/i) {
              $button = $redbutton;
              print "<!--Content-type: text/html-->\n\n";
              print "$button";
              last;
          }
          elsif ($_ =~ /$warn/i) {
              $button = $yellowbutton;
              print "<!--Content-type: text/html-->\n\n";
              print "$button";
              last;
          }
          else {
              print "<!--Content-type: text/html-->\n\n";
              print "$button";
              last;
          }
      }
      close LOG;
      Unfortunately, it does everything I want except go through the log line by line. How can I safely get this program to do that? Can you explain the alternative to using an array and/or a safe way to do this using an array? Thanks!
Efficient file handling (was Re^3: trouble parsing log file...)
by jarich (Curate) on Nov 23, 2006 at 14:59 UTC

    I thought I'd reply rather than --ing your post just because I disagreed.

    I cannot think of any meaning of the phrase "more efficient" which would render your statement correct.

    All the reading I've ever done on the matter says that parsing a file line by line is extremely efficient. What happens is as follows. The operating system reads a chunk of the file into memory; this is then broken up on newlines (or whatever the value of $/ is); then we iterate over each line until we run out and the process repeats. We can parse a file line by line as follows:

    while ( <FILE> )

    If we choose to stop reading the file at any point (perhaps we've found what we want) and call last, then we end up reading only as much of the file as necessary. This means it's efficient time-wise, and because we're only holding one chunk of the file in memory at a time, it's efficient memory-wise.
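
    For illustration, here is a minimal sketch of that early-exit pattern (reusing the log.txt filename and the DOWN string from the code above purely as placeholders):

    use strict;
    use warnings;

    open my $fh, '<', 'log.txt' or die "Cannot open log.txt: $!";
    while (my $line = <$fh>) {
        # Stop as soon as we find what we're after; the rest of the
        # file is never read.
        if ($line =~ /\bDOWN\b/i) {
            print "found: $line";
            last;
        }
    }
    close $fh;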

    Alternatively, my reading has said that "dumping the file to an array" and then parsing it line by line is very inefficient. This is the case whether we do it like this:

    my @logarray = <FILE>; foreach my $element (@logarray)

    or like this:

    foreach my $element (<FILE>)

    This is because the file system still gives Perl the file on a chunk by chunk basis, and Perl still splits it up on $/, but Perl has to do this for the whole file even if we're only going to look at the first 10 lines. Worse, Perl now has to store the entire file in memory, rather than just a chunk. So this is the least efficient way to handle a file in Perl.

    It is however very useful when we need random access to the whole file; for example when sorting it, or pulling out random quotes.
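
    As a quick sketch of that use case (quotes.txt is a made-up filename), slurping is handy when every line needs to be available at once, for example to pick one at random:

    use strict;
    use warnings;

    open my $fh, '<', 'quotes.txt' or die "Cannot open quotes.txt: $!";
    my @lines = <$fh>;              # whole file in memory -- fine for small files
    close $fh;

    print $lines[ rand @lines ];    # random access: one line chosen at random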

    I'd love to hear why, if you think I'm mistaken in my understanding of this matter.

      Perl did some old tricks that reached a little bit too far inside the <stdio.h> macros to be completely portable but that allowed Perl line-at-a-time I/O to be about twice as fast as C line-at-a-time I/O... on sufficiently "standard enough" systems. That was back in the days of AT&T Unix, before Linux. Last time I checked (long enough ago that I hope things have improved but not long enough ago that I've heard that they have), Perl still did line-at-a-time I/O unnecessarily inefficiently when compiled on a system that isn't "standard enough" (which is nearly every system these days).

      This meant that Perl line-at-a-time I/O was 4 times slower than it really should be on Linux (for example). It actually made re-implementing line-at-a-time I/O in pure Perl code about twice as fast as using Perl's own line-at-a-time I/O implemented in C. (Which also means that once Perl gets fixed, the pure-Perl version should end up about twice as slow as the built-in one, which is what you'd expect.)

      Yes, it makes little sense for Perl code to be faster than Perl's own C code. Unfortunately, that was certainly the case not too long ago.
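
      To make that concrete: re-implementing line-at-a-time I/O in Perl code essentially means reading large chunks with sysread and splitting them on newlines yourself. The following is only a minimal sketch of that general technique (the chunk size and the make_line_reader name are invented for this example); it is not the code being described above:

      use strict;
      use warnings;

      # Return an iterator that yields one line per call, reading the
      # underlying file in large chunks via sysread.
      sub make_line_reader {
          my ($fh, $chunk_size) = @_;
          $chunk_size ||= 64 * 1024;    # arbitrary buffer size for this sketch
          my $buf = '';
          return sub {
              # Refill the buffer until it holds a newline or we hit EOF.
              while ($buf !~ /\n/) {
                  my $read = sysread($fh, $buf, $chunk_size, length $buf);
                  return undef unless defined $read;    # read error
                  last if $read == 0;                   # end of file
              }
              return undef if $buf eq '';
              $buf =~ s/^([^\n]*\n?)//;                 # peel off one line
              return $1;
          };
      }

      open my $fh, '<', 'log.txt' or die "Cannot open log.txt: $!";
      my $next_line = make_line_reader($fh);
      while (defined(my $line = $next_line->())) {
          print $line;
      }
      close $fh;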

      The command perl -V:d_stdstdio will tell you whether Perl thinks your platform is "standard enough".
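
      Its output is a shell-style assignment, for example:

      $ perl -V:d_stdstdio
      d_stdstdio='define';

      where 'define' means Perl thinks the platform's stdio internals are usable for those tricks, and 'undef' means they are not.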

      But, yes, the speed difference between line-at-a-time I/O and "slurping" is usually small enough not to matter (even considering Perl's quirk here). The memory consumption difference can be hugely significant, of course.

      - tye        

      Hi, I tried to do this but couldn't get it to work correctly. Can you show me what I'm doing wrong?
      use strict;
      use warnings;

      my $logfile = "log.txt";
      my $error   = "DOWN";
      my $warn    = "PROBLEM";

      my $redbutton    = "<img src='default_files/perlredblink.gif'>";
      my $greenbutton  = "<img src='default_files/perlgreenblink.gif'>";
      my $yellowbutton = "<img src='default_files/perlyellowblink.gif'>";

      open LOG, $logfile or die "Cannot open $logfile for read :$!";

      my $button = $greenbutton;

      my @logfile=<LOG>;    # throw logfile into an array

      while (<LOG>) {
          if ($_ =~ /$error/i) {
              $button = $redbutton;
              print "<!--Content-type: text/html-->\n\n";
              print "$button";
              last;
          }
          elsif ($_ =~ /$warn/i) {
              $button = $yellowbutton;
              print "<!--Content-type: text/html-->\n\n";
              print "$button";
              last;
          }
          else {
              print "<!--Content-type: text/html-->\n\n";
              print "$button";
              last;
          }
      }
      close LOG;

        First and foremost, your line:

        my @logfile=<LOG>;    # throw logfile into an array

        does exactly what the comment says: it reads the whole logfile into an array (which you subsequently never use). That means that the test in the following line:

        while ( <LOG> ) {

        can never be true: the filehandle has already reached the end of the file by the time this line is reached. Just get rid of the first of these two lines.
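
        To see this concretely, here is a tiny sketch of what is happening with those two lines (log.txt stands in for your real filename):

        open LOG, 'log.txt' or die "Cannot open log.txt: $!";
        my @logfile = <LOG>;            # reads the whole file; the handle is now at EOF
        while (<LOG>) {                 # <LOG> immediately returns undef here...
            print "never reached\n";    # ...so this body never runs
        }
        close LOG;

        (If you really wanted to keep the array and still re-read the handle, you could rewind it first with seek LOG, 0, 0; but here simply deleting the slurp line is the right fix.)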

        However, once this problem is fixed, it becomes clear that your program logic is flawed. The program will only ever read one line from the logfile, because every branch of the if/elsif/else structure ends with last. So if the first line of the logfile contains, say, just "foo", it will print a green button and stop executing, even if the second (or third...) line is "SERVER DOWN".

        Lastly, as your regexes stand, $error will match eg "Downing Street", which is probably not what you want.

        I suggest that you use something like the following (simplified for the purposes of this posting to read from __DATA__ rather from a filehandle, and to output a simple string):

use strict;
use warnings;

my $error = 'DOWN';
my $warn  = 'PROBLEM';

my $redbutton    = 'RED BUTTON';
my $greenbutton  = 'GREEN BUTTON';
my $yellowbutton = 'YELLOW BUTTON';

my $button = $greenbutton;

while ( <DATA> ) {
    if ( /\b$error\b/i ) {
        $button = $redbutton;
        last;
    }
    elsif ( /\b$warn\b/i ) {
        $button = $yellowbutton;
    }
}

print $button;

__DATA__
foo
tony.blair@downingstreet.gov.uk
Watership Down
bar
