Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How can I grab info from a log file, starting at the keyword 'Mozilla' and including everything after it? Here is a sample line from the log data:

ip here - - 02/Jan/2001:00:09:30 +0000 "POST /path HTTP/1.1" 200 132 Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)

I'm trying to grab 'Mozilla' and everything after it (but nothing before), count the characters in the captured data, and report how many captures are over 100 characters.

I've tried variations of pattern matching and grep but continually fail. (I know... I really suck, but I'm trying really hard to learn this stuff.) Here's what I have drooled out so far.

#!/usr/bin/perl

if ($_[0]) {
    $filename=$_[0];
} else {
    print "USAGE: getUserAgent <logfile name>\n";
    exit;
}

@lines=`cat /path/$filename`;

foreach $line (@lines) {
    @elements=split(' ',$line);
    if ($elements[1] eq "Mozilla") {
        $capture = grep(/\bMozilla/i\w+\W+\d+\s+\S+,userAgent);
    }
    printf "\n", @userAgent;
}

Needless to say this errors out badly:

Backslash found where operator expected at getUserAgent.pl line 18, near "/\bMozilla/i\"
        (Missing operator before \?)
syntax error at getUserAgent.pl line 18, near "/\bMozilla/i\"
Substitution replacement not terminated at getUserAgent.pl line 18.

This is my first post. Can one of you awesome Monks please help?

edmay98

Replies are listed 'Best First'.
Re: grabbing info from log after key word
by I0 (Priest) on Jan 03, 2001 at 06:43 UTC
    #!/usr/bin/perl
    if( $ARGV[0] ){
        $filename=$ARGV[0];
    }else{
        print "USAGE: getUserAgent <logfile name>\n";
        exit;
    }
    open FILE,"</path/$filename" or die "Can't open /path/$filename because $!";
    $over=0;
    while( <FILE> ){
        if( m/\b(Mozilla\b.*)/ ){
            print "$1\n";
            $over++ if( (length $1) > 100 );
        }
    }
    print "$over over 100\n";
      Thanks 10, you oh so honorable monk. I tried this and it worked great. May the new year bring you happiness, joy and plentiful amounts of excellent grog.
Re: grabbing info from log after key word
by chromatic (Archbishop) on Jan 03, 2001 at 08:01 UTC
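    The slash in the /i after Mozilla is interpreted as the terminating slash of the regex. The i then becomes a modifier, and the rest of what you meant as the pattern (\w+\W+\d+\s+\S+) lands outside the regex, where Perl cannot parse it.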
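
    As a minimal sketch of where the modifier belongs: the whole pattern sits between the two slashes, and /i follows the closing one. The helper name user_agent is made up for illustration, and the sample line is the one from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return 'Mozilla' and
# everything after it on the line, or undef if the keyword is absent.
sub user_agent {
    my ($line) = @_;
    # The entire pattern is inside the slashes; the i modifier comes last.
    return $line =~ /\b(Mozilla\b.*)/i ? $1 : undef;
}

my $sample = 'ip here - - 02/Jan/2001:00:09:30 +0000 "POST /path HTTP/1.1" 200 132 Mozilla/4.0 (compatible; MSIE 5.01; Windows 98)';
my $agent  = user_agent($sample);
print "$agent\n" if defined $agent;
```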

    I might grep through the lines, looking only for 'Mozilla', splitting the results on Mozilla, and taking the length of the second element after the split. If you have many lines to process, this will take lots of memory, though.
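
    A rough sketch of that grep-and-split idea, assuming the lines are already in an array. The helper name count_long_agents and the sample lines are invented for illustration. Note that the measured length covers only the text after 'Mozilla', not the keyword itself.

```perl
use strict;
use warnings;

# Hypothetical helper: count lines whose text after the first 'Mozilla'
# is longer than $limit characters. All lines are held in memory at once.
sub count_long_agents {
    my ($limit, @lines) = @_;
    chomp @lines;
    # Keep only the lines that mention Mozilla at all.
    my @hits = grep { /Mozilla/ } @lines;
    # Split each hit in two at the first 'Mozilla'; element 1 is the remainder.
    my @rest = map { (split /Mozilla/, $_, 2)[1] // '' } @hits;
    return scalar(grep { length($_) > $limit } @rest);
}

# Made-up sample lines, only to exercise the helper.
my @sample_lines = (
    'a - Mozilla/4.0 (compatible)',
    'b - Lynx/2.8',
    'c - Mozilla/4.0 ' . 'x' x 120,
);
print count_long_agents(100, @sample_lines), " over 100\n";
```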

    You could do something more like this:

    open(INPUT, "/path/$filename") or die "Can't open: $!";
    while (<INPUT>) {
        if (/Mozilla/) {
            # $rest is the text after the first 'Mozilla' (keyword not included)
            my $rest = (split(/Mozilla/, $_, 2))[1];
            if (length($rest) > 100) {
                # tag this line somehow
            }
        }
    }
    That's untested and rather generic, but it's fairly close.
Re: grabbing info from log after key word
by turnstep (Parson) on Jan 04, 2001 at 00:28 UTC

    I'd throw the answers in a hash: for a typical logfile, you'll get a lot of the same answers anyway:

    open(LOGFILE, "$mylogfile") or die "Could not open $mylogfile: $!\n";
    my %useragent;
    while(<LOGFILE>) {
      if (/Mozilla(.*)$/) {
        $useragent{$1}++;
      }
    }
    close(LOGFILE);

    ## Alphabetically sorted:
    for (sort keys %useragent) {
      printf "Length: %3d Frequency: %5d Name: $_\n", length $_, $useragent{$_};
    }

    ## Sorted by frequency:
    for (sort {$useragent{$a} <=> $useragent{$b}} keys %useragent) {
      printf "Length: %3d Frequency: %5d Name: $_\n", length $_, $useragent{$_};
    }

    ## Sorted by length:
    for (map { $_->[0] }
         sort { $a->[1] <=> $b->[1] }
         map { [ $_, length $_ ] } keys %useragent) {
      printf "Length: %3d Frequency: %5d Name: $_\n", length $_, $useragent{$_};
    }
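
    To tie the hash back to the original question (how many are over 100 characters), here is a small sketch that walks a hash shaped like %useragent above. The helper name tally_long and the sample data are invented for illustration. As in the code above, the keys hold only the text after 'Mozilla'.

```perl
use strict;
use warnings;

# Hypothetical helper: given a limit and a hash of user agent => hit count,
# return (number of distinct agents over the limit, total hits for them).
sub tally_long {
    my ($limit, %agents) = @_;
    my ($distinct, $hits) = (0, 0);
    for my $agent (keys %agents) {
        next unless length($agent) > $limit;
        $distinct++;
        $hits += $agents{$agent};
    }
    return ($distinct, $hits);
}

# Made-up data: one ordinary agent and one artificially long one.
my $long_agent = '/4.0 ' . ('x' x 120);
my %sample_agents = (
    '/4.0 (compatible; MSIE 5.01; Windows 98)' => 12,
    $long_agent                                => 3,
);
my ($distinct, $hits) = tally_long(100, %sample_agents);
print "$distinct distinct agents over 100 chars, $hits hits in total\n";
```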
Re: grabbing info from log after key word
by EvanK (Chaplain) on Jan 04, 2001 at 00:18 UTC
    Try using a matching expression like:
    m/Mozilla(.*)/;
    Then assign "Mozilla$1" to the $capture variable.
    ______________________________________________
    It's hard to believe that everyone here is the result of the smartest sperm.