justinkala has asked for the wisdom of the Perl Monks concerning the following question:

This is the message file that I need to parse.
<#|2014 Jul 29 16:20:20|INFO|JAVA_TEST_1.0.0|sun.java.jsf.managedbean.AuthenticationMgr|DEFAULT|Login successful for user 'usr0001'|APPLICATION USER|#>
<#|2014 Jul 29 16:26:08|INFO|JAVA_TEST_1.0.0|sun.java.jsf.managedbean.AuthenticationMgr|DEFAULT|Login successful for user 'usr0005'|APPLICATION USER|#>
<#|2014 Jul 28 16:20:55|INFO|JAVA_TEST_1.0.0|sun.java.jsf.managedbean.AuthenticationMgr|DEFAULT|Login successful for user 'usr0006'|APPLICATION USER|#>
<#|2014 Jul 28 16:22:44|INFO|JAVA_TEST_1.0.0|sun.java.jsf.managedbean.user.UserRoleMgr|DEFAULT|Assigned roles for user 'usr0002' were modified by user 'usr0006'|APPLICATION USER|#>
I have this configuration file, where I put the search words so that I can decide what kind of event it is.
cat EventType.conf
Application Error::Error
Succesful Authorization::User Logged Succesfully
Failed Authorization::User logon failed/unsuccesful
This is the Perl script that reads the message file, checks it against the configuration file, and returns the event name, such as Succesful Authorization.
/usr/bin/perl /dir/perl/test.pl ${infile} ${outfile}

cat test.pl
#!/usr/bin/perl
$dir="/dir";
$infile = $ARGV[0];
$outfile = $ARGV[1];
$configfile="$dir/conf/EventType.conf";
open(FILE, $infile) or die("Could not open $infile.");
$/ = "#>\n";
$\ = "\n";
open(OUTFILE, ">", $outfile) or die("Could not open $outfile.");
for $line (<FILE>) {
    # chomp($line);
    #split each line into fields and process
    @Line = split (/\|/, $line);
    #Check ETYPE and change EOUTCOME
    if ($Line[2] eq 'INFO') {
        $Line[5] = "INFO";
    } elsif ($Line[2] eq 'ERROR') {
        $Line[5] = "ERROR";
    }
    #Check EMSG and create new field next to it
    open CONFIG, $configfile or die "Could not open $configfile... $!";
    for $configLine (<CONFIG>) {
        chomp($configLine);
        @configLineItems = split /::/, $configLine;
        for $checkItem (@configLineItems) {
            if ("$Line[6]" =~ $checkItem) {
                $Line[8] = $configLineItems[0];
            } else {
                $Line[8] = "Other Application Event";
            }
        }
    }
    #Write output
    print OUTFILE "|", $Line[1], "|", $Line[2], "|", $Line[3], "|",
        $Line[4], "|", $Line[5], "|", $Line[6], "|", $Line[8], "|",
        $Line[7],"|" ;
    close CONFIG;
}
close (FILE);
close (OUTFILE);
I need help using regexes or grep in Perl efficiently so that I can extract those values. The message is not compared against a single search word; there can be several. For example, "Login" and "successful" are both sought in the value passed in $Line[6], which is the event message. Thanks

Replies are listed 'Best First'.
Re: Regex perl grep usage string match comparison
by Anonymous Monk on Aug 05, 2014 at 23:33 UTC
    1. Always use warnings; use strict; !
    2. This is not such a big issue but it's considered better and safer to use the 3-arg open and lexical filehandles, e.g. open my $fh, '<', $infile or die "Could not open $infile: $!";
    3. Problem: This: $/ = "#>\n"; sets the input record separator globally from that point forward, including the point in the current code where you're reading the configuration file. That means that currently, <CONFIG> will read the entire file at once. You should at least use local $/ = "#>\n"; in a new block and follow the next tip:
    4. Parsing the whole configuration file for every line of input is inefficient; instead read the configuration file once at the beginning of the script and store its values in a data structure such as an array of arrays for later use. This also helps avoid the previous problem of the input record separator.
    5. The configuration file format is unclear and the loop for $checkItem (@configLineItems) doesn't make much sense to me. This is probably also because your sample configuration file "patterns" don't match any of the messages from the sample log file. I'm going to wager a guess that the format of the configuration lines is: "Message::Pattern::Pattern" etc. In that case I would suggest that you shift the first item off the array before the aforementioned loop, as in my $message = shift @configLineItems; and then use $message instead of $configLineItems[0].
    6. Instead of "$Line[6]" =~ $checkItem, this is better: $Line[6] =~ /\Q$checkItem/ (however if $checkItem contains actual regular expressions, drop the \Q) - see perlre for more information. The anchors ^ and $ as well as the modifier /i may be of interest to you.
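
The suggestions above can be combined into a minimal sketch. It assumes the configuration format guessed in point 5 ("Event Name::Pattern::Pattern"). The sample configuration text, the sub names load_config and classify, and the in-memory filehandles standing in for real files are illustrative assumptions, not the original poster's code.

```perl
use strict;
use warnings;

# Assumed config format (a guess, see point 5): "Event Name::Pattern::Pattern"
my $config_text = "Successful Authorization::Login successful\n"
                . "Application Error::Error::Exception\n";

# Two records in the log format from the question, each ending in "#>\n".
my $log_text =
    "<#|2014 Jul 29 16:20:20|INFO|JAVA_TEST_1.0.0|sun.java.jsf.managedbean.AuthenticationMgr"
  . "|DEFAULT|Login successful for user 'usr0001'|APPLICATION USER|#>\n"
  . "<#|2014 Jul 28 16:22:44|INFO|JAVA_TEST_1.0.0|sun.java.jsf.managedbean.user.UserRoleMgr"
  . "|DEFAULT|Assigned roles for user 'usr0002' were modified by user 'usr0006'|APPLICATION USER|#>\n";

# Read the configuration once, into an array of [event name, patterns...].
sub load_config {
    my ($fh) = @_;
    my @rules;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my ($event, @patterns) = split /::/, $line;
        push @rules, [ $event, @patterns ];
    }
    return \@rules;
}

# Return the event name of the first rule whose pattern occurs literally
# in the message, or a default name when no rule matches.
sub classify {
    my ($rules, $message) = @_;
    for my $rule (@$rules) {
        my ($event, @patterns) = @$rule;
        for my $pattern (@patterns) {
            return $event if $message =~ /\Q$pattern/;
        }
    }
    return 'Other Application Event';
}

# The config is read while $/ still has its default value ("\n").
open my $cfg_fh, '<', \$config_text or die "Could not open config: $!";
my $rules = load_config($cfg_fh);
close $cfg_fh;

my @events;
{
    local $/ = "#>\n";    # record separator changed only inside this block
    open my $log_fh, '<', \$log_text or die "Could not open log: $!";
    while (my $record = <$log_fh>) {
        chomp $record;    # strips the trailing "#>\n"
        my @fields = split /\|/, $record;
        push @events, classify($rules, $fields[6]);
    }
    close $log_fh;
}

print "$_\n" for @events;
```

With real files, the two in-memory opens would take file names instead of scalar references; the rest of the structure should carry over unchanged.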

    Please try these suggestions, and if you have further questions, please provide the updated code and better sample input (both logfile and configuration file), as well as the desired output.

      Thank you very much for the suggestions. I am new to Perl. Could you provide the code, if you can, based on the files I provided? That would be great and helpful.
        Did you read the post you replied to? It says the specifications are unclear, that needs some clarification from you. Also, PerlMonks is here to help people learn Perl. To get help, please show some effort (try to apply the suggestions you've been given, write some code, perlintro) - or you can pay someone to write the code for you (not here).
Re: Regex perl grep usage string match comparison
by Laurent_R (Canon) on Aug 05, 2014 at 21:27 UTC
    Read your config file only once at the beginning and store the content in memory (a hash or an array), before opening and processing the main input file. Don't read the config file for every line of input; that is likely to be very inefficient.
Re: Regex perl grep usage string match comparison
by Anonymous Monk on Aug 06, 2014 at 18:32 UTC

    In addition to the previous suggestions, you can eliminate the loop over @configLineItems by compiling a regular expression with qr like this:

    my $regex = join '|', map {quotemeta} @configLineItems;
    $regex = qr/$regex/;

    (If @configLineItems contains regular expressions, then drop the "map {quotemeta}".)

    This is something you can do when you read the configuration file before the loop over the input file, storing $regex for use during the main loop over the input. You can then use it like so: if ($Line[6] =~ $storedRegex)
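
A minimal sketch of that idea follows: one compiled regex is stored per configuration line, next to its event name. The configuration lines, the "Event Name::Pattern::Pattern" format, and the names @compiled and event_for are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical config lines in the assumed "Event Name::Pattern::Pattern" format.
my @config_lines = (
    'Successful Authorization::Login successful',
    'Failed Authorization::logon failed::unsuccessful',
);

# Build one alternation per config line, once, before the main loop.
my @compiled;
for my $config_line (@config_lines) {
    my ($event, @patterns) = split /::/, $config_line;
    next unless @patterns;    # skip lines that carry no patterns
    my $alternation = join '|', map { quotemeta($_) } @patterns;
    push @compiled, [ $event, qr/$alternation/ ];
}

# Return the event name of the first stored regex that matches the message.
sub event_for {
    my ($message) = @_;
    for my $entry (@compiled) {
        my ($event, $regex) = @$entry;
        return $event if $message =~ $regex;
    }
    return 'Other Application Event';
}

print event_for("Login successful for user 'usr0001'"), "\n";
print event_for("User logon failed for 'usr0003'"), "\n";
print event_for("Assigned roles were modified"), "\n";
```

Compiling with qr once per configuration line means the inner loop over individual patterns disappears from the per-record work.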

    As was said earlier, as much as we like to help, this isn't a code writing service - please try to implement these suggestions and if you need help doing so please don't hesitate to ask.

      I implemented the code and it works, but I have some more things to add:
      foreach $inputline (@input_array) {
          my @inputline = split(/\|/, $inputline);
          $inputline[8] = "Other Application event";
          my $lastColumn = "#>";
          push @inputline, $lastColumn;
          #Check ETYPE and change EOUTCOME
          if ($inputline[2] eq 'INFO') {
              $inputline[5] = "INFO";
          } elsif ($inputline[2] eq 'ERROR') {
              $inputline[5] = "ERROR";
          }
          #Check EMSG and create new field next to it
          foreach $configline (@cfg_array) {
              my @configline_array = split(/\|/, $configline);
              shift @configline_array;
              for $configitem (@configline_array) {
                  if($inputline[6] =~ $configitem) {
                      my $lastColumn = pop @inputline;
                      $inputline[8] = $configline_array[0];
                      push @inputline, $lastColumn;
                      last;
                  }
              }
          }
          $line = join('|', @inputline);
          print $output_fh $line, "\n";
      }
      In the line if($inputline[6] =~ $configitem) {, which comparison operator should I use so that a search for "exception" also matches a line containing "MessageException" or "Message.Exception"? In other words, the match should ignore case and should also succeed when the search string is only part of a word. Any help on this?
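
One possible approach, sketched under the assumption that the search words are literal strings rather than regular expressions: keep the =~ operator and add the /i modifier mentioned earlier in the thread. An unanchored regex already matches anywhere inside a word, so only case needs handling. The sub name contains_ci and the sample messages are hypothetical.

```perl
use strict;
use warnings;

# True when $word occurs anywhere in $text, ignoring case.
# \Q...\E quotes regex metacharacters so $word is matched literally.
sub contains_ci {
    my ($text, $word) = @_;
    return $text =~ /\Q$word\E/i ? 1 : 0;
}

for my $message ('MessageException thrown', 'Message.Exception thrown', 'Login successful') {
    printf "%-26s %s\n", $message,
        contains_ci($message, 'exception') ? 'match' : 'no match';
}
```

In the loop from the post, the equivalent condition would be if ($inputline[6] =~ /\Q$configitem\E/i); if the search words are precompiled, qr/.../i carries the same modifier.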