Re^3: DateTime::Format::Flexible; for Log Parse with multiple formatted lines


Re^4: DateTime::Format::Flexible; for Log Parse with multiple formatted lines
by TCLion (Novice) on Mar 27, 2017 at 17:08 UTC

Please remember I am new to Perl. I am trying to understand your code examples but have not been able to get it to work in any way. I did go through and look up the code to understand it, but I am missing something. I don't want every whitespace to separate, just the first few and then the error message. One of the lines doesn't have a separate severity like INFORMATION but an info: in the line. So in the spot for info: I would be able to pull out error: if it was an error. I am still trying to use $1 $2 $3 to understand and find the positions to place where I want them, but so far without success. I do understand the higher numbers would pull nothing, but they are there because I am trying to break it up and see what is there as I go, to see if I missed one. Also $0 printed the file location, which is a good separator when going through the results.

Ok maybe this will help explain what I am trying to do. Here is the data and the desired output for both line formats leaving out unnecessary info.

__data__
Mon Feb 20 09:31:25 2017 [INFORMATION] [AGENTEXEC 26816] Detected during program initialization: Version: 7.2.0160.0297  Win64
2017-02-20T09:30:53.177000    20848[30892]    0000000000000000    [DM_MQ_I_DAEMON_START]info:  "Message queue daemon (tid : 27944, session 0102b20d80000456) is started sucessfully."

Server Name	2017-02-20	09:30:53	info:	DM_MQ_I_DAEMON_START info: "Message queue daemon (tid : 27944, session 0102b20d80000456) is started sucessfully."
Server Name	2017-02-20	09:31:25	INFORMATION	Detected during program initialization: Version: 7.2.0160.0297 Win64

I do appreciate your time for helping and explaining this to me.


Re^5: DateTime::Format::Flexible; for Log Parse with multiple formatted lines

by haukex (Archbishop) on Mar 27, 2017 at 18:07 UTC

Please remember I am new to Perl. I am trying to understand your code examples but have not been able to get it to work in any way.

Sure, I understand, we all start somewhere :-) Sorry if I threw too many new concepts out there at once. But I also ask you to please understand that neither I nor PerlMonks are a free code writing service - I posted that working first script because I wanted to get you started with code that shows some things I'd consider best practices. Beyond that, monks will usually expect to see some efforts to learn and write code, for example you could try writing some more regexes and showing us where they are going wrong. Also, note that when you say things similar to "it didn't work", that doesn't give us enough information to help you debug - see How do I post a question effectively? and Short, Self-Contained, Correct Example.

If you're still working on getting the hang of regexes, then I recommend you get started with perlretut and perlrequick. Also, when working on regexes, it's best to use a variety of test inputs as test cases. Here, I will show you one way to test your regexes as you are working on them using Test::More (see Quote and Quote like Operators for information on q{} - in short, it's like single quotes). I hope it won't be too difficult to adapt for your testing. Note that the elements of the @out array correspond to $1, $2, .... Also, sites like https://regex101.com/ can help (here's something to get you started), although some of the more advanced regex syntax is not compatible with Perl.

use warnings;
use strict;
use Test::More;

# this is the regex we're working on
my $regex = qr/^ \s* (\d+) \s* \[(\d+)\] \s+ (\S+) \s+ \[(.+?)\] \s* (\w+): \s* (.*?) \s* $/x;

{
  ok my @out =
    # inside the q{} is the test input string
    q{ 20848[30892]    0000000000000000    [DM_MQ_I_DAEMON_START]info:  "Message queue daemon (tid : 27944, session 0102b20d80000456) is started sucessfully." }
    =~ $regex;
  is_deeply \@out,
    # inside the [] is the expected output (capture group matches)
    [ '20848', '30892', '0000000000000000', 'DM_MQ_I_DAEMON_START', 'info',
    q{"Message queue daemon (tid : 27944, session 0102b20d80000456) is started sucessfully."} ]
    or diag explain \@out;
}
# ... add more test cases here!

done_testing;
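One more test case, for the other line format in this thread, might look roughly like the following. This is only a sketch: the `$regex2` name and its pattern are assumptions based on the single sample line posted above, not part of the original script, and the timestamp is assumed to have been stripped off already, as in the first test case.

```perl
use warnings;
use strict;
use Test::More;

# hypothetical regex for lines of the form "[SEVERITY] [SOURCE pid] message"
my $regex2 = qr/^ \s* \[(\w+)\] \s* \[(\w+) \s+ (\d+)\] \s* (.*?) \s* $/x;

{
  ok my @out =
    # test input: the sample line with its timestamp already removed
    q{ [INFORMATION] [AGENTEXEC 26816] Detected during program initialization: Version: 7.2.0160.0297  Win64 }
    =~ $regex2;
  is_deeply \@out,
    # expected capture groups: severity, source, pid, message
    [ 'INFORMATION', 'AGENTEXEC', '26816',
    q{Detected during program initialization: Version: 7.2.0160.0297  Win64} ]
    or diag explain \@out;
}

done_testing;
```

If the test fails, `diag explain \@out` prints what the capture groups actually contained, which is usually the quickest way to see which part of the pattern is off.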


Re^5: DateTime::Format::Flexible; for Log Parse with multiple formatted lines

by poj (Abbot) on Mar 27, 2017 at 20:18 UTC

As 1nickt suggested, you might find it easier to break the problem down into steps rather than trying to do it in one go. First separate the date, then extract the message from the remainder. For example

#!/usr/bin/perl
use strict;
use warnings;

# month numbers
my %mno = (jan=>1,feb=>2,mar=>3,apr=>4, may=>5, jun=>6, 
           jul=>7,aug=>8,sep=>9,oct=>10,nov=>11,dec=>12);

# define date formats
my $df1 = qr[... ... \d{2} \d{2}:\d{2}:\d{2} 20\d\d];
my $df2 = qr[20\d{2}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}];

# process log
while (<DATA>){
  chomp;
  if (/^($df1|$df2)\s+(.*)/){
    my ($date,$time) = format_date($1);
    my $msg  = $2;
    my $severity;
    my $err;
    
    if ($msg =~ /(\S*info:.*)/){
      $severity = 'INFO';
      $err = $1;
    }
    
    if ($msg =~ /(\[INFORMATION\])\s+(.*)/){
      $severity = 'INFO';
      $err = $2;
      $err =~ s/^\[.*\] +//; # remove [AGENTEXEC 26816]
    }
    
    print "$date\t$time\t$severity\t$err\n";
  }
}

# Mon Feb 20 09:31:25 2017
sub format_date {
  my $date = shift;
  my $time;
  if ($date =~/(...) (\d{2}) (\d{2}:\d{2}:\d{2}) (20\d\d)/){
    $date  = sprintf "%4d-%02d-%02d",$4,$mno{lc $1},$2;
    $time = $3;
  }
  if ($date =~/(.+)T(\d{2}:\d{2}:\d{2})/){
    $date  = $1;
    $time  = $2;
  }
  return ($date,$time);
}

__DATA__
Mon Feb 20 09:31:25 2017 [INFORMATION] [AGENTEXEC 26816] Detected during program initialization: Version: 7.2.0160.0297  Win64
2017-02-20T09:30:53.177000    20848[30892]    0000000000000000    [DM_MQ_I_DAEMON_START]info:  "Message queue daemon (tid : 27944, session 0102b20d80000456) is started sucessfully."


Re^5: DateTime::Format::Flexible; for Log Parse with multiple formatted lines

by 1nickt (Canon) on Mar 27, 2017 at 17:35 UTC

Hi TCLion,

Please remember I am new to Perl. I am trying to understand your code examples but have not been able to get it to work in any way.

... then simplify. You are attempting to solve a non-trivial problem with lots of elements. Break it down further until you can understand and solve the parts, then reassemble.

There's an adrenalin rush one gets when beginning programming, and you want to keep feeling that. Or maybe there's real pressure from a deadline at work or elsewhere. But even so you must pause to breathe, try to see the bigger perspective on your problem, and read, test, read, test, read, test. It's hard to do that with your "production" code. Step away from it and work up some prototypes, divide the task into chunks.

For example, in your position, I would consider writing a script that **only** determines which of the two possible classes of pattern each line falls into, and prints each line to the appropriate file. Note, a file, not a hashref or any data structure. Write a script that starts and ends and can be "forgotten about" as you move on. Then the next one can open the resulting files, and you can start by focussing only on one class of pattern match without the conditional flow. Etc.

( FWIW, I still employ that practice because I think it's a best practice. Last week I coded a simple tool to fetch database backup files from a cloud repo, store them locally after uncompressing, and then load the interesting SQL inserts into a destructible working DB. While I could easily have done it all in one script I chose to have three, because simplicity, but specifically, because I wanted to be able to execute discrete portions of the workflow as simply as possible (ie run a script with no args). )
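A rough sketch of that first splitting step could look like this. It is not code from this thread: the output file names, the `classify_line` helper, and both patterns are assumptions based only on the two sample lines posted earlier, so adjust them to the real logs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical output file names, one per line format.
my %outfile = (
  ctime => 'format_ctime.log',  # lines like "Mon Feb 20 09:31:25 2017 ..."
  iso   => 'format_iso.log',    # lines like "2017-02-20T09:30:53.177000 ..."
);

# Return the name of the format a line starts with, or undef if unknown.
sub classify_line {
  my ($line) = @_;
  return 'ctime' if $line =~ /^\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}\b/;
  return 'iso'   if $line =~ /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/;
  return;
}

# Only touch files when log file names are given on the command line.
if (@ARGV) {
  my %fh;
  for my $class (keys %outfile) {
    open $fh{$class}, '>', $outfile{$class}
      or die "Cannot open $outfile{$class}: $!";
  }
  while (my $line = <>) {
    my $class = classify_line($line);
    if (defined $class) { print { $fh{$class} } $line }
    else                { warn "Unrecognized line $.: $line" }
  }
  close $_ for values %fh;
}
```

Run it with the log file as an argument, e.g. `perl split_log.pl server.log` (both names hypothetical). Each resulting file then contains only one line format, so the next script can work on a single regex without any branching.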

Hope this helps!

The way forward always starts with a minimal test.
