
I see two problems with that code. First, just like in the original code, while ($str =~ /.../g) without a \G regex anchor ($str =~ /\G.../g) will skip over anything in $str that doesn't match the regex, possibly resulting in missed data. Second, as 1nickt already said, $0 is not a regex capture (it holds the name of the running program; see $0), and the regex only has three capture groups, so $4 and above will never be populated by that regex.
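To illustrate the first point, here is a minimal sketch. The sample string and the variable names are made up for illustration and are not taken from the code under discussion. It compares a /g loop with and without the \G anchor:

```perl
use strict;
use warnings;

# Hypothetical input: numeric fields with unexpected junk in the middle.
my $str = "12 34 oops 56";

# Without \G, each match may start anywhere after the previous one,
# so "oops" is silently skipped.
my @loose;
while ($str =~ /(\d+)\s*/g) {
    push @loose, $1;
}

# With \G, each match must begin exactly where the previous one ended,
# so the loop stops at the first thing that does not fit. The /c flag
# keeps pos() intact on failure, so we can report where that happened.
pos($str) = 0;
my @anchored;
while ($str =~ /\G(\d+)\s*/gc) {
    push @anchored, $1;
}
my $stopped_at = pos($str);

print "without \\G: @loose\n";
print "with \\G:    @anchored\n";
print "stopped at offset $stopped_at of ", length($str), "\n";
```

If the anchored loop stops before the end of the string, that is a sign that the input contains something the regex does not account for.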

Based on your regex, it looks like you're trying to break up the string based on whitespace, in which case a simple my @parts = split ' ', $rest; might be easiest.
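A small sketch of what that split does, using a made-up value for $rest (the real contents depend on your log line):

```perl
use strict;
use warnings;

# Hypothetical remainder of a log line, with irregular spacing.
my $rest = "  20848[30892]   0000000000000000 [DM_MQ_I_DAEMON_START]info:  started ";

# The special pattern ' ' (a string holding a single space) splits on runs
# of whitespace and ignores leading whitespace; trailing empty fields are
# dropped as usual.
my @parts = split ' ', $rest;

print scalar(@parts), " fields\n";
print "  - $_\n" for @parts;
```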

However, I see that your log entries have quoted strings, so that might not be appropriate either. Your first couple of example log entries could possibly be broken apart like this: my @parts = split /\s*[\[\]]\s*/, $rest, 5;. Alternatively, you'll have to write regexes that actually match the log entries, e.g. /^ \s* (\d+) \s* \[(\d+)\] \s+ (\S+) \s+ \[(.+?)\] \s* (\w+): \s* (.*?) \s* $/x.
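Both options can be tried side by side. This is only a sketch: it assumes $rest holds the part of the second example log entry that follows the timestamp.

```perl
use strict;
use warnings;

# Assumed input: the second example log entry with the timestamp removed.
my $rest = q{20848[30892] 0000000000000000 [DM_MQ_I_DAEMON_START]info: "Message queue daemon (tid : 27944, session 0102b20d80000456) is started sucessfully."};

# Option 1: split on square brackets, but stop after 5 fields so that the
# message text (which may contain anything) stays in one piece.
my @parts = split /\s*[\[\]]\s*/, $rest, 5;

# Option 2: a regex that describes the whole entry. In list context the
# match returns the capture groups ($1, $2, ...).
my @fields = $rest =~ /^ \s* (\d+) \s* \[(\d+)\] \s+ (\S+) \s+ \[(.+?)\] \s* (\w+): \s* (.*?) \s* $/x;

print "split: $_\n" for @parts;
print "regex: $_\n" for @fields;
```

The split keeps "info:" attached to the message in the last field, while the regex separates the severity word from the quoted message.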

To match quoted strings, you could use Regexp::Common::delimited or the core module Text::Balanced. Good resources on regexes in general are perlretut, perlrequick, and perlre.
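As a rough sketch of the Text::Balanced route, extract_delimited can pull a quoted string off the front of some text. The input below is invented for illustration:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_delimited);

# Hypothetical tail of a log entry: a quoted message followed by more text.
my $text = q{"Message queue daemon (tid : 27944) is started" trailing text};

# In list context, extract_delimited returns the delimited substring
# (including the quotes) followed by the remainder of the input.
my ($quoted, $remainder) = extract_delimited($text, '"');

print "quoted:    $quoted\n";
print "remainder: $remainder\n";
```

Note that the quoted string has to be at the start of the text (leading whitespace is skipped by default), so you would first need to isolate the part of the line where the quote begins.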

Re^4: DateTime::Format::Flexible; for Log Parse with multiple formatted lines
by TCLion (Novice) on Mar 27, 2017 at 17:08 UTC

    Please remember I am new to Perl. I am trying to understand your code examples but have not been able to get it to work in any way. I did go through and look up the code to understand it, but I am missing something. I don't want every whitespace to separate, just the first few and then the error message. One of the lines doesn't have a separate severity like INFORMATION but an info: in the line. So in the spot for info: I would be able to pull out error: if it was an error. I am still trying to use $1 $2 $3 to understand and find the positions to place where I want them, but without success. I do understand the higher numbers would pull nothing, but they are there because I am trying to break it up and see what is there as I go, to see if I missed one. Also, $0 printed the file location, which is a good separator when going through the results.

    Ok maybe this will help explain what I am trying to do. Here is the data and the desired output for both line formats leaving out unnecessary info.

    __data__
    Mon Feb 20 09:31:25 2017 [INFORMATION] [AGENTEXEC 26816] Detected during program initialization: Version: 7.2.0160.0297 Win64
    2017-02-20T09:30:53.177000 20848[30892] 0000000000000000 [DM_MQ_I_DAEMON_START]info: "Message queue daemon (tid : 27944, session 0102b20d80000456) is started sucessfully."
    Server Name 2017-02-20 09:30:53 info: DM_MQ_I_DAEMON_START info: "Message queue daemon (tid : 27944, session 0102b20d80000456) is started sucessfully."
    Server Name 2017-02-20 09:31:25 INFORMATION Detected during program initialization: Version: 7.2.0160.0297 Win64

    I do appreciate your time for helping and explaining this to me.

      Please remember I am new to Perl. I am trying to understand your code examples but have not been able to get it to work in any way.

      Sure, I understand, we all start somewhere :-) Sorry if I threw too many new concepts out there at once. But I also ask you to please understand that neither I nor PerlMonks are a free code writing service - I posted that working first script because I wanted to get you started with code that shows some things I'd consider best practices. Beyond that, monks will usually expect to see some efforts to learn and write code, for example you could try writing some more regexes and showing us where they are going wrong. Also, note that when you say things similar to "it didn't work", that doesn't give us enough information to help you debug - see How do I post a question effectively? and Short, Self-Contained, Correct Example.

      If you're still working on getting the hang of regexes, then I recommend you get started with perlretut and perlrequick. Also, when working on regexes, it's best to use a variety of inputs as test cases. Here, I will show you one way to test your regexes as you are working on them using Test::More (see Quote and Quote like Operators for information on q{} - in short, it's like single quotes). I hope it won't be too difficult to adapt for your testing. Note that the elements of the @out array correspond to $1, $2, .... Also, sites like https://regex101.com/ can help (here's something to get you started), although some of the more advanced regex syntax is not compatible with Perl.

      use warnings;
      use strict;
      use Test::More;

      # this is the regex we're working on
      my $regex = qr/^ \s* (\d+) \s* \[(\d+)\] \s+ (\S+) \s+ \[(.+?)\] \s* (\w+): \s* (.*?) \s* $/x;

      {
          ok my @out =
              # inside the q{} is the test input string
              q{ 20848[30892] 0000000000000000 [DM_MQ_I_DAEMON_START]info: "Message queue daemon (tid : 27944, session 0102b20d80000456) is started sucessfully." }
              =~ $regex;
          is_deeply \@out,
              # inside the [] is the expected output (capture group matches)
              [ '20848', '30892', '0000000000000000', 'DM_MQ_I_DAEMON_START', 'info',
                q{"Message queue daemon (tid : 27944, session 0102b20d80000456) is started sucessfully."} ]
              or diag explain \@out;
      }

      # ... add more test cases here!

      done_testing;

      As 1nickt suggested, you might find it easier to break the problem down into steps rather than trying to do it in one go. First separate the date, then extract the message from the remainder. For example:

      #!/usr/bin/perl
      use strict;

      # month numbers
      my %mno = (jan=>1,feb=>2,mar=>3,apr=>4, may=>5, jun=>6,
                 jul=>7,aug=>8,sep=>9,oct=>10,nov=>11,dec=>12);

      # define date formats
      my $df1 = qr[... ... \d{2} \d{2}:\d{2}:\d{2} 20\d\d];
      my $df2 = qr[20\d{2}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}];

      # process log
      while (<DATA>){
        chomp;
        if (/^($df1|$df2)\s+(.*)/){
          my ($date,$time) = format_date($1);
          my $msg = $2;
          my $severity;
          my $err;
          if ($msg =~ /(\S*info:.*)/){
            $severity = 'INFO';
            $err = $1;
          }
          if ($msg =~ /(\[INFORMATION\])\s+(.*)/){
            $severity = 'INFO';
            $err = $2;
            $err =~ s/^\[.*\] +//; # remove [AGENTEXEC 26816]
          }
          print "$date\t$time\t$severity\t$err\n";
        }
      }

      # Mon Feb 20 09:31:25 2017
      sub format_date {
        my $date = shift;
        my $time;
        if ($date =~/(...) (\d{2}) (\d{2}:\d{2}:\d{2}) (20\d\d)/){
          $date = sprintf "%4d-%02d-%02d",$4,$mno{lc $1},$2;
          $time = $3;
        }
        if ($date =~/(.+)T(\d{2}:\d{2}:\d{2})/){
          $date = $1;
          $time = $2;
        }
        return ($date,$time);
      }
      __DATA__
      Mon Feb 20 09:31:25 2017 [INFORMATION] [AGENTEXEC 26816] Detected during program initialization: Version: 7.2.0160.0297 Win64
      2017-02-20T09:30:53.177000 20848[30892] 0000000000000000 [DM_MQ_I_DAEMON_START]info: "Message queue daemon (tid : 27944, session 0102b20d80000456) is started sucessfully."
      poj

      Hi TCLion,

      Please remember I am new to Perl. I am trying to understand your code examples but have not been able to get it to work in any way.

      ... then simplify. You are attempting to solve a non-trivial problem with lots of elements. Break it down further until you can understand and solve the parts, then reassemble.

      There's an adrenalin rush one gets when beginning programming, and you want to keep feeling that. Or maybe there's real pressure from a deadline at work or elsewhere. But even so you must pause to breathe, try to see the bigger perspective on your problem, and read, test, read, test, read, test. It's hard to do that with your "production" code. Step away from it and work up some prototypes, divide the task into chunks.

      For example, in your position, I would consider writing a script that **only** determines which of the two possible classes of pattern each line falls into, and prints each line to the appropriate file. Note, a file, not a hashref or any data structure. Write a script that starts and ends and can be "forgotten about" as you move on. Then the next one can open the resulting files, and you can start by focussing only on one class of pattern match without the conditional flow. Etc.
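A minimal sketch of such a classifying script might look like the following. The two patterns, the class names, the output file names, and the in-line sample lines are all assumptions made for illustration; a real script would read the log file instead.

```perl
use strict;
use warnings;

# Hypothetical patterns for the two timestamp styles seen in this thread.
my $ctime_style = qr/^\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}\b/;  # Mon Feb 20 09:31:25 2017
my $iso_style   = qr/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}/;            # 2017-02-20T09:30:53.177000

# Return the class name for one log line.
sub classify_line {
    my ($line) = @_;
    return 'ctime' if $line =~ $ctime_style;
    return 'iso'   if $line =~ $iso_style;
    return 'unknown';
}

# Stand-in for reading the real log file.
my @lines = (
    'Mon Feb 20 09:31:25 2017 [INFORMATION] [AGENTEXEC 26816] Detected during program initialization: Version: 7.2.0160.0297 Win64',
    '2017-02-20T09:30:53.177000 20848[30892] 0000000000000000 [DM_MQ_I_DAEMON_START]info: "Message queue daemon (tid : 27944, session 0102b20d80000456) is started sucessfully."',
    'something that fits neither format',
);

# One output file per class; the file names are made up for this sketch.
my %out;
for my $class (qw(ctime iso unknown)) {
    open $out{$class}, '>', "lines_$class.log"
        or die "Cannot open lines_$class.log: $!";
}

# Send each line to the file for its class, unchanged.
for my $line (@lines) {
    print { $out{ classify_line($line) } } "$line\n";
}

close $_ for values %out;
```

The "unknown" file is there so that lines matching neither pattern are not lost silently; checking it is a quick way to see whether a third format exists.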

      ( FWIW, I still employ that practice because I think it's a best practice. Last week I coded a simple tool to fetch database backup files from a cloud repo, store them locally after uncompressing, and then load the interesting SQL inserts into a destructible working DB. While I could easily have done it all in one script, I chose to have three, for simplicity, but specifically because I wanted to be able to execute discrete portions of the workflow as simply as possible (i.e. run a script with no args). )

      Hope this helps!


      The way forward always starts with a minimal test.