Log Parsing

piyushmnnit06 has asked for the wisdom of the Perl Monks concerning the following question:

Below is my log structure

0317 09:53:14.865+0000 {12772} INFO  [pm-worker-exec slot-Task:id=8274,env=12772,type=11][c.s.w.t.f.s.PostExecutionStage  ] Loaded {child runId vs completion type}: {8286-SUCCESSFUL}{8287-SUCCESSFUL}{8288-SUCCESSFUL}{8289-SUCCESSFUL}{8290-SUCCESSFUL}{8291-SUCCESSFUL}{8292-SUCCESSFUL}{8293-SUCCESSFUL}{8294-SUCCESSFUL}{8295-SUCCESSFUL}{8296-SUCCESSFUL}
0317 09:54:12.498+0000 {12772} INFO  [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Started producing records on topic 12772_8286_20170317_IN_0
0317 09:54:13.428+0000 {12772} INFO  [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Started producing records on topic 12772_8287_20170317_IN_0
0317 09:55:13.027+0000 {12772} INFO  [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Done with producing records on topic 12772_8286_20170317_IN_0
0317 09:55:15.027+0000 {12772} INFO  [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Done with producing records on topic 12772_8287_20170317_IN_0

In the log above, I first need to fetch all the successful child run IDs (i.e. 8286, 8287, ..., 8296) for parent ID 8274. Then, for each run ID, I need the timestamp of its "Started producing records" line and of its "Done with producing records" line.

I tried the logic below, but it is messed up. Could you please help me with it?

The desired output is below:

 
topic,start_time,Endtime
12772_8286_20170317_IN_0,09:54:12.498+0000,09:55:13.027+0000
12772_8287_20170317_IN_0,09:54:13.428+0000,09:55:15.027+0000

Below is my code

open (MYFILE, '>data.CSV');
$LOGFILE = "worker.log";
open(LOGFILE) or die("Could not open log file.");
$LOGFILE1 = "worker.log";
open(LOGFILE1) or die("Could not open log file.");

$taskid=8274;


foreach $line (<LOGFILE>) 
{
    chomp($line); 
    if ( $line =~ m/$taskid.*child runId vs completion type/)
    {
    
    @Task_id = $line =~ /\{(\d+)-/g;
    
    }    
}


foreach $line (<LOGFILE1>) 
{
    chomp($line); 
    foreach $childid(@Task_id)
    {
    if ( $line =~ m/Started producing records on topic (\d+)_($childid)_(\d+)_IN_(\d+)/)
      {
      $envId=$1;
      $timestamp=$3;
      $input=$4;
    my @fields = split / /, $line;
    $data1="$envId\_$childid\_$timestamp\_IN\_$input\,$fields[1]";
    $data2="$envId\_$childid\_$timestamp\_IN\_$input";
    push @data9,$data1 ;
    print "$data1\n";
      }
      
     if ( $line =~ m/Done with producing records on topic (\d+)_($childid)_(\d+)_IN_(\d+)/)
      {
      $envId=$1;
      $timestamp=$3;
      $input=$4;
    my @fields = split / /, $line;
    $data3="$envId\_$childid\_$timestamp\_IN\_$input\,$fields[1]";
    $data4="$envId\_$childid\_$timestamp\_IN\_$input";
    push @data9,$data3 ;
    print "$data3\n";
      }
     
     }
          
}
foreach (@data9) {
 print MYFILE "$_\n";
}    
    
    
close LOGFILE;
close LOGFILE1;


Replies are listed 'Best First'.
Re: Log Parsing by tybalt89 (Monsignor) on Apr 02, 2017 at 20:17 UTC
It just takes a simple regex :) ( ducking )

#!/usr/bin/perl -l

# http://perlmonks.org/?node_id=1186713

use strict;
use warnings;

$_ = do { local $/; <DATA> };
print "topic,start_time,Endtime";
print join ',', $3, $2, $4 while /\b(\d+)-SUCCESSFUL
  (?=
  (?:.*\n)*
  \d+\ (\S+).*Started.*\b(\d+_\1_\d+_IN_0)
  (?:.*\n)*
  \d+\ (\S+).*Done.*\b\3
  )/gx;

__DATA__
0317 09:53:14.865+0000 {12772} INFO  [pm-worker-exec slot-Task:id=8274,env=12772,type=11][c.s.w.t.f.s.PostExecutionStage  ] Loaded {child runId vs completion type}: {8286-SUCCESSFUL}{8287-SUCCESSFUL}{8288-SUCCESSFUL}{8289-SUCCESSFUL}{8290-SUCCESSFUL}{8291-SUCCESSFUL}{8292-SUCCESSFUL}{8293-SUCCESSFUL}{8294-SUCCESSFUL}{8295-SUCCESSFUL}{8296-SUCCESSFUL}
0317 09:54:12.498+0000 {12772} INFO  [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Started producing records on topic 12772_8286_20170317_IN_0
0317 09:54:13.428+0000 {12772} INFO  [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Started producing records on topic 12772_8287_20170317_IN_0
0317 09:55:13.027+0000 {12772} INFO  [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Done with producing records on topic 12772_8286_20170317_IN_0
0317 09:55:15.027+0000 {12772} INFO  [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Done with producing records on topic 12772_8287_20170317_IN_0

Produces exactly your desired output.
Re^2: Log Parsing by piyushmnnit06 (Novice) on Apr 03, 2017 at 06:09 UTC
Thanks, it worked fine for this small data set. But I have to do it for the complete log; I mean I have to open a file and then do it iteratively.
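For a log that has to be read from a file, the same result can be built in one line-by-line pass instead of slurping everything into a single string. What follows is a minimal sketch, not the method from either reply: it assumes each log record occupies one physical line in the real file, and the names `parse_log`, `%wanted` and `%topic` are made up here for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read the log one line at a time from $fh and return
# CSV rows for the SUCCESSFUL children of task $parent_id.
sub parse_log {
    my ($fh, $parent_id) = @_;
    my (%wanted, %topic, @order);

    while (my $line = <$fh>) {
        # The second whitespace-separated field is the timestamp.
        my (undef, $time) = split ' ', $line;

        # Child run IDs reported as SUCCESSFUL for the requested parent.
        if ($line =~ /Task:id=\Q$parent_id\E,.*child runId vs completion type/) {
            $wanted{$1} = 1 while $line =~ /\{(\d+)-SUCCESSFUL\}/g;
        }
        # Start and end events, stored per full topic name.
        elsif ($line =~ /(Started|Done with) producing records on topic (\d+_(\d+)_\d+_IN_\d+)/) {
            my ($event, $name, $child) = ($1, $2, $3);
            push @order, $name unless $topic{$name};
            $topic{$name}{child} = $child;
            $topic{$name}{ $event eq 'Started' ? 'start' : 'end' } = $time;
        }
    }

    # Emit only topics whose child succeeded and that have both timestamps.
    my @rows = ('topic,start_time,Endtime');
    for my $name (@order) {
        my $t = $topic{$name};
        next unless $wanted{ $t->{child} } && $t->{start} && $t->{end};
        push @rows, join ',', $name, $t->{start}, $t->{end};
    }
    return @rows;
}

# Usage with a real file (file name taken from the question):
# open my $fh, '<', 'worker.log' or die "Cannot open worker.log: $!";
# print "$_\n" for parse_log($fh, 8274);
```

Only the per-topic timestamps are kept in memory, so the size of the log matters much less than with a slurp-and-match approach. Filtering against the list of successful children happens at the end, so the order of the lines in the log does not matter.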
Re^3: Log Parsing by tybalt89 (Monsignor) on Apr 03, 2017 at 07:35 UTC
How big is your log file? If it's really big, that's a crucial piece of information that should be included with the rest of the problem statement.
Re^4: Log Parsing by piyushmnnit06 (Novice) on Apr 03, 2017 at 08:04 UTC
Re^5: Log Parsing by tybalt89 (Monsignor) on Apr 03, 2017 at 08:09 UTC
Re: Log Parsing by poj (Abbot) on Apr 02, 2017 at 19:08 UTC
Rather than iterating through an array to find a match like this

foreach $line (<LOGFILE1>)
{
    chomp($line);
    foreach $childid (@Task_id)
    {

build a hash and use the keys to match

#!/usr/bin/perl
use strict;

my %data = ();
my @id = ();
my $reqid = '8274';
my $infile = 'worker.log';
my $outfile = 'data.CSV';

open IN,'<',$infile or die "$!";
# input
while (<IN>) {
  chomp;
  next unless /Task:id=(\d+)/;
  my $taskid = $1;
  my (undef,$timestamp,undef) = split /\s+/,$_,3;
  if (/(Started|Done).+topic (\d+_(\d+)_.+)/){
    $data{$3}{$1} = $timestamp;
    $data{$3}{'Topic'} = $2;
  }
  while ( /\{(\d+)-SUCCESSFUL\}/g ){
    push @id,$1 if ($taskid eq $reqid);
  }
}
close IN;

# output
open OUT,'>',$outfile or die "$!";
my @cols = qw(Topic Started Done);
printf OUT "%s,%s,%s\n",@cols;
for my $id (sort @id){
  if (exists $data{$id}){
    printf OUT "%s,%s,%s\n", map { $data{$id}{$_} } @cols;
  } else {
    print OUT "$id - no data\n";
  }
}
close OUT;

poj
Re^2: Log Parsing by piyushmnnit06 (Novice) on Apr 03, 2017 at 05:56 UTC
It's working fine, but the only problem is that it gives timings for topics where 1 is appended, not 0. I mean it gives the timing for 12772_8287_20170317_IN_1 but not for 12772_8287_20170317_IN_0.
Re^3: Log Parsing by poj (Abbot) on Apr 03, 2017 at 06:08 UTC
Make this regex more specific

#if (/(Started|Done).+topic (\d+_(\d+)_.+)/){
if (/(Started|Done).+topic (\d+_(\d+)_\d+_IN_0)/){

poj
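The difference between the two patterns can be checked in isolation. With the looser regex every partition of a topic matches, and because the hash is keyed by child ID, a later `_IN_1` line overwrites the `_IN_0` timestamp. A small sketch follows; the `_IN_1` line and the helper name `topic_for` are invented here for illustration and do not come from the original log.

```perl
use strict;
use warnings;

# Hypothetical helper: return the topic captured by $re, or undef if no match.
sub topic_for {
    my ($re, $line) = @_;
    return $line =~ $re ? $2 : undef;
}

my $loose = qr/(Started|Done).+topic (\d+_(\d+)_.+)/;
my $tight = qr/(Started|Done).+topic (\d+_(\d+)_\d+_IN_0)/;

my @lines = (
    'Started producing records on topic 12772_8287_20170317_IN_0',
    'Started producing records on topic 12772_8287_20170317_IN_1',   # made-up partition
);

# Show which topic, if any, each pattern captures for each line.
for my $line (@lines) {
    printf "loose: %-26s tight: %s\n",
        topic_for($loose, $line) // '(no match)',
        topic_for($tight, $line) // '(no match)';
}
```

The loose pattern captures both topics, while the tight one ignores the `_IN_1` line, so only the `_IN_0` timestamps reach the hash.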
Re^4: Log Parsing by piyushmnnit06 (Novice) on Apr 03, 2017 at 06:29 UTC
Re^5: Log Parsing by poj (Abbot) on Apr 03, 2017 at 06:58 UTC
Re: Log Parsing by LanX (Saint) on Apr 02, 2017 at 18:55 UTC
If you used `strict` and `warnings` you'd notice some typos in your variables, like task_id for instance. Please fix them.

Update: Hmm, though they seem to be different variables.

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!
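What `strict` catches can be shown with a small sketch that compiles a snippet under `use strict` and reports the resulting error. The helper name `compile_error` and the sample snippet are invented for this illustration; the variable names only echo the ones in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: compile (and, if it compiles, run) a snippet with
# strict and warnings enabled; return the error text, or '' if there is none.
sub compile_error {
    my ($code) = @_;
    eval "use strict; use warnings; $code; 1" and return '';
    return $@;
}

# @Task_id is declared, but the misspelled @task_id is not, so strict
# rejects the snippet at compile time instead of silently using an empty array.
print compile_error('my @Task_id = (8286); my $count = @task_id;');
```

Without `use strict`, the misspelled name would simply be a second, empty global array, and the loop over it would do nothing without any error being reported.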
Re: Log Parsing by stevieb (Canon) on Apr 02, 2017 at 13:53 UTC
Please edit your question and place your input data into `<code></code>` tags as you've done with your code. Then, please also include some example desired output, also in code tags.
Re^2: Log Parsing by piyushmnnit06 (Novice) on Apr 02, 2017 at 15:57 UTC
I have made the changes as you suggested. Hopefully it is clearer now.
