piyushmnnit06 has asked for the wisdom of the Perl Monks concerning the following question:

Below is my log structure
0317 09:53:14.865+0000 {12772} INFO [pm-worker-exec slot-Task:id=8274,env=12772,type=11][c.s.w.t.f.s.PostExecutionStage ] Loaded {child runId vs completion type}: {8286-SUCCESSFUL}{8287-SUCCESSFUL}{8288-SUCCESSFUL}{8289-SUCCESSFUL}{8290-SUCCESSFUL}{8291-SUCCESSFUL}{8292-SUCCESSFUL}{8293-SUCCESSFUL}{8294-SUCCESSFUL}{8295-SUCCESSFUL}{8296-SUCCESSFUL}
0317 09:54:12.498+0000 {12772} INFO [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Started producing records on topic 12772_8286_20170317_IN_0
0317 09:54:13.428+0000 {12772} INFO [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Started producing records on topic 12772_8287_20170317_IN_0
0317 09:55:13.027+0000 {12772} INFO [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Done with producing records on topic 12772_8286_20170317_IN_0
0317 09:55:15.027+0000 {12772} INFO [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Done with producing records on topic 12772_8287_20170317_IN_0

From the log above, I first need to fetch all successful child run IDs (i.e. 8286, 8287, ... 8296) for parent ID 8274. Then, for each run ID, I have to get the timestamp of its "Started producing records" line and the timestamp of its "Done with producing records" line.

I tried the logic below, but it's messed up. Could you please help me with it?

The desired output is below:

topic,start_time,Endtime
12772_8286_20170317_IN_0,09:54:12.498+0000,09:55:13.027+0000
12772_8287_20170317_IN_0,09:54:13.428+0000,09:55:15.027+0000

Below is my code

open (MYFILE, '>data.CSV');
$LOGFILE = "worker.log";
open(LOGFILE) or die("Could not open log file.");
$LOGFILE1 = "worker.log";
open(LOGFILE1) or die("Could not open log file.");
$taskid=8274;
foreach $line (<LOGFILE>) {
    chomp($line);
    if ( $line =~ m/$taskid.*child runId vs completion type/) {
        @Task_id = $line =~ /\{(\d+)-/g;
    }
}
foreach $line (<LOGFILE1>) {
    chomp($line);
    foreach $childid(@Task_id) {
        if ( $line =~ m/Started producing records on topic (\d+)_($childid)_(\d+)_IN_(\d+)/) {
            $envId=$1;
            $timestamp=$3;
            $input=$4;
            my @fields = split / /, $line;
            $data1="$envId\_$childid\_$timestamp\_IN\_$input\,$fields[1]";
            $data2="$envId\_$childid\_$timestamp\_IN\_$input";
            push @data9,$data1 ;
            print "$data1\n";
        }
        if ( $line =~ m/Done with producing records on topic (\d+)_($childid)_(\d+)_IN_(\d+)/) {
            $envId=$1;
            $timestamp=$3;
            $input=$4;
            my @fields = split / /, $line;
            $data3="$envId\_$childid\_$timestamp\_IN\_$input\,$fields[1]";
            $data4="$envId\_$childid\_$timestamp\_IN\_$input";
            push @data9,$data3 ;
            print "$data3\n";
        }
    }
}
foreach (@data9) {
    print MYFILE "$_\n";
}
close LOGFILE;
close LOGFILE1;

Replies are listed 'Best First'.
Re: Log Parsing
by tybalt89 (Monsignor) on Apr 02, 2017 at 20:17 UTC

    It just takes a simple regex :)
    ( ducking )

    #!/usr/bin/perl -l

    # http://perlmonks.org/?node_id=1186713

    use strict;
    use warnings;

    $_ = do { local $/; <DATA> };
    print "topic,start_time,Endtime";
    print join ',', $3, $2, $4 while /\b(\d+)-SUCCESSFUL
      (?=
      (?:.*\n)*
      \d+\ (\S+).*Started.*\b(\d+_\1_\d+_IN_0)
      (?:.*\n)*
      \d+\ (\S+).*Done.*\b\3
      )/gx;

    __DATA__
    0317 09:53:14.865+0000 {12772} INFO [pm-worker-exec slot-Task:id=8274,env=12772,type=11][c.s.w.t.f.s.PostExecutionStage ] Loaded {child runId vs completion type}: {8286-SUCCESSFUL}{8287-SUCCESSFUL}{8288-SUCCESSFUL}{8289-SUCCESSFUL}{8290-SUCCESSFUL}{8291-SUCCESSFUL}{8292-SUCCESSFUL}{8293-SUCCESSFUL}{8294-SUCCESSFUL}{8295-SUCCESSFUL}{8296-SUCCESSFUL}
    0317 09:54:12.498+0000 {12772} INFO [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Started producing records on topic 12772_8286_20170317_IN_0
    0317 09:54:13.428+0000 {12772} INFO [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Started producing records on topic 12772_8287_20170317_IN_0
    0317 09:55:13.027+0000 {12772} INFO [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Done with producing records on topic 12772_8286_20170317_IN_0
    0317 09:55:15.027+0000 {12772} INFO [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Done with producing records on topic 12772_8287_20170317_IN_0

    Produces exactly your desired output.

      Thanks, it worked fine for this small data set. But I have to do it for the complete log; I mean I have to open a file and then do it iteratively.

        How big is your log file? If it's really big, that's a crucial piece of information that should be included with the rest of the problem statement.

Re: Log Parsing
by poj (Abbot) on Apr 02, 2017 at 19:08 UTC

    Rather than iterating through an array to find a match like this

    foreach $line (<LOGFILE1>) {
        chomp($line);
        foreach $childid (@Task_id) {

    build a hash and use the keys to match

    #!/usr/bin/perl
    use strict;

    my %data = ();
    my @id = ();
    my $reqid = '8274';
    my $infile = 'worker.log';
    my $outfile = 'data.CSV';

    open IN,'<',$infile or die "$!";

    # input
    while (<IN>) {
      chomp;
      next unless /Task:id=(\d+)/;
      my $taskid = $1;
      my (undef,$timestamp,undef) = split /\s+/,$_,3;
      if (/(Started|Done).+topic (\d+_(\d+)_.+)/){
        $data{$3}{$1} = $timestamp;
        $data{$3}{'Topic'} = $2;
      }
      while ( /\{(\d+)-SUCCESSFUL\}/g ){
        push @id,$1 if ($taskid eq $reqid);
      }
    }
    close IN;

    # output
    open OUT,'>',$outfile or die "$!";
    my @cols = qw(Topic Started Done);
    printf OUT "%s,%s,%s\n",@cols;
    for my $id (sort @id){
      if (exists $data{$id}){
        printf OUT "%s,%s,%s\n", map { $data{$id}{$_} } @cols;
      } else {
        print OUT "$id - no data\n";
      }
    }
    close OUT;
    poj
      It's working fine, but the only problem is that it's providing timings for topics where 1 is appended, not 0. I mean it's giving timings for 12772_8287_20170317_IN_1 but not for 12772_8287_20170317_IN_0.

        Make this regex more specific

        #if (/(Started|Done).+topic (\d+_(\d+)_.+)/){
        if (/(Started|Done).+topic (\d+_(\d+)_\d+_IN_0)/){
        poj
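Putting poj's hash approach together with the stricter topic regex, a single pass over the log could look roughly like the sketch below. The sub name `topic_times`, the `\b` after `IN_0`, the de-duplication of child IDs and the skipping of children without both timestamps are assumptions added for illustration, not part of the replies above. The sample lines are the ones from the question, read through an in-memory filehandle so the sketch runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): reads log lines from a
# filehandle and returns CSV rows for the successful children of $parent_id.
sub topic_times {
    my ($fh, $parent_id) = @_;
    my (%seen, @children, %time);

    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /Task:id=(\d+)/;
        my $task_id = $1;

        # The timestamp is the second whitespace-separated field.
        my (undef, $timestamp) = split ' ', $line, 3;

        # Only topics ending in _IN_0 are recorded; $3 is the child run ID.
        if ($line =~ /(Started|Done).+topic (\d+_(\d+)_\d+_IN_0)\b/) {
            $time{$3}{$1}    = $timestamp;
            $time{$3}{Topic} = $2;
        }

        # Collect successful child IDs only from the requested parent task.
        if ($task_id eq $parent_id) {
            while ($line =~ /\{(\d+)-SUCCESSFUL\}/g) {
                push @children, $1 unless $seen{$1}++;
            }
        }
    }

    my @rows = ('topic,start_time,Endtime');
    for my $id (sort { $a <=> $b } @children) {
        my $t = $time{$id} or next;    # child with no topic lines at all
        next unless defined $t->{Started} && defined $t->{Done};
        push @rows, join ',', @{$t}{qw(Topic Started Done)};
    }
    return @rows;
}

# Sample lines copied from the question.
my $log = <<'LOG';
0317 09:53:14.865+0000 {12772} INFO [pm-worker-exec slot-Task:id=8274,env=12772,type=11][c.s.w.t.f.s.PostExecutionStage ] Loaded {child runId vs completion type}: {8286-SUCCESSFUL}{8287-SUCCESSFUL}{8288-SUCCESSFUL}{8289-SUCCESSFUL}{8290-SUCCESSFUL}{8291-SUCCESSFUL}{8292-SUCCESSFUL}{8293-SUCCESSFUL}{8294-SUCCESSFUL}{8295-SUCCESSFUL}{8296-SUCCESSFUL}
0317 09:54:12.498+0000 {12772} INFO [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Started producing records on topic 12772_8286_20170317_IN_0
0317 09:54:13.428+0000 {12772} INFO [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Started producing records on topic 12772_8287_20170317_IN_0
0317 09:55:13.027+0000 {12772} INFO [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Done with producing records on topic 12772_8286_20170317_IN_0
0317 09:55:15.027+0000 {12772} INFO [pm-worker-exec slot-Task:id=8273,env=12772,type=55][edProcessInputBatchKafkaProducer] Done with producing records on topic 12772_8287_20170317_IN_0
LOG

open my $fh, '<', \$log or die "Cannot open in-memory log: $!";
my @rows = topic_times($fh, '8274');
close $fh;

print "$_\n" for @rows;
```

For a real log, opening `worker.log` with the same three-argument `open` in place of the in-memory string should be enough. Because the file is read line by line, memory use depends on the number of distinct child IDs rather than on the size of the log.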
Re: Log Parsing
by LanX (Saint) on Apr 02, 2017 at 18:55 UTC
    If you used strict and warnings you'd notice some typos in your variables, like task_id for instance.

    Fix them please.

    update

    Hmm though they seem to be different variables.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!
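As a small illustration of the kind of typo strict catches, the sketch below compiles a snippet in which an array is declared as `@Task_id` but later written as `@task_id`. The snippet and its variable names are made up for this example and are not taken from the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A deliberately misspelled array name, compiled under strict via string eval
# so that the compile-time error can be captured instead of aborting the script.
my $snippet = <<'CODE';
use strict;
my @Task_id = (8286);
push @task_id, 8287;   # typo: lower-case "t"
1;
CODE

my $ok  = eval $snippet;
my $err = $@;

print $ok ? "compiled\n" : "strict caught it: $err";
```

Without `use strict`, the misspelled name would silently create a second, unrelated global array.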

Re: Log Parsing
by stevieb (Canon) on Apr 02, 2017 at 13:53 UTC

    Please edit your question and place your input data into <code></code> tags as you've done with your code. Then, please also include some example desired output, also in code tags.

      I have made the changes as you suggested. Hopefully it is clearer now.