Unique Data Formatting

raj8 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have come to a rock-solid stop on this. I am trying to gather data on the file-sync runs on my PC, which I call jobs. For each job, I want to gather:

1.) Unique Date and Time for Analyzing

2.) Unique Date and Time for Sync to Start

3.) Unique Date and Time for Synchronizing Finished

4.) Unique Date and Time for Summary of each job.

As the log below shows, the Summary line does not contain the job name, so I want to prefix each Summary line with the name of its job. The problem is that when I run the script, it sometimes prefixes the wrong job name to lines that do not belong to that job. Once this is fixed, I would like to load the data into a small MySQL database for better reporting later. Thanks for your time and for any suggestions that will help me win this battle.
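A minimal sketch of the prefixing idea, assuming every Analyzing/Synchronizing line names its job as `job: "..."` and that a job's Summary line follows its other lines, as in the sample log: remember the most recent job name and attach it only to Summary lines. The sub name `tag_summaries` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: takes log lines and returns them with each Summary
# line prefixed by the most recently seen job name.
sub tag_summaries {
    my @lines = @_;
    my $job;            # last job name seen; undef until the first one
    my @out;
    for my $line (@lines) {
        # Remember the job name from lines such as: job: "New Job 1"
        if ($line =~ /job: "([^"]+)"/) {
            $job = $1;
        }
        # Summary lines carry no job name, so prefix the remembered one
        if ($line =~ /\] Summary:/) {
            push @out, (defined $job ? $job : 'UNKNOWN') . ": $line";
        }
        else {
            push @out, $line;
        }
    }
    return @out;
}

my @log = (
    '[4/14/2008 5:06 PM] Synchronizing finished, job: "New Job 2"',
    '[4/14/2008 5:06 PM] Summary: Files processed: 136; Files copied: 58;',
);
print "$_\n" for tag_summaries(@log);
```

The job name is taken from the quoted text rather than from fixed `substr` offsets, so names of any length work.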


use strict;
use warnings;

my @today = get_cur_time();

### Compensates for old data if needed ###
#$today[1]--; 

### Defining Jobs ###

my @jobs = ("Job 3");

### Defining Print Header  ###

print "\n======================================\n";
print "\n         Today:  @today\n";
print "\n======================================\n\n";

### Opening Log File ###

open(FH1,"c:\\default.log") 
        || die "Cannot open file:$!\n";


### Read the log line by line ###

GETDATA: while (<FH1>) {

    ### Skip lines that are not sync or summary lines ###

    next unless /Synchronizing started|Synchronizing finished|Summary/;

    my @date= m|(\d+)/(\d+)/(\d+)|;

    for my $i (0..$#today) {
        next GETDATA if $date[$i] != $today[$i];
    }

    foreach my $job (@jobs) {
        my $syncStartD       = substr($_,1,9);
        my $syncStartT       = substr($_,11,7);
        my $syncStartjobName = substr($_,20,120);
        my $filesProcessed   = substr($_,21,44);

        print $_, "\n$job: ";
    }
}


### Subroutines ###

sub get_cur_time  {
    my ($Day, $Month, $Year) = (localtime(time))[3..5];
    
    $Year   += 1900;
    #--Note - the "sprintf"s are no longer necessary ...
    $Month  = sprintf '%02d', $Month + 1;
    $Day    = sprintf '%02d', $Day;
    return   ($Month,$Day,$Year);
};

__DATA__

[4/14/2008 4:44 PM] Analyzing started, job: "New Job 1"
[4/14/2008 4:45 PM] Synchronizing started, job: "New Job 1"
[4/14/2008 4:45 PM] Analyzing finished, job: "New Job 1"
[4/14/2008 4:45 PM] Synchronizing finished, job: "New Job 1"
[4/14/2008 4:45 PM] Summary: Files processed: 6,554; Files copied: 1; Bytes processed: 1,633,599,552; Bytes copied: 1,221,120.
[4/14/2008 5:03 PM] Analyzing started, job: "New Job 2"
[4/14/2008 5:06 PM] Analyzing finished, job: "New Job 2"
[4/14/2008 5:06 PM] Synchronizing finished, job: "New Job 2"
[4/14/2008 5:06 PM] Summary: Files processed: 136; Files copied: 58; Bytes processed: 14,427,075; Bytes copied: 14,427,075.
[4/14/2008 5:08 PM] Analyzing started, job: "New Job 3"
[4/14/2008 5:08 PM] Analyzing finished, job: "New Job 3"
[4/14/2008 5:08 PM] Synchronizing started, job: "New Job 3"
[4/14/2008 5:08 PM] Synchronizing finished, job: "New Job 3"
[4/14/2008 5:08 PM] Summary: Files processed: 38; Files copied: 17; Bytes processed: 292,097; Bytes copied: 294,793.
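The date filter in the loop compares `@date` with `@today` using `!=`, which is numeric, so the zero-padded `"04"` produced by `sprintf` still equals the log's `4`. A sketch of the same check as a standalone sub that also returns false explicitly when a line has no date stamp at all; the name `same_day` is hypothetical, and the `(month, day, year)` order mirrors `get_cur_time`.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the [M/D/YYYY ...] stamp at the start of
# $line matches the given month, day and year. Comparison is numeric,
# so a zero-padded "04" still equals "4" from the log.
sub same_day {
    my ($line, $month, $day, $year) = @_;
    my ($m, $d, $y) = $line =~ m{^\[(\d+)/(\d+)/(\d+)}
        or return 0;    # no date stamp: treat as "not that day"
    return $m == $month && $d == $day && $y == $year;
}

# Today's date in the same (month, day, year) order as get_cur_time()
my ($day, $month, $year) = (localtime)[3, 4, 5];
my @today = ($month + 1, $day, $year + 1900);

my $line = '[4/14/2008 5:08 PM] Summary: Files processed: 38;';
print same_day($line, 4, 14, 2008) ? "match\n" : "no match\n";
print same_day($line, @today)      ? "today\n" : "not today\n";
```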


Re: Unique Data Formatting by pc88mxer (Vicar) on Apr 18, 2008 at 06:37 UTC
First, some style/structure comments:

    next unless /Synchronizing started|Synchronizing finished|Summary/;
    my @date= m|(\d+)/(\d+)/(\d+)|;

I would make sure that you check that the date regex matched before using `@date`. Something like:

    my @date;
    if (@date = m{(\d+)/(\d+)/(\d+)}) {
        ...
    }

The danger is that if the date regex fails, you will have no way of knowing about it. (Update: I guess this is not entirely true, but it is still good coding practice to always test whether a regex match succeeded.)

The next issue is... never mind, I hadn't seen how `@today` was defined.

Here's what I think you want:

    use Data::Dumper;

    my %times;
    my $job;
    while (<FH1>) {
        # skip malformed lines that lack a [M/D/YYYY h:mm AM/PM] stamp
        next unless s{^\[(\d+/\d+/\d+ \d+:\d+ [AP]M)\]\s}{};
        my $timestamp = $1;
        if (m/^(Synchronizing|Analyzing) (started|finished), job: "(.*?)"/) {
            $job = $3;
            $times{$job}->{$1}->{$2} = $timestamp;
        } elsif (m/^Summary:/) {
            $times{$job}->{summary} = $_;
        }
    }
    print Dumper(\%times);

One key element of this loop is that it saves the last parsed job, so we know which job a "Summary" line refers to. The list of jobs is `keys %times`. There are other ways of storing the timestamp data, and you may want to choose a different data structure depending on how you plan to use the data later.
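Since the list of jobs is `keys %times`, a per-job report with the four items from the question is one loop over those keys. A sketch, assuming the `%times` layout built by the loop above and Perl 5.10 or later (for `//`); `report_lines` is a hypothetical name and the sample data is copied from the question's log, where New Job 2 has no "Synchronizing started" line.

```perl
use strict;
use warnings;

# Sample data in the layout built by the loop above:
# $times{JOB}{Analyzing|Synchronizing}{started|finished} = timestamp,
# $times{JOB}{summary} = rest of the Summary line.
my %times = (
    'New Job 2' => {
        Analyzing     => { started => '4/14/2008 5:03 PM', finished => '4/14/2008 5:06 PM' },
        Synchronizing => { finished => '4/14/2008 5:06 PM' },
        summary       => 'Summary: Files processed: 136; Files copied: 58; Bytes processed: 14,427,075; Bytes copied: 14,427,075.',
    },
    'New Job 3' => {
        Analyzing     => { started => '4/14/2008 5:08 PM', finished => '4/14/2008 5:08 PM' },
        Synchronizing => { started => '4/14/2008 5:08 PM', finished => '4/14/2008 5:08 PM' },
        summary       => 'Summary: Files processed: 38; Files copied: 17; Bytes processed: 292,097; Bytes copied: 294,793.',
    },
);

# Hypothetical report helper: one line per job, '-' for a missing stamp.
# The "&&" guard avoids autovivifying a phase hash that is not there.
sub report_lines {
    my ($times) = @_;
    my @rows;
    for my $job (sort keys %$times) {
        my $t = $times->{$job};
        my @cols = map {
            my ($phase, $event) = @$_;
            ($t->{$phase} && $t->{$phase}{$event}) // '-';
        } ([Analyzing => 'started'], [Synchronizing => 'started'],
           [Synchronizing => 'finished']);
        push @rows, join(' | ', $job, @cols, $t->{summary} // '-');
    }
    return @rows;
}

print "$_\n" for report_lines(\%times);
```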
Re: Unique Data Formatting by wfsp (Abbot) on Apr 18, 2008 at 06:42 UTC
...I want to prefix the Summary data line with the job name.

I think part of the problem is

    print $_, "\n$job: ";

where `$job` is printed on the line following the current line. It's better to remember the previous line's job and print it when needed. The following might help get you started.

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;
    $Data::Dumper::Indent = 1;

    my ($job);
    while (<DATA>) {
        chomp;
        next unless /Synchronizing started|Synchronizing finished|Summary/;
        my $summary;
        $summary++ if /Summary/;
        my $this_job;
        if (($this_job) = $_ =~ /"New Job (\d+)"$/) {
            $job = $this_job;
        }
        print qq{job: $job } if $summary;
        print qq{$_\n};
    }
    __DATA__
    [4/14/2008 4:44 PM] Analyzing started, job: "New Job 1"
    [4/14/2008 4:45 PM] Synchronizing started, job: "New Job 1"
    [4/14/2008 4:45 PM] Analyzing finished, job: "New Job 1"
    [4/14/2008 4:45 PM] Synchronizing finished, job: "New Job 1"
    [4/14/2008 4:45 PM] Summary: Files processed: 6,554; Files copied: 1; Bytes processed: 1,633,599,552; Bytes copied: 1,221,120.
    [4/14/2008 5:03 PM] Analyzing started, job: "New Job 2"
    [4/14/2008 5:06 PM] Analyzing finished, job: "New Job 2"
    [4/14/2008 5:06 PM] Synchronizing finished, job: "New Job 2"
    [4/14/2008 5:06 PM] Summary: Files processed: 136; Files copied: 58; Bytes processed: 14,427,075; Bytes copied: 14,427,075.
    [4/14/2008 5:08 PM] Analyzing started, job: "New Job 3"
    [4/14/2008 5:08 PM] Analyzing finished, job: "New Job 3"
    [4/14/2008 5:08 PM] Synchronizing started, job: "New Job 3"
    [4/14/2008 5:08 PM] Synchronizing finished, job: "New Job 3"
    [4/14/2008 5:08 PM] Summary: Files processed: 38; Files copied: 17; Bytes processed: 292,097; Bytes copied: 294,793.

outputs

    [4/14/2008 4:45 PM] Synchronizing started, job: "New Job 1"
    [4/14/2008 4:45 PM] Synchronizing finished, job: "New Job 1"
    job: 1 [4/14/2008 4:45 PM] Summary: Files processed: 6,554; Files copied: 1; Bytes processed: 1,633,599,552; Bytes copied: 1,221,120.
    [4/14/2008 5:06 PM] Synchronizing finished, job: "New Job 2"
    job: 2 [4/14/2008 5:06 PM] Summary: Files processed: 136; Files copied: 58; Bytes processed: 14,427,075; Bytes copied: 14,427,075.
    [4/14/2008 5:08 PM] Synchronizing started, job: "New Job 3"
    [4/14/2008 5:08 PM] Synchronizing finished, job: "New Job 3"
    job: 3 [4/14/2008 5:08 PM] Summary: Files processed: 38; Files copied: 17; Bytes processed: 292,097; Bytes copied: 294,793.

Not sure what you are trying to do with the rest of the code.
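Once each Summary line carries its job, the remaining step toward the MySQL goal in the question is turning the Summary text into numbers. A sketch, assuming the four field names always appear as in the sample log; `parse_summary` is a hypothetical name, and the database insert itself is not shown.

```perl
use strict;
use warnings;

# Hypothetical helper: split a Summary line into name => number pairs,
# dropping the thousands separators so the values are plain integers
# (for example, ready to bind into an INSERT later).
sub parse_summary {
    my ($line) = @_;
    my %field;
    while ($line =~ /(Files processed|Files copied|Bytes processed|Bytes copied):\s*([\d,]+)/g) {
        my ($name, $value) = ($1, $2);
        $value =~ tr/,//d;          # "1,633,599,552" -> "1633599552"
        $field{$name} = $value + 0;
    }
    return \%field;
}

my $s = parse_summary('job: 1 [4/14/2008 4:45 PM] Summary: Files processed: 6,554; '
    . 'Files copied: 1; Bytes processed: 1,633,599,552; Bytes copied: 1,221,120.');
printf "%-16s %d\n", $_, $s->{$_} for sort keys %$s;
```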
Re^2: Unique Data Formatting by raj8 (Sexton) on Apr 18, 2008 at 16:43 UTC
Thanks, but I still get an incorrect job associated with the Summary data. When a job finishes, I need that line, and I need to associate it with the Summary line. For the most part it works, but in some instances I still get the wrong job ID associated with the Summary line, as shown below with Job 3 and Job 1:

    [4/17/2008 11:03 PM] Synchronizing started, job: "Job 1"
    [4/17/2008 11:03 PM] Synchronizing finished, job: "Job 1"
    job: 3 [4/17/2008 11:03 PM] Summary: Files processed: 6,566; Files copied: 0; Bytes processed: 1,633,707,954; Bytes copied: 0.
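In a carry-forward loop, a job line that the pattern does not recognize leaves the previous job's name in place. For example, `/"New Job (\d+)"$/` does not match `job: "Job 1"` (the literal `New ` is missing), so a following Summary line still receives whatever job was remembered earlier. One defensive option, sketched here with a hypothetical `tag_with_reset` (Perl 5.10+ for `//`), is to forget the job after each Summary line so an unrecognized job shows up as UNKNOWN instead of borrowing another job's name.

```perl
use strict;
use warnings;

# Hypothetical variant of the carry-forward loop: the remembered job is
# cleared after each Summary line, so a Summary whose job name the
# pattern did not recognise is tagged UNKNOWN instead of inheriting
# the previous job's name.
sub tag_with_reset {
    my ($job_re, @lines) = @_;
    my ($job, @out);
    for my $line (@lines) {
        if ($line =~ /\] Summary:/) {
            push @out, 'job: ' . ($job // 'UNKNOWN') . " $line";
            undef $job;                 # forget it before the next job starts
        }
        else {
            if (my ($name) = $line =~ $job_re) {
                $job = $name;
            }
            push @out, $line;
        }
    }
    return @out;
}

# Lines taken from the question's sample log and raj8's output
my @log = (
    '[4/14/2008 5:08 PM] Synchronizing finished, job: "New Job 3"',
    '[4/14/2008 5:08 PM] Summary: Files processed: 38; Files copied: 17;',
    '[4/17/2008 11:03 PM] Synchronizing finished, job: "Job 1"',
    '[4/17/2008 11:03 PM] Summary: Files processed: 6,566; Files copied: 0;',
);
print "$_\n" for tag_with_reset(qr/"New Job (\d+)"$/, @log);
```

With a pattern that matches every job name, the reset never triggers; it only makes a mismatch visible.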
Re^2: Unique Data Formatting by raj8 (Sexton) on Apr 18, 2008 at 18:52 UTC
Thanks for your help, but just an update: it must have been a bad log file; I may have cut data out of it, which caused that output. However, the code doesn't account for other job names, such as a job named 'My files' or 'My files 001'. I want to capture the job name that comes after the colon, which could be any characters, sets of characters, or numbers, such as:

    [4/14/2008 4:45 PM] Synchronizing started, job: "My Files 01"
    [4/14/2008 4:45 PM] Synchronizing finished, job: "My Files 01"
    job: My Files 01 [4/14/2008 4:45 PM] Summary: Files processed: 6,554; Files copied
    [4/14/2008 4:45 PM] Synchronizing started, job: "New Job 1"
    [4/14/2008 4:45 PM] Synchronizing finished, job: "New Job 1"
Re^3: Unique Data Formatting by wfsp (Abbot) on Apr 19, 2008 at 06:08 UTC
...capture the job name that is after the colon that could be any character, sets of characters or numbers...

But is the job name between the quotes? Should the quotes be included? Is it always at the end of the string? Is there a space after the colon, and is that part of the name? Have a gander at the tutorials perlrequick and perlretut; the perlre doc has the full low-down. Adjust the following to suit.

    my ($job) = $str =~ /
        job :      # "...the colon..."
        \s "       # opening quote
        (          # start capture
          [^"]+    # one or more of anything that isn't a quote
        )          # end of capture
        "          # closing quote
        $          # at the end of the string
    /x;
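To check that pattern against the names raised in this thread, here it is wrapped in a hypothetical `job_name` sub and run on a quoted name with spaces, a name without the `New ` prefix, and a Summary line (which has no job name and yields undef). Under `/x`, the spaces in `job :` and `\s "` are ignored, so they match `job:` followed by one whitespace character and a quote.

```perl
use strict;
use warnings;

# wfsp's pattern, wrapped in a (hypothetical) helper.
# Returns the text between the quotes after "job: ", or undef.
sub job_name {
    my ($str) = @_;
    my ($job) = $str =~ /
        job :      # "...the colon..."
        \s "       # opening quote
        (          # start capture
          [^"]+    # one or more of anything that isn't a quote
        )          # end of capture
        "          # closing quote
        $          # at the end of the string
    /x;
    return $job;
}

for my $line (
    '[4/14/2008 4:45 PM] Synchronizing started, job: "My Files 01"',
    '[4/17/2008 11:03 PM] Synchronizing finished, job: "Job 1"',
    '[4/14/2008 4:45 PM] Summary: Files processed: 6,554;',
) {
    my $name = job_name($line);
    print defined $name ? "$name\n" : "(no job name)\n";
}
```

Dropping this in place of `/"New Job (\d+)"$/` in the earlier loop lets any quoted job name set the remembered job.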