raj8 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have been working on a home project to check the status of syncing my documents. Each sync run is called a job, and for each job I want to check the status of three things on the current day: 1.) Synchronizing started, 2.) Synchronizing finished, and 3.) the Summary with Files processed, Files copied and Bytes processed. Eventually, I would like to take each part (Files processed, Files copied and Bytes processed) and put it in a MySQL database so that I can run queries against it to check on past successes and compute some stats. That said, I am having problems getting all three items together for each individual job. I appreciate any assistance, suggestions, or ideas. Thanks again.

my $today = get_cur_time();

### Defining Jobs
@jobs = ("Allway Job");

### Defining Print Header
print "\n======================================\n";
print "\n Homebrew Reporting Program\n\n";
print " Today: $today\n";
print "\n======================================\n\n";

### Opening Allway Sync Log File
open(FH1,"C:\\Documents and Settings\\localadmin\\Application Data\\Sync App Settings\\_SYNCAPP\\default.log")
    || die "Cannot open file:$!\n";

### keep Log open while reading
while((<FH1>)) {
    ###
    $date =~ s/\[//;
    foreach $job (@jobs) {
        if( $date eq $today && /Synchronizing started/ || /Synchronizing finished/ || /Summary/ ) {
            #$syncStartD = substr($_,1,9);
            #$syncStartT = substr($_,11,7);
            #$syncStartjobName = substr($_,20,120);
            #$filesProcessed = substr($_,21,44);
            print $_; # "$syncStartD$syncStartT$syncStartjobName$filesProcessed\n";
        };
    };
};

### Subroutines for current date
sub get_cur_time {
    my ($Day, $Month, $Year) = (localtime(time))[3..6];
    $Year += 1900;
    $Month = sprintf '%02d', $Month + 1;
    $Day = sprintf '%02d', $Day;
    return "$Month/$Day/$Year";
};

__DATA__
[3/21/2008 12:48 PM] Synchronizing started, job: "Allway Job"
[3/21/2008 12:48 PM] Analyzing finished, job: "Allway Job"
[3/21/2008 12:48 PM] Deleting: "C:\test\_SYNCAPP\temp" ...
[3/21/2008 12:48 PM] Deleting: "c:\test\_SYNCAPP\temp" ...
[3/21/2008 12:48 PM] Preparing metadata ...
[3/21/2008 12:48 PM] Flushing drive "\\.\C:" buffers ...
[3/21/2008 12:48 PM] Writing File: C:\test\_SYNCAPP\metadata.xml" ...
[3/21/2008 12:48 PM] Writing File: "c:\test\_SYNCAPP\metadata.xml" ...
[3/21/2008 12:48 PM] Flushing drive "\\.\C:" buffers ...
[3/21/2008 12:48 PM] Flushing drive "\\.\c:" buffers ...
[3/21/2008 12:48 PM] Synchronizing finished, job: "Allway Job"
[3/21/2008 12:48 PM] Summary: Files processed: 18; Files copied: 0; Bytes processed: 180,372; Bytes copied: 0.
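To get all three items together for each job, one approach is to read the log once and collect the events into a hash keyed by job name. Note also that in the `if` above, `&&` binds more tightly than `||`, so `$date eq $today` only guards the "Synchronizing started" match; the other two patterns match on any day. What follows is a minimal sketch under stated assumptions, not a drop-in fix: `parse_sync_log` and the hash keys are hypothetical names, the date is compared in the log's own unpadded M/D/YYYY form, and it assumes each "Summary:" line belongs to the job named on the most recent "Synchronizing started/finished" line, since the summary line itself names no job.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the started/finished/summary events for one
# date into a hash keyed by job name.
# ASSUMPTION: a "Summary:" line belongs to the job named on the most recent
# "Synchronizing started/finished" line (the summary itself names no job).
sub parse_sync_log {
    my ($want_date, @lines) = @_;
    my (%jobs, $current_job);

    for my $line (@lines) {
        chomp(my $copy = $line);
        # Log lines look like: [3/21/2008 12:48 PM] message text
        my ($date, $time, $msg) =
            $copy =~ m{^\[(\d+/\d+/\d+)\s+(\d+:\d+\s+[AP]M)\]\s+(.*)$}
            or next;
        next unless $date eq $want_date;    # date check applies to every event

        if ($msg =~ /^Synchronizing (started|finished), job: "([^"]+)"/) {
            my ($event, $job) = ($1, $2);
            $current_job = $job;
            $jobs{$job}{$event} = "$date $time";
        }
        elsif (defined $current_job
            && $msg =~ /^Summary: Files processed: ([\d,]+); Files copied: ([\d,]+); Bytes processed: ([\d,]+)/)
        {
            my @counts = ($1, $2, $3);
            tr/,//d for @counts;            # "180,372" -> "180372"
            @{ $jobs{$current_job} }{qw(files_processed files_copied bytes_processed)} = @counts;
        }
    }
    return \%jobs;
}

my @sample = (
    '[3/21/2008 12:48 PM] Synchronizing started, job: "Allway Job"',
    '[3/21/2008 12:48 PM] Preparing metadata ...',
    '[3/21/2008 12:48 PM] Synchronizing finished, job: "Allway Job"',
    '[3/21/2008 12:48 PM] Summary: Files processed: 18; Files copied: 0; Bytes processed: 180,372; Bytes copied: 0.',
);

my $jobs = parse_sync_log('3/21/2008', @sample);
for my $job (sort keys %$jobs) {
    my $j = $jobs->{$job};
    print "$job: started $j->{started}, finished $j->{finished}, ",
          "$j->{files_processed} processed, $j->{files_copied} copied, ",
          "$j->{bytes_processed} bytes\n";
}
```

Against the real log you would read the file's lines into a list and pass today's date built in the same unpadded form, for example `sprintf '%d/%d/%d', $mon + 1, $mday, $year + 1900` from `localtime`; whether the log ever zero-pads the day is an assumption worth checking. Each job's hash then maps naturally onto one row of the MySQL table.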

Re: Parsing Unstructured Data
by NetWallah (Canon) on Mar 22, 2008 at 20:05 UTC
    Fixed your code a little bit:

    - use strict; !!!!!
    - Improved Regex checking
    - Dates changed to arrays, to make comparison easier
    - Cluttered, nested ifs replaced by "next if" or "next unless"
    - Test data you provided was used (no external files)
    - Please remove the $today[1]-- line: it only compensates for the old test data

    Cheers!

    use strict;

    my @today = get_cur_time();
    $today[1]--; # Compensate for old data

    ### Defining Jobs
    my @jobs = ("Allway Job");

    ### Defining Print Header
    print "\n======================================\n";
    print "\n Homebrew Reporting Program\n\n";
    print " Today: @today\n";
    print "\n======================================\n\n";

    ### Opening Allway Sync Log File
    #open(FH1,"C:\\Documents and Settings\\localadmin\\Application Data\\Sync App Settings\\_SYNCAPP\\default.log")
    #    || die "Cannot open file:$!\n";

    ### keep Log open while reading
    GETDATA:while(<DATA>){ #(<FH1>)) {
        next unless /Synchronizing started|Synchronizing finished|Summary/;
        my @date= m|(\d+)/(\d+)/(\d+)|;
        for my $i(0..$#today){
            next GETDATA if $date[$i]!=$today[$i];
        }
        ###
        foreach my $job (@jobs) {
            my $syncStartD = substr($_,1,9);
            my $syncStartT = substr($_,11,7);
            my $syncStartjobName = substr($_,20,120);
            my $filesProcessed = substr($_,21,44);
            print $_, "\t$syncStartD$syncStartT$syncStartjobName$filesProcessed\n";
        };
    };

    ### Subroutines for current date
    sub get_cur_time {
        my ($Day, $Month, $Year) = (localtime(time))[3..6];
        $Year += 1900;
        #--Note - the "sprintf"s are no longer necessary ...
        $Month = sprintf '%02d', $Month + 1;
        $Day = sprintf '%02d', $Day;
        return ($Month,$Day,$Year);
    };

    __DATA__
    [3/21/2008 12:48 PM] Synchronizing started, job: "Allway Job"
    [3/21/2008 12:48 PM] Analyzing finished, job: "Allway Job"
    [3/21/2008 12:48 PM] Deleting: "C:\test\_SYNCAPP\temp" ...
    [3/21/2008 12:48 PM] Deleting: "c:\test\_SYNCAPP\temp" ...
    [3/21/2008 12:48 PM] Preparing metadata ...
    [3/21/2008 12:48 PM] Flushing drive "\\.\C:" buffers ...
    [3/21/2008 12:48 PM] Writing File: C:\test\_SYNCAPP\metadata.xml" ...
    [3/21/2008 12:48 PM] Writing File: "c:\test\_SYNCAPP\metadata.xml" ...
    [3/21/2008 12:48 PM] Flushing drive "\\.\C:" buffers ...
    [3/21/2008 12:48 PM] Flushing drive "\\.\c:" buffers ...
    [3/21/2008 12:48 PM] Synchronizing finished, job: "Allway Job"
    [3/21/2008 12:48 PM] Summary: Files processed: 18; Files copied: 0; Bytes processed: 180,372; Bytes copied: 0.
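The array comparison in this reply can be pulled out into a small helper, which also shows why the sprintf padding no longer matters: `!=` compares numerically, so "03" and 3 are equal. A minimal sketch; `same_date` is a hypothetical name, and it assumes the (month, day, year) order that get_cur_time returns here.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the [M/D/YYYY ...] stamp at the start of
# $line matches the (month, day, year) list in @want, compared numerically.
sub same_date {
    my ($line, @want) = @_;
    my @got = $line =~ m{^\[(\d+)/(\d+)/(\d+)} or return 0;
    for my $i (0 .. $#want) {
        return 0 if $got[$i] != $want[$i];    # numeric: "03" == 3
    }
    return 1;
}

my @today = ('03', '21', 2008);    # shape of get_cur_time()'s return value
my $line  = '[3/21/2008 12:48 PM] Synchronizing finished, job: "Allway Job"';
print same_date($line, @today) ? "today\n" : "not today\n";
```

Returning early on the first mismatch plays the same role as `next GETDATA` in the loop above.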

         "As you get older three things happen. The first is your memory goes, and I can't remember the other two... " - Sir Norman Wisdom

      Thank you for your help. The 'date' and for-loop approach was what I was looking for, but I couldn't seem to get it working. Your approach taught me a great deal.
Re: Parsing Unstructured Data
by apl (Monsignor) on Mar 22, 2008 at 19:42 UTC
    Standard caveat: your code should start with "use strict; use warnings;".

    If you do that, you'll see that you need to explicitly declare @jobs, $date and $job.

    If you want to test with a DATA block, you need to change from <FH1> to <DATA>.
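One way to switch between test data and the real log without editing the read loop is to wrap the open in a helper that returns a lexical filehandle. A minimal sketch, with `open_log` as a hypothetical name: Perl's three-argument `open` also accepts a reference to a scalar and reads that string as an in-memory file, so the same loop works for sample text and for a real path.

```perl
use strict;
use warnings;

# Hypothetical helper: open a log for reading and return a lexical
# filehandle. $source may be a file path or a reference to a scalar;
# with a scalar reference, three-argument open reads that string as an
# in-memory file, which is convenient for test data.
sub open_log {
    my ($source) = @_;
    open(my $fh, '<', $source) or die "Cannot open log: $!\n";
    return $fh;
}

my $sample = qq{[3/21/2008 12:48 PM] Synchronizing started, job: "Allway Job"\n}
           . qq{[3/21/2008 12:48 PM] Synchronizing finished, job: "Allway Job"\n};

my $fh = open_log(\$sample);
while (my $line = <$fh>) {
    print "read: $line";
}
close $fh;
```

For the real log you would pass the file's path instead, e.g. `open_log($path)`; Perl on Windows also accepts forward slashes in paths, which avoids doubling every backslash.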

    Then you'll get "uninitialized value" warnings, because $date is used, first in the substitution and later in the comparison, without ever being given a value.

    I took the liberty of changing

    my $date =~ s/\[//;

    to

    my $date =''; if ( $_ =~ /\[(\S+)\s/ ) { $date = $1; }

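    This will get you to the point where you can start debugging your code. Keep in mind that get_cur_time returns a zero-padded value of the form MM/DD/YYYY (e.g. 03/21/2008), while $date will contain the log's form, e.g. 3/21/2008, with no leading zero on the month, so eq will not match the two for a date such as March 21. Fixing this is left as an exercise to the reader.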
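One possible answer to the exercise is to normalize the log's date to the same zero-padded form before comparing with eq. A minimal sketch; `normalize_date` is a hypothetical helper, and it assumes US-style month/day/year dates as in the sample log.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite an M/D/YYYY (or already padded MM/DD/YYYY)
# date as zero-padded MM/DD/YYYY, the form get_cur_time() returns.
# Returns undef if the string does not look like such a date.
sub normalize_date {
    my ($date) = @_;
    my ($m, $d, $y) = $date =~ m{^(\d{1,2})/(\d{1,2})/(\d{4})$} or return;
    return sprintf '%02d/%02d/%04d', $m, $d, $y;
}

my $line = '[3/21/2008 12:48 PM] Synchronizing started, job: "Allway Job"';
my $date = '';
if ($line =~ /\[(\S+)\s/) { $date = $1; }    # extraction from above gives "3/21/2008"
print normalize_date($date), "\n";           # 03/21/2008, comparable with eq
```

With this, `normalize_date($date) eq get_cur_time()` compares like with like.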

    Please realize that print statements are your friend. So is the Perl debugger.

    Good luck with your project.