
I hope a couple of pointers will help you out.

First, the date format. I'm not sure that you can change it, but if you can, a timestamp like 20100815200003 (i.e. "2010-08-15 20:00:03") on the left-hand side of your log file would be much better than "Sun Aug 15 20:00:03 2010", because strings of that type can be compared or sorted alphanumerically without converting to epoch time. The leading zeroes are important; without them a simple sort won't work. If you want to, add the redundant info like "Sun" as a separate field for the humans to read.
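If the log format can't be changed at the source, the same effect can be had by converting each timestamp as it is read. Here is a minimal sketch of that idea; the helper name sortable_stamp and the month lookup table are my own inventions for illustration, and it assumes the timestamps always look exactly like "Sun Aug 15 20:00:03 2010".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map month abbreviations to two-digit month numbers ("Jan" => "01", ...).
my %month_num;
@month_num{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} =
    map { sprintf "%02d", $_ } 1 .. 12;

# Hypothetical helper: "Sun Aug 15 20:00:03 2010" -> "20100815200003".
# Returns undef if the text does not match the assumed layout.
sub sortable_stamp {
    my ($text) = @_;
    my ($mon, $day, $h, $m, $s, $year) =
        $text =~ /^\w{3}\s+(\w{3})\s+(\d{1,2})\s+(\d\d):(\d\d):(\d\d)\s+(\d{4})$/
        or return;
    return unless exists $month_num{$mon};
    # Zero-pad every field so that string order equals time order.
    return sprintf "%04d%s%02d%02d%02d%02d",
        $year, $month_num{$mon}, $day, $h, $m, $s;
}

my $first  = sortable_stamp("Sun Aug 15 20:00:03 2010");
my $second = sortable_stamp("Mon Aug 16 00:00:04 2010");

# A plain string comparison is enough; no epoch conversion needed.
print "$first is earlier than $second\n" if $first lt $second;
```

Because every field is zero-padded to a fixed width, lt, gt and cmp (and a default sort) order these strings chronologically.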

I wasn't able to figure out exactly what you are doing with the data, although your data structure might be more complex than necessary. If you want to keep track of where your last processing left off, I would just make a separate file and put that date/time code in it. If you use a time format like the one above, you can simply use cmp (or lt, eq, gt) to test for less than, equal, or greater than. If this extra "bookmark" file isn't there, I would process the whole log and then generate the bookmark file. I would not recommend appending any "hey, I got here last time" info to your log file. Whatever generates that file, leave it alone and don't mess with its data.
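A rough sketch of the bookmark idea follows. The helper names (read_bookmark, write_bookmark, new_records) and the bookmark file name are made up for illustration, and the records are assumed to already carry a sortable stamp like 20100815200003.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: return the saved stamp, or "" if there is no
# bookmark file yet. "" sorts before any real stamp, so a missing
# bookmark means "process everything".
sub read_bookmark {
    my ($file) = @_;
    open my $fh, '<', $file or return "";
    my $stamp = <$fh>;
    close $fh;
    $stamp = "" unless defined $stamp;
    chomp $stamp;
    return $stamp;
}

# Hypothetical helper: overwrite the bookmark file with the latest stamp.
sub write_bookmark {
    my ($file, $stamp) = @_;
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} "$stamp\n";
    close $fh or die "Cannot close $file: $!";
}

# Keep only the records ([ $stamp, $text ]) newer than the bookmark.
sub new_records {
    my ($bookmark, @records) = @_;
    return grep { $_->[0] gt $bookmark } @records;
}

# Demo in a throwaway directory; the file name is an assumption.
my $dir           = tempdir(CLEANUP => 1);
my $bookmark_file = File::Spec->catfile($dir, "backup_log.bookmark");
my @records = (
    [ "20100815200003", "START OF BACKUP" ],
    [ "20100815200004", "backup-type=regular" ],
    [ "20100816000004", "START OF BACKUP" ],
);

for my $run (1, 2) {
    my @todo = new_records(read_bookmark($bookmark_file), @records);
    print "run $run: ", scalar(@todo), " record(s) to process\n";
    write_bookmark($bookmark_file, $todo[-1][0]) if @todo;
}
```

One caveat with this sketch: the log has one-second resolution, so several lines can share a stamp. A strict gt test assumes each run has seen every line carrying the bookmarked stamp; if the log can still be growing within that second, you would need a finer bookmark (a byte offset, for example).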

My inclination would be to concentrate the parsing of the input lines into one sub. I did that below. I wouldn't worry about being fancy, just get the job done. I didn't agonize over "the best way"; I just wanted to show a couple of techniques. Improve the code later if you need to. Performance of this sub will not be an issue, only "correctness" of the parsing. Of course when you have a "pair", that screams hash table. Usually there is no need to modify what I called $parm in my code.
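To illustrate the "pair screams hash table" remark, here is one way the parsed values could be collected. This is only a sketch: collect_pairs is a hypothetical helper, and it assumes the parameter names have already had their surrounding whitespace trimmed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fold [ backup set, parameter, value ] triples into
# a hash of hashes, so later code can look values up by name.
sub collect_pairs {
    my (@triples) = @_;
    my %info;
    for my $triple (@triples) {
        my ($set, $parm, $value) = @$triple;
        next if $value eq "";    # single tokens carry no value to store
        $info{$set}{$parm} = $value;
    }
    return \%info;
}

my $info = collect_pairs(
    [ "set2_lvm", "START OF BACKUP", "" ],
    [ "set2_lvm", "backup-type",     "regular" ],
    [ "set2_lvm", "backup-date",     "20100815200003" ],
    [ "set1_lvm", "backup-type",     "regular" ],
);

# Walk the structure in a stable order.
for my $set (sort keys %$info) {
    for my $parm (sort keys %{ $info->{$set} }) {
        print "$set: $parm = $info->{$set}{$parm}\n";
    }
}
```

With the data in this shape, a question like "what was the backup-type of set2_lvm?" becomes a simple lookup: $info->{set2_lvm}{'backup-type'}.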

Anyway, let us know how you are getting on. I commend you for tackling a hard problem as a "first assignment".

#!/usr/bin/perl -w
use strict;

while (<DATA>)
{
   next if (/^\s*$/);    #skip blank lines
   chomp;
   
   my ($date, $backupset , $parm , $value) = parseline($_);
   
   # the idea is to concentrate the parsing of the line and its
   # associated "regex-foo" into one place. I think rest of your
   # code can use simple eq or ne comparisons.
   
   print "$date\n";
   print "   BACKUP SET = $backupset\n";
   if ($value eq "") 
      { print "   SINGLE TOKEN: $parm\n";}
   else
      {print "   PAIR:  $parm IS $value\n";}

}

sub parseline
{
   my $line = shift;
   my ($date, $rest) = $line =~ m/(^.*\d{4}):(.*)/;
   my ($backupset,  $msg) = split(/backup:INFO:/, $rest);
   $backupset =~ s/:\s*$//;  #trimming some unwanted thing like ':' is ok
   $backupset =~ s/^\s*backup\.//; #more than one step is just fine too!
   my ($parm, $value) = $msg =~ m/(.*)=(.*)/;
   $parm  = $msg unless defined $parm;   #if match doesn't happen these will be undef
   $value = ""   unless defined $value;  #so this makes sure that they are defined
                                         #(and a legitimate value of "0" survives).

   return ($date, $backupset, $parm, $value);
}   

=prints...
Sun Aug 15 20:00:03 2010
   BACKUP SET = set2_lvm
   SINGLE TOKEN:  START OF BACKUP
Sun Aug 15 20:00:04 2010
   BACKUP SET = set2_lvm
   PAIR:   backup-set IS backup.set2_lvm
Sun Aug 15 20:00:04 2010
   BACKUP SET = set2_lvm
   PAIR:   backup-date IS 20100815200003
Sun Aug 15 20:00:04 2010
   BACKUP SET = set2_lvm
   PAIR:   backup-type IS regular
Sun Aug 15 20:00:04 2010
   BACKUP SET = set2_lvm
   PAIR:   backup-date-epoch IS 1281927603
Sun Aug 15 20:00:04 2010
   BACKUP SET = set2_lvm
   PAIR:   backup-directory IS /home/backups/backup.set2_lvm/20100815200003
Mon Aug 16 00:00:04 2010
   BACKUP SET = set1_lvm
   SINGLE TOKEN:  START OF BACKUP
Mon Aug 16 00:00:05 2010
   BACKUP SET = set1_lvm
   PAIR:   backup-set IS backup.set1_lvm
Mon Aug 16 00:00:05 2010
   BACKUP SET = set1_lvm
   PAIR:   backup-date IS 20100816000003
Mon Aug 16 00:00:05 2010
   BACKUP SET = set1_lvm
   PAIR:   backup-type IS regular
Mon Aug 16 00:00:05 2010
   BACKUP SET = set1_lvm
   PAIR:   backup-date-epoch IS 1281942003
Mon Aug 16 00:33:15 2010
   BACKUP SET = set2_lvm_lvm
   PAIR:   last-backup IS /home/backups/backup.set2_lvm_lvm/20100814200003
.... and so forth ....   
=cut

Your data as a __DATA__ segment is here:

__DATA__
Sun Aug 15 20:00:03 2010: backup.set2_lvm:backup:INFO: START OF BACKUP
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-set=backup.set2_lvm
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-date=20100815200003
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-type=regular
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-date-epoch=1281927603
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-directory=/home/backups/backup.set2_lvm/20100815200003
Mon Aug 16 00:00:04 2010: backup.set1_lvm:backup:INFO: START OF BACKUP
Mon Aug 16 00:00:05 2010: backup.set1_lvm:backup:INFO: backup-set=backup.set1_lvm
Mon Aug 16 00:00:05 2010: backup.set1_lvm:backup:INFO: backup-date=20100816000003
Mon Aug 16 00:00:05 2010: backup.set1_lvm:backup:INFO: backup-type=regular
Mon Aug 16 00:00:05 2010: backup.set1_lvm:backup:INFO: backup-date-epoch=1281942003
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: last-backup=/home/backups/backup.set2_lvm_lvm/20100814200003
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: backup-size=424.53 GB
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: backup-time=04:33:12
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: backup-status=Backup succeeded
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: Backup succeeded
Mon Aug 16 00:33:16 2010: backup.set2_lvm_lvm:backup:INFO: END OF BACKUP
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: last-backup=/home/backups/backup.set1_lvm/20100815000006
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: backup-size=187.24 GB
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: backup-time=01:59:04
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: backup-status=Backup succeeded
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: Backup succeeded
Mon Aug 16 01:59:09 2010: backup.set1_lvm:backup:INFO: END OF BACKUP

In reply to Re: Parsing logs and bookmarking last line parsed by Marshall
in thread Parsing logs and bookmarking last line parsed by JaeDre619
