I hope a couple of pointers will help you out.

First, the date format. If you can (and I'm not sure that you can), put something like 20100815200003, i.e. "2010-08-15 20:00:03", on the left-hand side of each line in your log file instead of "Sun Aug 15 20:00:03 2010". A plain string comparison or sort works on that kind of timestamp without converting to epoch time. The leading zeroes are important; without them a simple sort won't work. If you want to, add the redundant info like "Sun" as a separate field for the humans to read.
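If you can't change what the logger writes, you could still derive the sortable form when you read each line. Here is a minimal sketch of that idea; sortable_stamp() and the %MONTH table are names I made up for illustration, and it assumes the English month abbreviations seen in your sample log.

```perl
use strict;
use warnings;

# Month abbreviations as they appear in the log, mapped to two-digit numbers.
my %MONTH = (
    Jan => '01', Feb => '02', Mar => '03', Apr => '04',
    May => '05', Jun => '06', Jul => '07', Aug => '08',
    Sep => '09', Oct => '10', Nov => '11', Dec => '12',
);

# sortable_stamp() is a hypothetical helper, not part of any module.
# Turns "Sun Aug 15 20:00:03 2010" into "20100815200003".
# Returns undef (empty list in list context) if the text doesn't match.
sub sortable_stamp {
    my ($text) = @_;
    my ($mon, $day, $h, $m, $s, $year) =
        $text =~ /^\w{3}\s+(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})\s+(\d{4})/
        or return;
    return unless exists $MONTH{$mon};
    # %02d restores the leading zeroes that make string comparison safe.
    return sprintf '%04d%s%02d%02d%02d%02d',
        $year, $MONTH{$mon}, $day, $h, $m, $s;
}

my $first  = sortable_stamp('Sun Aug 15 20:00:03 2010');
my $second = sortable_stamp('Mon Aug 16 00:00:04 2010');
print "$first sorts before $second\n" if $first lt $second;
```

Once both values are in this form, lt, gt and a default sort all do the right thing with no epoch conversion.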

I wasn't able to figure out exactly what you are doing with the data, although your data structure might be more complex than necessary. If you want to keep track of where your last processing left off, I would just make a separate file and put that date/time code in it. If you use a time format like the one above, then a simple cmp tells you less than, equal, or greater than. If this extra "bookmark" file isn't there, I would process the whole file and then generate the bookmark file. I would not recommend appending anything like "hey, I got here last time" to your log file. Whatever generates that file, leave it alone and don't mess with its data.
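The bookmark idea could look roughly like this. It is only a sketch: read_bookmark(), write_bookmark(), newer_than() and the bookmark file name are all hypothetical, and it assumes the stamps are already in the sortable 14-digit form.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the saved stamp, or '' when there is no
# bookmark file yet. '' compares less than any real stamp, so a missing
# bookmark means "process the whole log".
sub read_bookmark {
    my ($file) = @_;
    return '' unless -e $file;
    open my $fh, '<', $file or die "Cannot read $file: $!";
    my $stamp = <$fh>;
    close $fh;
    $stamp = '' unless defined $stamp;   # empty file
    chomp $stamp;
    return $stamp;
}

# Hypothetical helper: overwrites the bookmark file with the latest stamp.
# The log file itself is never touched.
sub write_bookmark {
    my ($file, $stamp) = @_;
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} "$stamp\n";
    close $fh or die "Cannot close $file: $!";
    return;
}

# Keep only the stamps that are strictly newer than the bookmark.
sub newer_than {
    my ($bookmark, @stamps) = @_;
    return grep { $_ gt $bookmark } @stamps;
}
```

A run would call read_bookmark('last_run.txt') first, skip every line whose stamp is not gt that value, and call write_bookmark() with the last stamp it handled. One thing to watch: the stamps only have one-second resolution, so if the logger can still be writing lines for the second you bookmarked, a strict gt will skip the late ones.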

My inclination would be to concentrate the parsing of the input lines into one sub. I did that below. I wouldn't worry about being fancy, just get the job done. I didn't agonize over "the best way"; I just wanted to show a couple of techniques. Improve the code later if you need to. Performance of this sub will not be an issue, only "correctness" of the parsing. Of course, when you have a "pair", that screams hash table. Usually there is no need to modify what I called $parm in my code.
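To illustrate the hash-table remark (this is separate from the script in this post, which only prints), here is a minimal sketch that stores each pair under its backup set. collect_pairs() is a name I invented, and the records are assumed to be [ backupset, parm, value ] triples like the ones a line-parsing sub would hand back.

```perl
use strict;
use warnings;

# collect_pairs() is a hypothetical helper. It takes array refs of
# [ $backupset, $parm, $value ] and returns a hash ref shaped like
# $info->{$backupset}{$parm} = $value.
sub collect_pairs {
    my (@records) = @_;
    my %info;
    for my $rec (@records) {
        my ($backupset, $parm, $value) = @$rec;
        # Single tokens such as "START OF BACKUP" have no value; skip them.
        next if $value eq '';
        # A later value for the same key overwrites the earlier one.
        $info{$backupset}{$parm} = $value;
    }
    return \%info;
}

my $info = collect_pairs(
    [ 'set2_lvm', 'START OF BACKUP', '' ],
    [ 'set2_lvm', 'backup-type',     'regular' ],
    [ 'set2_lvm', 'backup-date',     '20100815200003' ],
);
print "type: $info->{set2_lvm}{'backup-type'}\n";
```

After that, the rest of the program can look things up by name with plain eq and ne tests instead of re-parsing lines.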

Anyway, let us know how you are getting on. I commend you for tackling a hard problem as a "first assignment".

#!/usr/bin/perl -w
use strict;

while (<DATA>)
{
    next if (/^\s*$/);   #skip blank lines
    chomp;
    my ($date, $backupset, $parm, $value) = parseline($_);

    # the idea is to concentrate the parsing of the line and its
    # associated "regex-foo" into one place. I think rest of your
    # code can use simple eq or ne comparisons.

    print "$date\n";
    print " BACKUP SET = $backupset\n";
    if ($value eq "")
    {
        print " SINGLE TOKEN: $parm\n";
    }
    else
    {
        print " PAIR: $parm IS $value\n";
    }
}

sub parseline
{
    my $line = shift;
    my ($date, $rest) = $line =~ m/(^.*\d{4}):(.*)/;
    my ($backupset, $msg) = split(/backup:INFO:/, $rest);
    $backupset =~ s/:\s*$//;          #trimming some unwanted thing like ':' is ok
    $backupset =~ s/^\s*backup\.//;   #more than one step is just fine too!
    my ($parm, $value) = $msg =~ m/(.*)=(.*)/;
    $parm  ||= $msg;   #if match doesn't happen these will be undef
    $value ||= "";     #so this trick makes sure that they are defined.
    return ($date, $backupset, $parm, $value);
}

=prints...
Sun Aug 15 20:00:03 2010
 BACKUP SET = set2_lvm
 SINGLE TOKEN: START OF BACKUP
Sun Aug 15 20:00:04 2010
 BACKUP SET = set2_lvm
 PAIR: backup-set IS backup.set2_lvm
Sun Aug 15 20:00:04 2010
 BACKUP SET = set2_lvm
 PAIR: backup-date IS 20100815200003
Sun Aug 15 20:00:04 2010
 BACKUP SET = set2_lvm
 PAIR: backup-type IS regular
Sun Aug 15 20:00:04 2010
 BACKUP SET = set2_lvm
 PAIR: backup-date-epoch IS 1281927603
Sun Aug 15 20:00:04 2010
 BACKUP SET = set2_lvm
 PAIR: backup-directory IS /home/backups/backup.set2_lvm/20100815200003
Mon Aug 16 00:00:04 2010
 BACKUP SET = set1_lvm
 SINGLE TOKEN: START OF BACKUP
Mon Aug 16 00:00:05 2010
 BACKUP SET = set1_lvm
 PAIR: backup-set IS backup.set1_lvm
Mon Aug 16 00:00:05 2010
 BACKUP SET = set1_lvm
 PAIR: backup-date IS 20100816000003
Mon Aug 16 00:00:05 2010
 BACKUP SET = set1_lvm
 PAIR: backup-type IS regular
Mon Aug 16 00:00:05 2010
 BACKUP SET = set1_lvm
 PAIR: backup-date-epoch IS 1281942003
Mon Aug 16 00:33:15 2010
 BACKUP SET = set2_lvm_lvm
 PAIR: last-backup IS /home/backups/backup.set2_lvm_lvm/20100814200003
.... and so forth ....
=cut

Your data as a __DATA__ segment is here:

__DATA__
Sun Aug 15 20:00:03 2010: backup.set2_lvm:backup:INFO: START OF BACKUP
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-set=backup.set2_lvm
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-date=20100815200003
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-type=regular
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-date-epoch=1281927603
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-directory=/home/backups/backup.set2_lvm/20100815200003
Mon Aug 16 00:00:04 2010: backup.set1_lvm:backup:INFO: START OF BACKUP
Mon Aug 16 00:00:05 2010: backup.set1_lvm:backup:INFO: backup-set=backup.set1_lvm
Mon Aug 16 00:00:05 2010: backup.set1_lvm:backup:INFO: backup-date=20100816000003
Mon Aug 16 00:00:05 2010: backup.set1_lvm:backup:INFO: backup-type=regular
Mon Aug 16 00:00:05 2010: backup.set1_lvm:backup:INFO: backup-date-epoch=1281942003
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: last-backup=/home/backups/backup.set2_lvm_lvm/20100814200003
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: backup-size=424.53 GB
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: backup-time=04:33:12
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: backup-status=Backup succeeded
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: Backup succeeded
Mon Aug 16 00:33:16 2010: backup.set2_lvm_lvm:backup:INFO: END OF BACKUP
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: last-backup=/home/backups/backup.set1_lvm/20100815000006
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: backup-size=187.24 GB
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: backup-time=01:59:04
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: backup-status=Backup succeeded
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: Backup succeeded
Mon Aug 16 01:59:09 2010: backup.set1_lvm:backup:INFO: END OF BACKUP

In reply to Re: Parsing logs and bookmarking last line parsed by Marshall
in thread Parsing logs and bookmarking last line parsed by JaeDre619
