in reply to Parsing logs and bookmarking last line parsed

I hope a couple of pointers will help you out.

First, on the date format. If you can (and I'm not sure that you can), using something like 20100815200003, i.e. "2010-08-15 20:00:03", on the left-hand side of your log file instead of "Sun Aug 15 20:00:03 2010" would be much preferred, because a simple alpha-numeric comparison or sort can be done on that type of string without converting to epoch time. The leading zeroes are important; otherwise a simple sort won't work. If you want to, add the redundant info like "Sun" as a separate field for the humans to read.
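Even if you can't change whatever writes the log, the same sortable form can be built while you parse. Here is a rough sketch using the core Time::Piece module (the helper name is made up):

#!/usr/bin/perl -w
use strict;
use Time::Piece;   # core module since Perl 5.10

# Hypothetical helper: turn "Sun Aug 15 20:00:03 2010" into "20100815200003"
# so that plain string comparisons (lt/gt/eq) and sort order it chronologically.
sub sortable_stamp
{
   my $human = shift;
   return Time::Piece->strptime($human, "%a %b %d %H:%M:%S %Y")
                     ->strftime("%Y%m%d%H%M%S");
}

print sortable_stamp("Sun Aug 15 20:00:03 2010"), "\n";   # 20100815200003
print sortable_stamp("Mon Aug 16 00:00:04 2010"), "\n";   # 20100816000004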

I wasn't exactly able to figure out what you are doing with the data, although your data structure might be more complex than necessary. If you want to keep track of where your last processing left off, I would just make a separate file and put that date/time code in it. If you use a time format like the above, then you can use a simple string cmp for less-than, equal, greater-than. If this extra "bookmark" file isn't there, I would process the whole file and then generate the bookmark file. I would not recommend appending anything to your log file with the "hey, I got here last time" info. Whatever the thing is that generates this file, leave it alone and don't mess with its data.
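A rough sketch of that bookmark idea (the file name and layout are my own invention; it just keeps one sortable stamp per run):

#!/usr/bin/perl -w
use strict;
use Time::Piece;

# Hypothetical bookmark file holding the last timestamp processed,
# stored in the sortable YYYYMMDDHHMMSS form discussed above.
my $bookmark_file = "backup_log.bookmark";

my $last_seen = "00000000000000";        # default: nothing processed yet
if (open my $bm, '<', $bookmark_file)
{
   chomp($last_seen = <$bm>);
   close $bm;
}

my $newest = $last_seen;
while (my $line = <DATA>)
{
   my ($human) = $line =~ /^(.*\d{4}):/ or next;
   my $stamp = Time::Piece->strptime($human, "%a %b %d %H:%M:%S %Y")
                          ->strftime("%Y%m%d%H%M%S");
   next unless $stamp gt $last_seen;     # plain string compare is enough
   # ... process this line ...
   $newest = $stamp if $stamp gt $newest;
}

# remember where we got to for the next run
open my $bm, '>', $bookmark_file or die "can't write $bookmark_file: $!";
print $bm "$newest\n";
close $bm;

__DATA__
Sun Aug 15 20:00:03 2010: backup.set2_lvm:backup:INFO: START OF BACKUP
Mon Aug 16 00:00:04 2010: backup.set1_lvm:backup:INFO: START OF BACKUP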

My inclination would be to concentrate the parsing of the input lines into one sub. I did that below. I wouldn't worry about being fancy, just get the job done. I didn't agonize over "the best way"; I just wanted to show a couple of techniques. Improve the code later if you need to. Performance of this sub will not be an issue, only "correctness" of the parsing. Of course, when you have a "pair", that screams hash table. Usually there is no need to modify what I called $parm in my code.
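Just to illustrate the "screams hash table" point, here is a rough sketch (separate from the full program further down) that piles each set's name=value pairs into a hash of hashes:

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

# Sketch only: accumulate the name=value pairs into a hash of hashes keyed
# by backup set, so each set's attributes can be looked up later with
# simple hash accesses.
my %sets;
while (my $line = <DATA>)
{
   chomp $line;
   # same basic split as parseline() below, trimmed down for this sketch
   my ($rest) = $line =~ /^.*\d{4}:(.*)/ or next;
   my ($backupset, $msg) = split /backup:INFO:/, $rest;
   $backupset =~ s/:\s*$//;
   $backupset =~ s/^\s*backup\.//;
   if (my ($parm, $value) = $msg =~ /(\S.*?)=(.*)/)
   {
      $sets{$backupset}{$parm} = $value;
   }
}
print Dumper(\%sets);

__DATA__
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-set=backup.set2_lvm
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-date=20100815200003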

Anyway, let us know how you are getting on. I commend you for tackling a hard problem as a "first assignment".

#!/usr/bin/perl -w
use strict;

while (<DATA>)
{
   next if (/^\s*$/); #skip blank lines
   chomp;
   my ($date, $backupset, $parm, $value) = parseline($_);
   # the idea is to concentrate the parsing of the line and its
   # associated "regex-foo" into one place. I think rest of your
   # code can use simple eq or ne comparisons.

   print "$date\n";
   print " BACKUP SET = $backupset\n";
   if ($value eq "") { print " SINGLE TOKEN: $parm\n"; }
   else              { print " PAIR: $parm IS $value\n"; }
}

sub parseline
{
   my $line = shift;

   my ($date, $rest)     = $line =~ m/(^.*\d{4}):(.*)/;
   my ($backupset, $msg) = split(/backup:INFO:/, $rest);
   $backupset =~ s/:\s*$//;        #trimming some unwanted thing like ':' is ok
   $backupset =~ s/^\s*backup\.//; #more than one step is just fine too!

   my ($parm, $value) = $msg =~ m/(.*)=(.*)/;
   $parm  ||= $msg; #if match doesn't happen these will be undef
   $value ||= "";   #so this trick makes sure that they are defined.

   return ($date, $backupset, $parm, $value);
}

=prints...
Sun Aug 15 20:00:03 2010
 BACKUP SET = set2_lvm
 SINGLE TOKEN: START OF BACKUP
Sun Aug 15 20:00:04 2010
 BACKUP SET = set2_lvm
 PAIR: backup-set IS backup.set2_lvm
Sun Aug 15 20:00:04 2010
 BACKUP SET = set2_lvm
 PAIR: backup-date IS 20100815200003
Sun Aug 15 20:00:04 2010
 BACKUP SET = set2_lvm
 PAIR: backup-type IS regular
Sun Aug 15 20:00:04 2010
 BACKUP SET = set2_lvm
 PAIR: backup-date-epoch IS 1281927603
Sun Aug 15 20:00:04 2010
 BACKUP SET = set2_lvm
 PAIR: backup-directory IS /home/backups/backup.set2_lvm/20100815200003
Mon Aug 16 00:00:04 2010
 BACKUP SET = set1_lvm
 SINGLE TOKEN: START OF BACKUP
Mon Aug 16 00:00:05 2010
 BACKUP SET = set1_lvm
 PAIR: backup-set IS backup.set1_lvm
Mon Aug 16 00:00:05 2010
 BACKUP SET = set1_lvm
 PAIR: backup-date IS 20100816000003
Mon Aug 16 00:00:05 2010
 BACKUP SET = set1_lvm
 PAIR: backup-type IS regular
Mon Aug 16 00:00:05 2010
 BACKUP SET = set1_lvm
 PAIR: backup-date-epoch IS 1281942003
Mon Aug 16 00:33:15 2010
 BACKUP SET = set2_lvm_lvm
 PAIR: last-backup IS /home/backups/backup.set2_lvm_lvm/20100814200003
.... and so forth ....
=cut
Your data as a __DATA__ segment is here:
__DATA__
Sun Aug 15 20:00:03 2010: backup.set2_lvm:backup:INFO: START OF BACKUP
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-set=backup.set2_lvm
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-date=20100815200003
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-type=regular
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-date-epoch=1281927603
Sun Aug 15 20:00:04 2010: backup.set2_lvm:backup:INFO: backup-directory=/home/backups/backup.set2_lvm/20100815200003
Mon Aug 16 00:00:04 2010: backup.set1_lvm:backup:INFO: START OF BACKUP
Mon Aug 16 00:00:05 2010: backup.set1_lvm:backup:INFO: backup-set=backup.set1_lvm
Mon Aug 16 00:00:05 2010: backup.set1_lvm:backup:INFO: backup-date=20100816000003
Mon Aug 16 00:00:05 2010: backup.set1_lvm:backup:INFO: backup-type=regular
Mon Aug 16 00:00:05 2010: backup.set1_lvm:backup:INFO: backup-date-epoch=1281942003
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: last-backup=/home/backups/backup.set2_lvm_lvm/20100814200003
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: backup-size=424.53 GB
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: backup-time=04:33:12
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: backup-status=Backup succeeded
Mon Aug 16 00:33:15 2010: backup.set2_lvm_lvm:backup:INFO: Backup succeeded
Mon Aug 16 00:33:16 2010: backup.set2_lvm_lvm:backup:INFO: END OF BACKUP
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: last-backup=/home/backups/backup.set1_lvm/20100815000006
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: backup-size=187.24 GB
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: backup-time=01:59:04
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: backup-status=Backup succeeded
Mon Aug 16 01:59:07 2010: backup.set1_lvm:backup:INFO: Backup succeeded
Mon Aug 16 01:59:09 2010: backup.set1_lvm:backup:INFO: END OF BACKUP

Replies are listed 'Best First'.
Re^2: Parsing logs and bookmarking last line parsed
by JaeDre619 (Acolyte) on Aug 19, 2010 at 20:08 UTC

    @Marshall - thank you very much for your insight. What started out as a way to summarize this data turned into a bigger project than I anticipated, but I felt it was a good assignment for me to learn Perl.

    You definitely nailed what I needed, which was keying off the backup set and extracting some attributes associated with it.

    My goal with this output was to produce a delimited file, as you can see in my print statements. Prefixing the attributes with a numbering system seemed to help me with sorting it. The file is needed as input to an HTML table. Anyway, I'll review your pointers and code. Thanks again.

      Wow! You've taken on a pretty difficult "first assignment"! And you've gotten a heck of a lot further than most could have done! There are some "quirks" about this that make some of the details difficult.

      I posted some more code for you. Take a look and see "what is missing/not right".

      Update: I see why you did: my $BckupKey="5-Duration"; Don't do this "5-" "decoration" of the hash key. There are better, albeit more advanced, techniques for specifying the sort order. Concentrate on getting the data you need, and then you can get help here about how to get it to appear in the "right" order.

      Below is just one example of a special sort order. A more robust version would take into account what happens when the order of one input string vs. another hasn't been specified. I am just saying that advanced sorting is one of the things that Perl is very good at.

#!/usr/bin/perl -w
use strict;

my @special_order = ("x", "b", "a", "y");
my $i = 0;
my %sort_order = map{ $_ => $i++ } @special_order;

my @array = ("a", "x", "y", "b");
@array = sort @array;
print "Regular Sort: @array\n";

@array = ("a", "x", "y", "b");
@array = sort by_order @array;
print "Special Sort: @array\n";

sub by_order
{
   my $a_order = $sort_order{$a};
   my $b_order = $sort_order{$b};
   $a_order <=> $b_order
}
__END__
prints:
Regular Sort: a b x y
Special Sort: x b a y
        @Marshall - thanks for all your help. I just noticed the code you posted earlier in a reply. Seeing the results using Dumper was neat. The breakdown of name/value pair data for each set is exactly what I needed. I am looking now at the code you provided for sorting to see how I can fit it in with your other example. I'd like to get this working and use it as a reference for good code; seeing another way of writing it helps me learn :-) My end result would be a delimited file. I'll try it out and see if I can string it together. Thanks!

        This is the data string I was constructing with my script. I know it's screwy with the hashing and all...

        1-Server=server1.domain.com;2-Logdate=Thu Aug 19 2010;3-BackupSet=backup.set1_lvm;4-StartTime=06:00:03;5-Duration=00:56:53;6-Size=72.04 GB;7-Status=Succeeded;
        1-Server=server1.domain.com;2-Logdate=Thu Aug 19 2010;3-BackupSet=backup.set2_lvm;4-StartTime=00:00:04;5-Duration=01:56:35;6-Size=187.24 GB;7-Status=Succeeded;
        1-Server=server1.domain.com;2-Logdate=Thu Aug 19 2010;3-BackupSet=backup.set3_lvm;4-StartTime=23:00:05;8-Status=Unsuccessful;
        @Marshall - I'm looking at the parse script you provided. Can you show me, in the regex, how to get only specific attributes? In other words, I'm not interested in splitting all of the data, only in certain keys, for example (backup-set, backup-date, backup-time, ERROR), with the flexibility to add or remove more if needed.

        Also, can you show me how to print the keys out in order in a delimited format, for example (name=value;name=value;)? Thank you again.
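        (A rough sketch of one way both of those could work; the hash and key names here are made up for illustration and aren't from the thread. Rather than restricting the regex itself, it is often simpler to parse everything and then filter by key afterwards:)

#!/usr/bin/perl -w
use strict;

# Sketch only: %record stands in for one backup set's parsed name=value
# pairs, and @wanted lists the keys to keep, in the order they should print.
my @wanted = ("backup-set", "backup-date", "backup-time", "ERROR");

my %record = (
   "backup-set"  => "backup.set1_lvm",
   "backup-date" => "20100816000003",
   "backup-time" => "01:59:04",
   "backup-size" => "187.24 GB",   # not in @wanted, so it won't be printed
);

# Walk @wanted (not the hash) so the output order is fixed, and skip any
# key this particular backup set didn't have (ERROR, in this example).
my @fields = map { "$_=$record{$_}" } grep { exists $record{$_} } @wanted;
print join(";", @fields), ";\n";
# prints: backup-set=backup.set1_lvm;backup-date=20100816000003;backup-time=01:59:04;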