Multi-line Parsing of CVS log

enoch has asked for the wisdom of the Perl Monks concerning the following question:

I am enamored of webalizer. In fact, I like it so much that I began writing an analyzer for CVS
that does the same thing, i.e. parses the logs, generates some graphs, and puts them in some HTML files.

To parse the CVS logs, I use two regexes. What I want is to be able to use one multi-line regex. Unfortunately,
the /m modifier escapes me.
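
For reference, perlre describes /m as changing only where ^ and $ may match (at every line boundary inside the string, not just at its ends); it does not by itself make a pattern span lines. A minimal sketch of that difference, using a made-up sample string ($text, @without, and @with are illustrative names):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical two-line string, just to show where ^ is allowed to match.
my $text = "first line\nsecond line\n";

# Without /m, ^ matches only at the very start of the string.
my @without = $text =~ /^(\w+)/g;

# With /m, ^ also matches right after each embedded newline.
my @with = $text =~ /^(\w+)/mg;

print "without /m: @without\n";
print "with /m:    @with\n";
```

Here @without should hold only the first word of the string, while @with should hold the first word of each line.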

My code follows:

The command 'cvs log filename' produces output that looks like this:


RCS file: /usr/local/cvs/scripts/myFile.pl,v
Working file: myFile.pl
head: 1.4
branch:
locks: strict
access list:
symbolic names:
keyword substitution: kv
total revisions: 4;    selected revisions: 4
description:
----------------------------
revision 1.4
date: 2001/08/02 21:28:39;  author: alw9;  state: Exp;  lines: +1 -1
some comment here
----------------------------
revision 1.3
date: 2001/07/30 04:15:00;  author: alw9;  state: Exp;  lines: +2 -0
another comment
----------------------------
revision 1.2
date: 2001/07/25 18:01:05;  author: alw9;  state: Exp;  lines: +1 -1
cvs comments go here
----------------------------
revision 1.1
date: 2001/07/25 17:38:17;  author: jms18;  state: Exp;
put this in the cvs repository
and comment the cvs
=============================================================================

To parse this and grab, say, the date, author, and revision, I use two single-line regexes as follows:

#!/usr/bin/perl -w
use strict;
use Cwd;

my($ver, $date, $coder);

my $currentDir = cwd(); # remember starting dir

# change into temp working area
chdir "/tmp" or die "Could not change to /tmp because: $!\n";

# grab the cvs project from the repository
`cvs co scripts`;

#hop into the temp dir
chdir "./scripts" or die "Could not change to ./scripts because: $!\n";

# run 'cvs log' against file and grab the output
my @output = `cvs log myFile.pl`;

for(@output)
{
    next if 1..11;    # skip first eleven lines of output

    $ver = $1 if(/^revision\s(\d+\.\d+)$/);
    
    $date = $1, $coder = $2 
            if(m!^date:\s(\d{4}/\d{2}/\d{2})\s[\d|:]+;\s+author:\s+(\w+);!);
    
    # if we have everything we were looking for
    # print it out and undef the variables
    if( defined($ver) && defined($date) && defined($coder) )
    {
        print "ver = $ver\ndate = $date\ncoder = $coder\n\n";
        undef $ver;
        undef $date;
        undef $coder;
        next;
    }
    
} # end for

# change back to tmp dir and delete the project
chdir "/tmp" or die "Could not change to /tmp because: $!\n";
`rm -rf ./scripts`;

# return the user to his scheduled dir
chdir $currentDir or die "Could not change to $currentDir because: $!\n";

But this felt bad to me. Don't get me wrong... it works just fine, but it felt ugly. I wanted one regex -- that felt cleaner.
So I grabbed the /m modifier and went to work.

$ver = $1, $date = $2, $coder = $3
if(m!\Arevision\s(\d+\.\d+)$
    ^date:\s(\d{4}/\d{2}/\d{2})\s[\d|:]+;\s+author:\s+(\w+);
    [\w|\s|:|;|\d|\+|-]+\Z!mx);

But that gives me nothing. Obviously, I am using multi-line regexes completely wrong. I believe my problem lies in
greediness: regexes are greedy, and multi-line regexes must be really greedy. I don't know whether I am not matching at all
or just slurping everything up.
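
One thing worth checking in the attempt above: the for loop hands the regex a single line at a time, so a pattern meant to span two lines never sees both of them. Also, under /x the line break written inside the pattern is ignored, so nothing consumes the newline between the revision line and the date line. A minimal sketch under those assumptions, with the log held in one string ($log, @revisions, and the trimmed sample text are illustrative, not part of the script above):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample 'cvs log' output held in ONE string (trimmed from the listing above).
my $log = <<'END_LOG';
description:
----------------------------
revision 1.4
date: 2001/08/02 21:28:39;  author: alw9;  state: Exp;  lines: +1 -1
some comment here
----------------------------
revision 1.1
date: 2001/07/25 17:38:17;  author: jms18;  state: Exp;
put this in the cvs repository
and comment the cvs
END_LOG

my @revisions;

# /m lets ^ anchor at the start of each line inside the string,
# /x allows layout and comments, /g walks through every revision block.
# Whitespace in the pattern is ignored under /x, so the line break
# between the two log lines is matched explicitly with \n.
while ( $log =~ m{
        ^revision\s+(\d+\.\d+)\s*\n              # revision number
        date:\s+(\d{4}/\d{2}/\d{2})\s+[\d:]+;    # date, then skip the time
        \s+author:\s+(\w+);                      # author
    }mxg )
{
    push @revisions, { ver => $1, date => $2, coder => $3 };
}

print "ver = $_->{ver}\ndate = $_->{date}\ncoder = $_->{coder}\n\n"
    for @revisions;
```

With the real command, the same idea would mean capturing the output in scalar context (my $log = `cvs log myFile.pl`;) instead of into an array. The sample above should yield two records, for revisions 1.4 and 1.1.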

I have the owl book in my left hand and the camel book in my right hand (which makes it hard to type), and I am trying to get this working. Anyone care to give me some pointers?

Jeremy


Replies are listed 'Best First'.
Re: Multi-line Parsing of CVS log by blakem (Monsignor) on Sep 08, 2001 at 03:40 UTC
There was also cvs log parsing (REGEX) just a couple days ago. It might have a hint or two in it... -Blake
Re: Multi-line Parsing of CVS log by MadraghRua (Vicar) on Sep 08, 2001 at 02:45 UTC
Enoch, I wrote in about something like this a while back; try this node. I never did work the whole way through the problem, but it does contain some useful information on multiple-line regexes. Good luck! MadraghRua yet another biologist hacking perl....