I am enamored with webalizer. In fact, I like it so much that I began writing an analyzer for CVS
that does the same thing, i.e. parses the logs, generates some graphs, and puts them in some HTML files.

To parse the CVS logs, I use two regexes. What I want is to be able to use one multi-line regex. Unfortunately,
the /m modifier escapes me.

My code follows:

The command cvs log filename has output that looks as follows:


RCS file: /usr/local/cvs/scripts/myFile.pl,v
Working file: myFile.pl
head: 1.4
branch:
locks: strict
access list:
symbolic names:
keyword substitution: kv
total revisions: 4;    selected revisions: 4
description:
----------------------------
revision 1.4
date: 2001/08/02 21:28:39;  author: alw9;  state: Exp;  lines: +1 -1
some comment here
----------------------------
revision 1.3
date: 2001/07/30 04:15:00;  author: alw9;  state: Exp;  lines: +2 -0
another comment
----------------------------
revision 1.2
date: 2001/07/25 18:01:05;  author: alw9;  state: Exp;  lines: +1 -1
cvs comments go here
----------------------------
revision 1.1
date: 2001/07/25 17:38:17;  author: jms18;  state: Exp;
put this in the cvs repository
and comment the cvs
=============================================================================

To parse this and grab, say, the date, author, and revision, I use two single-line regexes as follows:

#!/usr/bin/perl -w
use strict;
use Cwd;

my($ver, $date, $coder);

my $currentDir = cwd(); # remember starting dir

# change into temp working area
chdir "/tmp" or die "Could not change to /tmp because: $!\n";

# grab the cvs project from the repository
`cvs co scripts`;

# hop into the checked-out project dir
chdir "./scripts" or die "Could not change to ./scripts because: $!\n";

# run 'cvs log' against file and grab the output
my @output = `cvs log myFile.pl`;

for(@output)
{
    next if 1..11;    # skip first eleven lines of output

    $ver = $1 if(/^revision\s(\d+\.\d+)$/);
    
    $date = $1, $coder = $2
            if(m!^date:\s(\d{4}/\d{2}/\d{2})\s[\d|:]+;\s+author:\s+(\w+);!);
    
    # if we have everything we were looking for
    # print it out and undef the variables
    if( defined($ver) && defined($date) && defined($coder) )
    {
        print "ver = $ver\ndate = $date\ncoder = $coder\n\n";
        undef $ver;
        undef $date;
        undef $coder;
        next;
    }
    
} # end for

# change back to tmp dir and delete the project
chdir "/tmp" or die "Could not change to /tmp because: $!\n";
`rm -rf ./scripts`;

# return the user to his scheduled dir
chdir $currentDir or die "Could not change to $currentDir because: $!\n";

But, this felt bad to me. Don't get me wrong... it works just fine, but it felt ugly. I wanted one regex -- that felt cleaner.
So, I grabbed the /m modifier and went to work.

$ver = $1, $date = $2, $coder = $3
if(m!\Arevision\s(\d+\.\d+)$
    ^date:\s(\d{4}/\d{2}/\d{2})\s[\d|:]+;\s+author:\s+(\w+);
    [\w|\s|:|;|\d|\+|-]+\Z!mx);

But that gives me nothing. Obviously, I am using multi-line regexes completely wrong. I believe my problem lies in
greediness: regexes are greedy, and multi-line regexes must be really greedy. I don't know whether I am not matching at all
or just slurping everything up.
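
For reference, a few things may matter more than greediness here. If the pattern is still applied inside the per-line loop, it only ever sees one line at a time, so it can never span the revision line and the date line. Under /x the line break inside the pattern is ignored, so the newline has to be written as \n. And \A and \Z anchor to the start and end of the whole string, even with /m. Below is a minimal sketch of the one-regex idea, assuming the whole output is first joined into a single string; parse_cvs_log and the hash keys are made-up names for illustration, and the sample is trimmed from the output shown earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_cvs_log is a hypothetical helper, not part of the script above.
# It expects the complete `cvs log` output as ONE string.
sub parse_cvs_log {
    my ($log) = @_;
    my @revisions;

    # /m : ^ matches at the start of every line inside the string
    # /g : continue from the previous match, one revision per iteration
    # /x : whitespace in the pattern is ignored, so the line break
    #      between the two lines is spelled out as \n
    while ($log =~ m{
            ^revision \s (\d+\.\d+) \n
            ^date: \s (\d{4}/\d{2}/\d{2}) \s [\d:]+ ; \s+
            author: \s+ (\w+) ;
        }mgx)
    {
        push @revisions, { ver => $1, date => $2, coder => $3 };
    }
    return @revisions;
}

# Two revisions in the format shown above, joined into a single string.
my $sample = join '',
    "----------------------------\n",
    "revision 1.4\n",
    "date: 2001/08/02 21:28:39;  author: alw9;  state: Exp;  lines: +1 -1\n",
    "some comment here\n",
    "----------------------------\n",
    "revision 1.1\n",
    "date: 2001/07/25 17:38:17;  author: jms18;  state: Exp;\n",
    "put this in the cvs repository\n";

for my $rev (parse_cvs_log($sample)) {
    print "ver = $rev->{ver}\ndate = $rev->{date}\ncoder = $rev->{coder}\n\n";
}
```

With real output, the string could come from backticks in scalar context (my $log = `cvs log myFile.pl`;) or from join '', @output. The pattern keeps the same two-part revision number as the single-line version, so branch revisions such as 1.2.2.1 would need a looser pattern.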

I have the owl book in my left hand and the camel book in my right hand (which makes it hard to type), and I am trying to get this working. Anyone care to give me some pointers?

Jeremy

In reply to Multi-line Parsing of CVS log by enoch
