anoopsaxena76 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I hope this is the right forum for my question. I think I start my journey into the world of Perl with this problem. There is an existing code base and my team has made some enhancements to it, spreading across files in different directories, all under one directory (/usr/src). The code added, if it is multiple lines, is between comment tags (LM-2006-01), as shown below.
/* LM-2006-01 */
code modification spanning more than 1 line.
/* LM-2006-01 */
If the code enhancement is only one line, the above comment tag follows the code on the same line, as shown below.
int my_var; /* LM-2006-01 */
I want to count the total number of lines that I have modified across the files. Also I do not want to count the comment lines, C style comments, embedded inside the code modifications and I do not want to count lines if it has only "{" or "}". What is the best tool for it? Perl, sed, awk? If it is Perl, can someone just give me hints as to what keywords should I look for in Perl tutorials to get started. If any other tool is well suited, I would really appreciate it if I can be redirected to a proper forum or if I can be given some hints. I don't have the option of using Python. Thanks a lot. Anoop.

Re: Best tool for my requirements?
by hesco (Deacon) on May 25, 2006 at 07:34 UTC
    anoopsaxena76:

    It sounds to me like you want to iterate through an array of file names (foreach my $file (@files){), opening each with while(<>){ loops, parsing them for your comments, counting your contributions and moving on to the next. You'll want to open() and close() file handles for reading. You'll want to study a bit of regex for determining which lines you want to count and which you don't. And you'll need to write simple arithmetic operations to handle pointers and counts. The challenging thing here is that your markers for starting and ending your modifications are the same. But this sounds like a generally easy script to write.
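
    As a rough illustration of the approach described above, here is a minimal sketch, not a finished tool. The helper name count_modified_lines is made up for this example, and it assumes the tag text is exactly /* LM-2006-01 */, that a tag alone on a line toggles a multi-line block, and that comments never hide a tag.

```perl
use strict;
use warnings;

# Hypothetical helper: count modified lines in a list of source lines.
# Assumption: the tag is exactly "/* LM-2006-01 */" as in the question.
sub count_modified_lines {
    my @lines = @_;
    my $tag = qr{/\*\s*LM-2006-01\s*\*/};
    my ($in_block, $in_comment, $count) = (0, 0, 0);

    for my $line (@lines) {
        # The same tag opens and closes a block, so a tag-only line toggles state.
        if ($line =~ /^\s*$tag\s*$/) {
            $in_block = !$in_block;
            next;
        }
        # A trailing tag marks a single modified line.
        my $tagged = ($line =~ s/$tag//) ? 1 : 0;
        next unless $in_block || $tagged;

        # Skip the remainder of a C comment opened on an earlier line.
        if ($in_comment) {
            next unless $line =~ s{^.*?\*/}{};
            $in_comment = 0;
        }
        $line =~ s{/\*.*?\*/}{}g;                  # comments closed on this line
        $in_comment = 1 if $line =~ s{/\*.*$}{};   # comment left open
        $line =~ s/\s+//g;
        next if $line eq '' || $line =~ /^[{}]+$/; # blank or brace-only
        $count++;
    }
    return $count;
}

# Small in-memory sample: three countable lines.
my @sample = (
    "int a;\n",
    "/* LM-2006-01 */\n",
    "if (x) {\n",
    "    /* explain */\n",
    "    y = 1;\n",
    "}\n",
    "/* LM-2006-01 */\n",
    "int my_var; /* LM-2006-01 */\n",
);
print "sample: ", count_modified_lines(@sample), "\n";   # prints "sample: 3"

# Files named on the command line are opened, read and closed one by one.
my $total = 0;
for my $file (@ARGV) {
    open my $fh, '<', $file or do { warn "Cannot open $file: $!\n"; next };
    my @lines = <$fh>;
    close $fh;
    $total += count_modified_lines(@lines);
}
print "total: $total\n" if @ARGV;
```

    Because the start and end markers are identical, the sketch simply flips a flag each time it sees a tag on a line by itself. An unbalanced tag in a file would throw the count off for the rest of that file.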

    All that begs the question, though: have you been coding without a net? Couldn't something like this work for you:

    cvs diff -rv.vvv -rv.vvv $filename
    or, perhaps:
    cvs log $filename
    If not, consider committing everything you've got to a revision control repository before you go any further. Some here likely use Subversion. I still use the old faithful cvs. The time spent coding this simple script might be better spent installing a repository on your network and reading the Cederqvist Manual, or the RedBean book, or even practicing the commands from this useful crib sheet I've kept handy for the past couple of years.

    The cvs log command gives useful reports that look like this for one of today's commits:

    revision 1.9
    date: 2006-05-25 06:57:55 +0000; author: hesco; state: Exp; lines: +14 -6
    added $port to interface of db connection routines. made another attempt to dynamically build connection parameters. commented it all out and left the hard-wired version running.
    I think that is the information you are looking for: lines: +14 -6. I added fourteen lines and deleted six with that commit, compared to what was in the repository. Committing my work to cvs as I go permits me to then obtain useful reports such as the one you seek, from a bash command line, like so:
    cvs log Registration/lib/Registration/WWW/RegForm.pm | grep lines: | grep 2006-05-24
    yielding a quick summary of the day's work:
    date: 2006-05-24 19:03:46 +0000; author: hesco; state: Exp; lines: +4 -2
    date: 2006-05-24 18:44:42 +0000; author: hesco; state: Exp; lines: +41 -6
    date: 2006-05-24 07:05:09 +0000; author: hesco; state: Exp; lines: +33 -11
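
    If a single total is wanted rather than one line per commit, the "lines: +N -M" fields can be summed. The following is only a sketch; the helper name sum_cvs_lines is invented for this example, and it assumes every log line of interest carries the field in the form shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: add up the "+N -M" figures found in cvs log lines.
sub sum_cvs_lines {
    my ($added, $removed) = (0, 0);
    for my $line (@_) {
        if ($line =~ /lines:\s*\+(\d+)\s+-(\d+)/) {
            $added   += $1;
            $removed += $2;
        }
    }
    return ($added, $removed);
}

# The three summary lines from the report above.
my @log = (
    "date: 2006-05-24 19:03:46 +0000; author: hesco; state: Exp; lines: +4 -2\n",
    "date: 2006-05-24 18:44:42 +0000; author: hesco; state: Exp; lines: +41 -6\n",
    "date: 2006-05-24 07:05:09 +0000; author: hesco; state: Exp; lines: +33 -11\n",
);
my ($added, $removed) = sum_cvs_lines(@log);
print "added $added, removed $removed\n";   # prints "added 78, removed 19"
```

    To feed it real data, the sample array could be replaced with lines read from standard input, with the grep pipeline above piped into the script.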
    -- Hugh

    if( $lal && $lol ) { $life++; }
Re: Best tool for my requirements?
by Polonius (Friar) on May 25, 2006 at 07:25 UTC

    Anoop,

    Welcome to the monastery! If you want an unbiased comparison between Perl, sed and awk, you've come to the wrong place! Folks around here are pretty passionate about Perl. But whether or not Perl is the best tool for the job, it can certainly do it, and do it quite easily.

    You'll find pretty much all you need in the book known as The Llama. Alternatively, try the tutorials here in the monastery. You need a little code to search the directory tree for relevant files, then a regular expression to identify your delimiting comments, and the rest is trivial.
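
    For the directory search step, one possible sketch uses the core File::Find module. The helper name find_c_files is made up here, and it assumes the relevant files end in .c or .h, which the question does not actually state.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: list C source and header files below $root.
# Assumption: relevant files end in .c or .h.
sub find_c_files {
    my ($root) = @_;
    my @files;
    find(sub {
        # Inside the callback, $_ is the base name and
        # $File::Find::name is the path including $root.
        push @files, $File::Find::name if -f $_ && /\.[ch]$/;
    }, $root);
    my @sorted = sort @files;
    return @sorted;
}

# Print the matching files when a directory is given on the command line.
my $root = shift @ARGV;
if (defined $root) {
    print "$_\n" for find_c_files($root);
}
```

    Run with /usr/src as the argument, this would print each candidate file, which can then be opened and scanned for the delimiting comments.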

    Polonius
Re: Best tool for my requirements?
by TedPride (Priest) on May 25, 2006 at 08:22 UTC
    A very basic example showing that Perl can theoretically do this quite easily. I don't know what other comment styles you want removed, however, so I imagine this will need some tweaking.
    $_ = join '', <DATA>;
    # Remove comments that don't start the line
    s/^.+\/\*.*?\*\//-/mg;
    # Remove comments spanning multiple lines
    s/\/\*(.*?)\*\/.*?\/\*\1\*\///sg;
    # Remove ignored characters
    s/[{} ]+//g;
    map { $c++ if $_; } split /\n/, $_;
    print "$c lines of code.";
    __DATA__
    /* LM-2006-01 */
    multi-line comment
    /* LM-2006-01 */
    for (i = 0; i < 10; i++) {
    int my var; /* LM-2006-01 */
    }
    /* LM-2006-01 */
    another multi-line comment
    /* LM-2006-01 */
Re: Best tool for my requirements?
by pajout (Curate) on May 25, 2006 at 12:34 UTC
    egrep -r ... | wc
    update: Sorry, I wasn't concentrating; forget it.