In my shop we use CVS. To compare two versions, I used to run:

cvs diff -y -r v1 -r v2 >out.txt 2>&1

This works fine when comparing two source files. However, when I compare two directories, the output file becomes really big. Even worse, the output from cvs is a mixture of differences and common lines, so the differences are buried among all those common lines.

I want to extract those real differences.

For lines that really differ between the two versions, the 63rd column holds one of three characters: |, >, or <.
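A minimal sketch of that test on a single line, assuming the line has already had its tabs expanded. The column number is the one observed above and would likely change if cvs diff used a different output width; the sub name is_real_diff is hypothetical.

```perl
use strict;
use warnings;

# Return 1 when a tab-expanded line from `cvs diff -y` carries one of the
# change markers |, < or > in column 63 (index 62, counting from 0).
# Hypothetical helper; the column comes from the observation above.
sub is_real_diff {
    my ($line) = @_;
    return 0 if length($line) < 63;    # too short to reach the marker column
    return substr($line, 62, 1) =~ /^[<>|]$/ ? 1 : 0;
}
```

Checking one fixed column, rather than searching the whole line for |, < or >, avoids false hits on source lines that happen to contain those characters.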

The part that took me a little effort was translating \t into blanks, since I have to count columns exactly. (A \t is not always 8 blanks; it pads up to the next multiple of 8.)
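The padding rule can be sketched on its own. This is a minimal version, assuming $line holds a single line (an embedded newline would not reset the column count); the name expand_tabs and its optional $tabstop argument are hypothetical.

```perl
use strict;
use warnings;

# Expand tabs the way a terminal does: each \t pads with blanks up to the
# next multiple of $tabstop columns (8 by default), instead of always
# inserting 8 blanks. Assumes $line is a single line.
sub expand_tabs {
    my ($line, $tabstop) = @_;
    $tabstop //= 8;
    my $out = '';
    for my $ch (split //, $line) {
        if ($ch eq "\t") {
            # pad from the current column to the next tab stop
            $out .= ' ' x ($tabstop - length($out) % $tabstop);
        }
        else {
            $out .= $ch;
        }
    }
    return $out;
}
```

For example, "a\tb" becomes "a" followed by 7 blanks and then "b", while "12345678\tx" gets a full 8 blanks because the tab starts exactly on a tab stop.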

To use this program, run:

real_diff.pl out

It reads out.txt and creates a file called out.rl, which contains only the lines that really differ between the two versions.
real_diff.pl:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $directory;
if (!$ARGV[0]) {
    die "Usage: real_diff.pl directory_name\n";
}
else {
    $directory = $ARGV[0];
}

open(DIFF_FILE, "<", "$directory.txt")
    || die "There is no diff file for directory $directory\n";
my @lines = <DIFF_FILE>;
close(DIFF_FILE);

print "Now creating real diff file $directory.rl ...\n";

my $prev_line;
my $current_file;
open(REAL_DIFF, ">", "$directory.rl");

for (0 .. $#lines) {
    my $line = $lines[$_];
    my $out_line = "";

    # Expand tabs: each \t pads with blanks up to the next multiple of 8
    for (my $i = 0; $i < length($line); $i++) {
        if (substr($line, $i, 1) eq "\t") {
            $out_line .= " " x (8 - length($out_line) % 8);
        }
        else {
            $out_line .= substr($line, $i, 1);
        }
    }

    # Remember which source file this part of the diff belongs to
    if ($out_line =~ m/^RCS file: (.*?),/) {
        $current_file = $1;
    }

    # A real difference has |, < or > in column 63
    if ($out_line =~ m/^.{62}[<>|]/) {
        # Start a new block unless this line directly follows the previous hit
        if (!defined $prev_line || ($_ > $prev_line + 1)) {
            print REAL_DIFF "=" x 70, "\n";
            print REAL_DIFF "line number in $directory.txt: ", $_ + 1, "\n";
            print REAL_DIFF "Source file name: $current_file\n\n";
        }
        print REAL_DIFF $out_line;
        $prev_line = $_;
    }
}
close(REAL_DIFF);
```

Re: extract useful infos from cvs diff output
by jmcnamara (Monsignor) on Jan 31, 2003 at 09:43 UTC

    To compare the differences between two versions, I used to do:

    cvs -y -r v1 -r v2 >out.txt 2>&1

    This is missing a command which, from the context of the question, looks like cvs diff.

    In that case you could use the -t option to expand tabs:

    cvs diff -y -t -r rev1 -r rev2

    You can also use the -b option to ignore changes in whitespace and -B to ignore blank lines. For details on the options see:

    cvs diff --help

    In particular you may find the --suppress-common-lines option useful. The following will do most of the hard work of your program:

    cvs diff -y --suppress-common-lines -t -r rev1 -r rev2

    # Or just:
    cvs diff -y --sup -t -r rev1 -r rev2

    Also, in your code you do not check whether your second open succeeds. I wouldn't normally comment on something like that, except that it seems to contradict your statement above:

    1. Exception handling has to be there.
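    A minimal sketch of a checked write-open, using a lexical filehandle and reporting $! so the user learns why the open failed. The helper name open_report and the message wording are hypothetical.

```perl
use strict;
use warnings;

# Open $path for writing and die with the OS reason ($!) if that fails.
# Hypothetical helper; the original script opens REAL_DIFF without a check.
sub open_report {
    my ($path) = @_;
    open(my $fh, '>', $path) or die "Cannot write $path: $!\n";
    return $fh;
}
```

    The same idea applies to close on a write handle: checking its return value catches errors, such as a full disk, that only surface when buffered output is flushed.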

    --
    John.

      This is a very good example of how other people's expertise can greatly help. jmcnamara's experience with cvs diff totally nullified my snippet. I don't care about my XP points; I want to openly admit that I was wrong and didn't realize how much cvs diff can do.

      THANK YOU VERY MUCH, I FEEL SO GRATEFUL.

      UPDATE

      After trying --suppress-common-lines for a while, I went back to my snippet.

      The problem with that option is that it shows only the differences, with no indication of their positions in the source files, which left me totally lost. I don't just want to know how big the differences are; I want to know exactly where the modifications are.

      One great benefit of my snippet is that it extracts the real differences and, at the same time, gives me their line numbers. These are line numbers in the cvs diff output, not in the source files, but that is still good enough: the output of cvs diff without --suppress-common-lines contains copies of the source files, so I can easily locate the changes in the source code.
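      The block detection behind those line numbers is simple: a matching line starts a new block unless it directly follows the previous match. A minimal sketch of that grouping on its own, assuming the matching line numbers are already collected in ascending order (the name group_runs is hypothetical):

```perl
use strict;
use warnings;

# Group line numbers of matching lines into runs of consecutive lines.
# Each run corresponds to one block in the report, headed by the number
# of its first line, as in the script above. Input assumed sorted.
sub group_runs {
    my @numbers = @_;
    my @runs;
    for my $n (@numbers) {
        if (@runs && $n == $runs[-1][-1] + 1) {
            push @{ $runs[-1] }, $n;    # adjacent to the previous hit: same block
        }
        else {
            push @runs, [$n];           # gap: start a new block
        }
    }
    return @runs;
}
```

      For example, group_runs(3, 4, 5, 9, 12, 13) yields three blocks, starting at lines 3, 9 and 12.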

      However I still love the --suppress-common-lines option, and will use it in the future, whenever it fits. Thank you jmcnamara.

Re: extract useful infos from cvs diff output
by Aristotle (Chancellor) on Jan 31, 2003 at 05:54 UTC
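```perl
#!/usr/bin/perl -w
use strict;
use Text::Tabs; # core module

die "Usage: real_diff.pl file.txt [file2.txt ..]\n" unless @ARGV;

my $prev_line;
my $file;
while (<>) {
    $_ = expand($_);    # expand tabs first, so column 63 is counted correctly
    $file = $1 if /^RCS file: (.*?),/;
    if (/^.{62}[<>|]/) {
        if (not $prev_line or $prev_line < $. - 1) {
            print "=" x 70, "\n";
            print "Line number in $ARGV: $.\n";    # $ARGV already includes .txt
            print "Source file name: $file\n\n";
        }
        print;
        $prev_line = $.;
    }
}
continue {
    # restart line numbering and block tracking for each input file
    if (eof) { close ARGV; undef $prev_line; }
}
```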

    Makeshifts last the longest.

      Text::Tabs does tab replacing, but that module uses a global variable $tabstop (which is a style I really hate; to me, any package that shows this kind of style will not be considered)

      It might be too much to expect it to be OO, but at least $tabstop should be a parameter to expand, instead of being a global.
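      Text::Tabs does not take the width as an argument, but Perl's local can confine a change to $Text::Tabs::tabstop to a single call. A minimal sketch of a wrapper that turns the global into a parameter; expand and $tabstop are the module's own, while the wrapper name expand_with is hypothetical.

```perl
use strict;
use warnings;
use Text::Tabs qw(expand);

# Wrap Text::Tabs::expand so the tab width is a parameter instead of a
# global: `local` restores $Text::Tabs::tabstop when the sub returns.
sub expand_with {
    my ($tabstop, @lines) = @_;
    local $Text::Tabs::tabstop = $tabstop;
    return expand(@lines);
}
```

      Because the change is localized, other code that calls expand directly still sees the default width of 8.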

      The other thing is that you obviously chopped away too much useful stuff:
      1. Exception handling has to be there. I don't expect my program to show any compilation warnings/errors or run time warnings/errors. That's simply not my style.

        If there is an exception, my code should catch it and handle it, not Perl.

      2. File handling. The result should be stored in a file regardless. I do not expect anyone to redirect the output from the command line; those outputs might be kept as documentation.

        "...but that module uses a global variable $tabstop (which is a style I really hate; to me, any package that shows this kind of style will not be considered)"

        Oh please! I suppose this means that you will stop using just about every CPAN module then, since most of them use the global variable $VERSION? If it gets the job done, is stable, is robust, then i won't knock it just because the author might have 'skimmed' a little and used a global or two. Sometimes, once in a blue moon, using globals provides a more elegant solution.

        For example, Data::Dumper uses a global variable to set the indent level, $Indent. Sure, i would rather have a method to set that level, but i am not going to dismiss that module as worthless just because of one little wart ... <update>which Aristotle just showed me how to remove (thanks Aristotle!)</update>
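        One way to avoid the global (not necessarily the one Aristotle showed) is Data::Dumper's object interface, where Indent is a method on a dumper object. A minimal sketch; the sample data and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

my $data = { tabstop => 8 };

# Object interface: the setting applies to this dumper object only,
# so $Data::Dumper::Indent is left untouched.
my $dumper = Data::Dumper->new([$data], ['data']);
$dumper->Indent(0);    # 0 puts the whole structure on one line
my $text = $dumper->Dump;
print "$text\n";
```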

        jeffa

        run this on your top level CPAN module install dir:
        perl -MFile::Find::Rule -e '@ARGV=File::Find::Rule->file->name("*.pm")->in(".");while(<>){print if /^\$VERSION/}' | wc -l

        I'd like to think the stuff I removed in my version added flexibility.

        The global variable in Text::Tabs irritates me too, but we're talking about a 10-liner utility script here. I don't use strictures on oneliners either, you know.

        Your "exception handling" message is misleading. If the open fails due to permission problems, your message still says the script found no file. The diamond operator will produce a good error message: detailing the operation (opening in this case), filename and reason for failure ($!). Note that it will not look like a Perl error - it doesn't contain the "at line [..]" bits.
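        The same effect in an explicit open: include $! in the message so the actual OS error is reported, whether it is a missing file or a permissions problem. A minimal sketch; the helper name read_lines is hypothetical.

```perl
use strict;
use warnings;

# Report the real reason an open failed: $! holds the OS error text
# (for example "No such file or directory" or "Permission denied"),
# so a permissions problem is not misreported as a missing file.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    my @lines = <$fh>;
    close($fh);
    return @lines;
}
```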

        I don't see any exception handling on your write-open either. That's an issue I just forgo entirely by printing to STDOUT and letting the user take care of it.

        And finally, if I ever change my mind about my file naming conventions, this script will not have to be touched. To me, that's very important. I try to write my utility scripts with the Unix toolset philosophy in mind and so far, it's paid off.

        Makeshifts last the longest.