vroom has asked for the wisdom of the Perl Monks concerning the following question:

I'm using Algorithm::Diff to keep track of proposed changes to documents. However if I have two proposed changes to the same document I want to be able to tell if the two changes overlap at all. If they don't I want to be able to merge the output of the results of the two different diffs into one set of changes.

As some sample code I have:

#!/usr/bin/perl use Algorithm::Diff qw(diff); use Data::Dumper; my $t1="This is the 1st lin\n This is the second line"; my $t2="This is the 1st line\n This is the second line`"; my $t3="This is the 1st lin\n This is the 2nd line`"; my @t1=split(//,$t1); my @t2=split(//,$t2); my @t3=split(//,$t3); @diff=diff(\@t1,\@t2); @diff2=diff(\@t1,\@t3); print Dumper(@diff); print "\n--------\n"; print Dumper(@diff2);
I get the following printed out for the differences between $t1 and $t2
$VAR1 = [ [ '+', 19, 'e' ] ]; $VAR2 = [ [ '+', 45, '`' ] ];
Here is the computed difference between $t1 and $t3
$VAR1 = [ [ '-', 33, 's' ], [ '-', 34, 'e' ], [ '-', 35, 'c' ], [ '-', 36, 'o' ], [ '+', 33, 2 ] ]; $VAR2 = [ [ '+', 41, '`' ] ];
How can I merge the two differences into one that can be applied to the original while verifying that the changes aren't conflicting. </CODE>

vroom | Tim Vroom | vroom@cs.hope.edu

Replies are listed 'Best First'.
Re: Question about Algorithm::Diff
by Dominus (Parson) on Nov 26, 2000 at 21:57 UTC
    It's not totally clear to me what you mean by 'conflicts', and I think that there are probably several plausible meanings. For example, suppose change A says to add 100 lines of text at certain position, and change B does not. It that a conflict? Probably not, but perhaps, depending on what's in your mind. Similarly, suppose change A says to replace the text at lines 10-20 with a certain paragraph and change B says to add the same paragraph at line 21 without removing lines 10-20 first. Is that a conflict? I'm not sure.

    Anyway, with the disclaimer, here is a subroutine that produces a list of conflicts between two diffs:

    sub conflicts { my ($a, $b) = @_; my ($ai, $bi) = (0,0); my @conflicts; while (defined $a->[$ai] && defined $b->[$bi]) { my ($ac, $bc) = ($a->[$ai], $b->[$bi]); # high and low line numbers for chunks a and b my ($al, $ah) = ($ac->[0][1], $ac->[-1][1]); my ($bl, $bh) = ($bc->[0][1], $bc->[-1][1]); if ($ah < $bl) { # chunk a precedes chunk b ++$ai; } elsif ($al > $bh) { # chunk a follows chunk b ++$bi; } else { # chunks overlap # compare the two chunks for conflicts my %h; for my $line (@$ac) { my ($s, $ln, $t) = @$line; $h{$ln}{$s} = $t; } for my $line (@$bc) { my ($s, $ln, $t) = @$line; next unless exists $h{$ln}; if (exists $h{$ln}{$s} && $h{$ln}{$s} ne $t) { push @conflicts, [[$s, $ln, $h{$ln}{$s}], [$s, $ln, $t]]; } } ++$ai; ++$bi; } } return @conflicts; }
    In list context, it returns a list of the conflicting sections. In scalar context, it returns the number of conflicts, which you can also use as a boolean.

    This function interprets two insertions of different text at the same place to be a conflict, or two deletions of different text at the same place to be a conflict. If one diff deletes text at a position and the other inserts new text at the same place, that's not a conflict.

    If that isn't what you want, the place to change it is probably the inner for my $line (@$bc) loop. At this point, the subroutine has located two overlapping sets of changes. (Non-overlapping sets of changes obviously can't be in conflict.) Change set A has been stored in the hash %h and the loop is looping over the changes in change set B to see if any of them conflict with the changes in A. By altering the tests inside this loop, you'll change the notion of 'conflict'.

    Hope this helps.

Re: Question about Algorithm::Diff
by merlyn (Sage) on Nov 26, 2000 at 07:47 UTC