Moron has asked for the wisdom of the Perl Monks concerning the following question:

On the O/S versions at my current disposal, the unix and linux diff have the same bug in them which causes false differences to appear in the output.

Unfortunately, Text::Diff is non-core (which rules it out of the particular environments I am trying to compare files across) and warns of a nasty memory leak problem when calling it many times using Perl v5.6.1 or earlier (just my bad luck that all of those conditions manage to apply). I also have to find a solution that takes a matter of minutes of my time or otherwise the time spent by the people who tried and failed to do the job with unix diff and korn shell cannot be recovered and there will be a lot of hassle from the bean counters.

So it occurred to me that it really ought to be just a few lines of code to hand-roll this. My immediate thought was to write something like:

sub Diff { my @a; my $dif = undef(); for my $fdx (0..1) { open my $fh, $_[$fdx] or die "$!: $_[$fdx]\n"; my @tmp = <$fh>; $a[0] = \@tmp; close $fh; } while ( $a[0] -> [0] and $a[1] -> [0] ) { if ( $a[0] -> [0] eq $a[1] -> [0] ) { shift @$a[0]; shift @$a[1]; } elsif ( $a[0] -> [0] ) { $dif .= '< ' . shift( @$a[0] ); } else { $dif .= '> ' . shift( @$a[1] ); } } return $dif; }
In a blunt sort of way this works, but I didn't even see fit to test this because it seems to miss the opportunity of handling insertions to the right (no problem with insertions to the left hand file). Instead, as soon as the right hand file has an insertion, the rest of each file would get dumped out in turn as difference sets.

It looks like I need to search ahead for a match on the right hand side before the point where the left hand side gets treated as different. But this could cause bad performance for large and different files, being proportional in iteration count to the square of the line count. Does anyone have a brighter idea for a difference algorithm that improves on such performance metrics? Or can perhaps this one be modified simply enough?

Thanks in advance for any suggestions.

Update: This is what it looks like with that RH lookahead, which by the way makes it function the same as Algorithm::Diff

sub Diff { my @a; my $dif = undef(); my $found; for my $fdx (0..1) { open my $fh, $_[$fdx] or die "$!: $_[$fdx]\n"; my @tmp = <$fh>; $a[0] = \@tmp; close $fh; } while ( $a[0] -> [0] and $a[1] -> [0] ) { if ( $a[0] -> [0] eq $a[1] -> [0] ) { shift @$a[0]; shift @$a[1]; } elsif ( $found = Search( @a )) { $found--; for my $idx ( 0..$found ) { $dif .= '> ' . shift( @$a[1] ); } } elsif ( $a[0] -> [0] and not Search( @a ) { $dif .= '< ' . shift( @$a[0] ); } else { $dif .= '> ' . shift( @$a[1] ); } } return $dif; } sub Search { my $found = 0; my $max = $#$_[1]; while ( $_[0] -> [0] ne $_[1] -> [$found] ) { $found++; ( $found >= $max ) and return 0; } return $found; }

-M

Free your mind

Replies are listed 'Best First'.
Re: seeking diff algorithm
by Corion (Patriarch) on Dec 14, 2005 at 13:26 UTC
      It's needed for an investigation into differences between different environments of the same system, e.g. devt. versus QA. The time to approve such porting would exceed acceptable limits for completion of the work, given that this work can be achieved without any porting and that it turns out two lines added to the OP code would make it algorithmically identical to Algorithm::Diff. I'll update the OP with that for clarity. I note also that Text::Diff uses the same algorithm.

      I could find no reference to Algorithm::Diff3 on CPAN.

      -M

      Free your mind

Re: seeking diff algorithm
by rinceWind (Monsignor) on Dec 14, 2005 at 13:49 UTC

    Moron, can you please post a minimal test case, which illustrates the false differences.

    Also, I'm interested in whether Algorithm::Diff suffers from the same problem, as I have quite a bit of code depending on this module. No doubt the author of this module would also be interested if it is failing.

    In terms of a cross-platform pure perl diff, I include a script for this in my module VCS::Lite, called vldiff.

    --

    Oh Lord, won’t you burn me a Knoppix CD ?
    My friends all rate Windows, I must disagree.
    Your powers of persuasion will set them all free,
    So oh Lord, won’t you burn me a Knoppix CD ?
    (Missquoting Janis Joplin)

    A reply falls below the community's threshold of quality. You may see it by logging in.
A reply falls below the community's threshold of quality. You may see it by logging in.