Unfortunately, Text::Diff is non-core (which rules it out of the particular environments I am trying to compare files across), and it warns of a nasty memory leak when it is called many times under Perl v5.6.1 or earlier. Just my bad luck that all of those conditions apply. I also have to find a solution that takes only a matter of minutes of my time; otherwise the time spent by the people who tried and failed to do the job with unix diff and Korn shell cannot be recovered, and there will be a lot of hassle from the bean counters.
So it occurred to me that it really ought to be just a few lines of code to hand-roll this. My immediate thought was to write something like:
    sub Diff {
        my @a;
        my $dif;    # stays undef when the files are identical
        for my $fdx ( 0 .. 1 ) {
            open my $fh, '<', $_[$fdx] or die "$!: $_[$fdx]\n";
            my @tmp = <$fh>;
            $a[$fdx] = \@tmp;    # each file's lines go in their own slot
            close $fh;
        }
        # Walk both files in step while each still has lines
        while ( @{ $a[0] } and @{ $a[1] } ) {
            if ( $a[0][0] eq $a[1][0] ) {
                shift @{ $a[0] };
                shift @{ $a[1] };
            }
            else {
                # No look-ahead: every mismatch is charged to the left-hand file
                $dif .= '< ' . shift( @{ $a[0] } );
            }
        }
        # Whatever remains on either side is reported as a difference
        $dif .= '< ' . $_ for @{ $a[0] };
        $dif .= '> ' . $_ for @{ $a[1] };
        return $dif;
    }

In a blunt sort of way this works, but I didn't even see fit to test it, because it misses the opportunity to handle insertions in the right-hand file (insertions in the left-hand file are no problem). Instead, as soon as the right-hand file has an insertion, the rest of each file gets dumped out in turn as difference sets.
It looks like I need to search ahead for a match in the right-hand file before treating the current left-hand line as different. But this could perform badly for files that are both large and very different: each left-hand line may trigger a scan of the remaining right-hand lines, so the number of comparisons grows with the square of the line count. Does anyone have a brighter idea for a difference algorithm that scales better than that? Or can this one perhaps be modified simply enough?
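To put a rough number on that worry, here is a minimal sketch that only counts string comparisons for the look-ahead idea rather than producing a diff. The name lookahead_comparisons and the generated test lines are hypothetical, made up purely for illustration; for two files with no lines in common it should count n * n comparisons.

```perl
use strict;
use warnings;

# Hypothetical helper: count the string comparisons a head-of-file
# look-ahead would make on two array refs of lines.
sub lookahead_comparisons {
    my ( $left, $right ) = @_;
    my @l     = @$left;
    my @r     = @$right;
    my $count = 0;
    while ( @l and @r ) {
        $count++;    # compare the two head lines
        if ( $l[0] eq $r[0] ) {
            shift @l;
            shift @r;
            next;
        }
        # Scan the rest of the right-hand file for the left-hand head
        my $found = 0;
        for my $i ( 1 .. $#r ) {
            $count++;
            if ( $l[0] eq $r[$i] ) { $found = $i; last; }
        }
        if   ($found) { splice @r, 0, $found; }    # right-hand insertions
        else          { shift @l; }                # left-hand-only line
    }
    return $count;
}

# Worst case: same length, no line in common
for my $n ( 10, 100, 1000 ) {
    my @left  = map { "a$_\n" } 1 .. $n;
    my @right = map { "b$_\n" } 1 .. $n;
    printf "%5d lines: %d comparisons\n", $n,
        lookahead_comparisons( \@left, \@right );
}
```

Identical files cost only one comparison per line, so the quadratic behaviour only bites when the files diverge heavily.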
Thanks in advance for any suggestions.
Update: This is what it looks like with that right-hand lookahead, which by the way appears to make it function the same as Algorithm::Diff:
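    sub Diff {
        my @a;
        my $dif;    # stays undef when the files are identical
        my $found;
        for my $fdx ( 0 .. 1 ) {
            open my $fh, '<', $_[$fdx] or die "$!: $_[$fdx]\n";
            my @tmp = <$fh>;
            $a[$fdx] = \@tmp;    # each file's lines go in their own slot
            close $fh;
        }
        while ( @{ $a[0] } and @{ $a[1] } ) {
            if ( $a[0][0] eq $a[1][0] ) {
                shift @{ $a[0] };
                shift @{ $a[1] };
            }
            elsif ( $found = Search(@a) ) {
                # The left-hand head turns up later on the right, so the
                # right-hand lines before it are insertions
                $dif .= '> ' . shift( @{ $a[1] } ) for 1 .. $found;
            }
            else {
                # No match anywhere on the right: a left-hand-only line
                $dif .= '< ' . shift( @{ $a[0] } );
            }
        }
        # Whatever remains on either side is reported as a difference
        $dif .= '< ' . $_ for @{ $a[0] };
        $dif .= '> ' . $_ for @{ $a[1] };
        return $dif;
    }

    # Return the index of the first right-hand line equal to the left-hand
    # head line, or 0 if there is none. Index 0 is never a match here,
    # because Search is only called when the two head lines differ.
    sub Search {
        my ( $left, $right ) = @_;
        for my $idx ( 1 .. $#{$right} ) {
            return $idx if $left->[0] eq $right->[$idx];
        }
        return 0;
    }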
-M
Free your mind
In reply to seeking diff algorithm by Moron