`diff`ing two files (code)

deprecated has asked for the wisdom of the Perl Monks concerning the following question:

I am faced with the "worse is better" way in my current project. I have a piece of code that creates files with several thousand checksums in it.

One of the things it does is open a file thusly:

open (OUT, "|sort -u > $filename");

whereas I decided to use a hash and ditch the sort (1). This has been mostly successful, and cleaned up other areas of the code. I have a new problem in a similar vein, however:

$compare = `diff $file1 $file2 2>&1`;
$compare =~ s/^\n$//;
if ($compare ne "") {
    # fail ...
}

using diff (1) is very tempting, but I would rather not use any system calls in this case. I am, however, faced with adding, say, 10 lines of code instead of just using the diff.

I know that this is why the "worse is better" approach is sometimes taken by programmers, and I may just use it. But I'd really like to know if somebody has a solution that is simple and clear... Or opinions on using diff in this case <!- except I dont want to hear from you, tilly... ->.

thanks
brother dep.

--
Laziness, Impatience, Hubris, and Generosity.
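
For the `sort -u` replacement described in the question, a minimal sketch of the hash approach might look like the following. The sub name `write_unique_sorted` and its arguments are made up for illustration, and it assumes all the checksums fit in memory. Perl's default `sort` does a plain string comparison, so the order may differ from what sort (1) produces under some locales.

```perl
use strict;
use warnings;

# Hypothetical helper: write the unique lines to $filename in sorted order,
# roughly what piping through `sort -u` did. Returns the number of lines written.
sub write_unique_sorted {
    my ($filename, @lines) = @_;

    # Duplicates collapse onto a single hash key.
    my %seen;
    $seen{$_} = 1 for @lines;

    open(my $out, '>', $filename) or die "Cannot write $filename: $!";
    print {$out} "$_\n" for sort keys %seen;
    close($out) or die "Cannot close $filename: $!";

    return scalar keys %seen;
}
```

Called as `write_unique_sorted($filename, @checksums)`, it needs no external process.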

Replies are listed 'Best First'.
Re: `diff`ing two files (code) by mdillon (Priest) on Jul 20, 2001 at 20:38 UTC
since your code snippet doesn't show that you need to actually know what the differences are, just that the files aren't the same, what you're looking for is File::Compare (part of the standard distro, i believe). in a similar vein, you could continue to use an external command and switch to `cmp`. `use File::Compare; if (compare $file1, $file2) { #fail }` this was discussed earlier in Comparing two files (and a few other times). if you do need the differences, do a Super Search for Algorithm::Diff (it has been discussed about half a dozen times).
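
A slightly fuller sketch of the File::Compare suggestion, for anyone who wants to tell "different" apart from "could not compare". `compare` returns 0 when the files are equal, 1 when they differ, and -1 on error, so a bare truth test treats an unreadable file as a difference. The wrapper name `files_identical` is made up for illustration.

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Hypothetical wrapper: returns 1 if the files match, 0 if they differ,
# and dies if File::Compare reports an error (for example a missing file).
sub files_identical {
    my ($file1, $file2) = @_;

    my $result = compare($file1, $file2);   # 0 = equal, 1 = different, -1 = error
    die "Could not compare $file1 and $file2\n" if $result < 0;

    return $result == 0 ? 1 : 0;
}
```

In the original snippet this would read `unless (files_identical($file1, $file2)) { ... }` in place of the diff call.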
Re: `diff`ing two files (code) by bikeNomad (Priest) on Jul 20, 2001 at 22:27 UTC
You don't say what you're using the output of diff to do. But I would assume that since you're looking at its stdout rather than merely testing $? that you actually are using the diff output. You may find my Algorithm::Diff module useful for this, though it is somewhat slower than the diff call. However, it may be easier than parsing diff output, depending on what you need to do, since it uses callbacks. Look at traverse_sequences(). On my system, it takes about 0.010 seconds of CPU time to diff two identical 2000-line files using the diff program, and 0.090 seconds of CPU time to diff the same two files using a Perl diff program that uses Algorithm::Diff. However, there is optimization for the common cases of identical lines at the beginning and ends of arrays. Changing the first and last lines results in 0.230 seconds of CPU time. Anyway, you may want to give it a try. update: got my decimal places right.
Re: `diff`ing two files (code) by larryk (Friar) on Jul 20, 2001 at 20:40 UTC
You could look in the Perl Power Tools diff implementation for some hints. A quick glance reveals Algorithm::Diff at CPAN. "Argument is futile - you will be ignorralated!"
Re (tilly) 1: `diff`ing two files (code) by tilly (Archbishop) on Jul 20, 2001 at 20:42 UTC
If you have small files and the comparison only happens once, why not just read both into strings and test whether they are eq? If you have larger ones you should write a function that takes 2 filenames and compares the files, first checking stat to see that the size is the same, and if they are then using read to process the files in chunks. (For portability you can use binmode.) Update: Or, as an above poster suggested, use File::Compare and not write it yourself. More generally you could use Algorithm::Diff, but it can be slow if you have largish files. Personally I would prefer to rely on diff for file comparisons, and the module only when my needs get more sophisticated...
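
The function described above (stat for the size, then read in chunks, with binmode) could be sketched roughly as follows. The sub name `same_contents` and the 64 KB chunk size are arbitrary choices for illustration, not something taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if both files hold the same bytes, 0 otherwise.
# Dies if either file cannot be stat'ed, opened, or read.
sub same_contents {
    my ($file1, $file2) = @_;

    # Cheap check first: files of different size cannot be identical.
    my @st1 = stat($file1) or die "Cannot stat $file1: $!";
    my @st2 = stat($file2) or die "Cannot stat $file2: $!";
    return 0 if $st1[7] != $st2[7];          # element 7 is the size in bytes

    open(my $fh1, '<', $file1) or die "Cannot open $file1: $!";
    open(my $fh2, '<', $file2) or die "Cannot open $file2: $!";
    binmode($fh1);                           # compare raw bytes on every platform
    binmode($fh2);

    my $chunk = 64 * 1024;                   # arbitrary chunk size
    while (1) {
        my $n1 = read($fh1, my $buf1, $chunk);
        my $n2 = read($fh2, my $buf2, $chunk);
        die "Read error: $!" unless defined $n1 && defined $n2;

        return 0 if $n1 != $n2;              # one file ended early
        return 1 if $n1 == 0;                # both at end of file, no difference seen
        return 0 if $buf1 ne $buf2;          # this chunk differs
    }
}
```

The early size check means differently sized files are never opened, and memory use stays bounded by the chunk size however large the files are.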
Re: `diff`ing two files (code) by runrig (Abbot) on Jul 20, 2001 at 20:47 UTC
If you do go with a system call, and you don't care about what the differences are, use cmp. `system('cmp','-s',$file1,$file2) and print "Files are different!\n";`
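
One caveat with the one-liner: `system` also returns a non-zero value when cmp cannot be run at all or exits with an error, so "different" and "broken" look the same. A sketch that separates the cases, assuming a POSIX cmp on the PATH (exit status 0 for identical, 1 for different, greater than 1 for trouble); the sub name `cmp_says_identical` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if cmp finds the files identical, 0 if they
# differ, and dies if cmp could not be run or reported an error.
sub cmp_says_identical {
    my ($file1, $file2) = @_;

    # List form of system: no shell is involved, so odd filenames are safe.
    my $status = system('cmp', '-s', $file1, $file2);
    die "Could not run cmp: $!\n" if $status == -1;
    die "cmp was killed by a signal\n" if $status & 127;

    my $exit = $status >> 8;                 # the real exit status of cmp
    return 1 if $exit == 0;                  # identical
    return 0 if $exit == 1;                  # different
    die "cmp reported trouble (exit status $exit)\n";
}
```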
Re: `diff`ing two files (code) by kevin_i_orourke (Friar) on Jul 20, 2001 at 20:44 UTC
I thought there might be a module to do this, but I couldn't find one. You could try to use Algorithm::Diff but I'm not sure it would be easy. Have you tried Super Search to see if anyone else has asked a similar question? Update: Just realised this might sound really patronising, didn't mean it that way, sorry. Kevin O'Rourke
Re: `diff`ing two files (code) by Sifmole (Chaplain) on Jul 20, 2001 at 20:59 UTC
Just get rid of the newlines in your Perl code, and you will only have added one line. :) More seriously, why are you using the number of lines you have to add as the measure of which way is better? The number of lines required to perform a function has little, if any, relation to the performance of said function.
Re: `diff`ing two files (code) by aquacade (Scribe) on Jul 21, 2001 at 07:15 UTC
Have you considered using the modules Tie::IxHash along with Digest::MD5 to fingerprint whatever your checksums are checking now? This combo may give you more leverage than rolling your own sorted order hashes and checksums. Just a thought! By the way, you are personally invited to the next meeting of the DC Perl Mongers. See http://dc.pm.org for info if you're interested. I've read many of your nodes here in the Monastery and would like to meet you in person someday!