pavanvidem has asked for the wisdom of the Perl Monks concerning the following question:

I have two tab-separated files, as follows:
File 1:
35.152000 -1853.494656 0.000000 13.211222
15.978182 0.000000 0.001175 0.000012

File 2:
35.152001 -1853.494657 0.000000 13.211222
15.978182 0.000000 0.001175 0.000011

In the two files above, some numbers differ only in the last digit after the decimal point. What would be a good routine that ignores differences of +/-1 in the last decimal digit, so that the comparison of File 1 and File 2 reports them as equal?

Replies are listed 'Best First'.
Re: A small question on file comparison
by BrowserUk (Patriarch) on Mar 14, 2010 at 10:07 UTC

    You should probably avoid treating the values from your files as numbers.

    It is possible, even likely, that the conversion of these values into Perl's internal numeric format would change their values. For example, they may have been truncated (or rounded using one of several different rounding algorithms), from single precision floats. If you then re-interpret them into double precision floats, you will introduce differences that are not there in the original files, or discard differences that are there.

    The following code treats the fields as strings (as human eyes do) until the final decision about the last digits, where they are compared as (integer) numerics. It also bails out as soon as a difference is found.

    #! perl -slw
    use strict;

    ## Quick check: files of different sizes cannot match
    die "Files differ in length" unless -s( $ARGV[0] ) == -s( $ARGV[1] );

    open FH1, '<', $ARGV[0] or die $!;
    open FH2, '<', $ARGV[1] or die $!;

    #my $mismatch = 0;
    until( eof( FH1 ) || eof( FH2 ) ) {
        my $line1 = <FH1>;
        my $line2 = <FH2>;
        next if $line1 eq $line2;

        my @line1 = split ' ', $line1;
        my @line2 = split ' ', $line2;

        for ( 0 .. $#line1 ) {
            next if $line1[ $_ ] eq $line2[ $_ ];
            ## Accept if the last digits differ by at most 1
            ## and everything before them is identical
            next if abs( chop( $line1[ $_ ] ) - chop( $line2[ $_ ] ) ) < 2
                and $line1[ $_ ] eq $line2[ $_ ];
            die "Files differ at line: $. field: $_\n";
            #$mismatch = 1;
        }
    }
    #die "File are different\n" if $mismatch;
    die "Files have different numbers of lines\n" unless eof( FH1 ) and eof( FH2 );

    print "Files are the same\n"; ### Or "files are sufficiently similar"

    close FH1;
    close FH2;

    Change the second die to warn and uncomment the related mismatch paraphernalia if you want a comprehensive list of the differences.

    Output for test files:

    C:\test>828506 file1 file2
    Files are the same
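One case the chop-based test does not accept is a carry in the last digit, for example 0.000010 against 0.000009, where the characters before the last digit differ as well. A minimal sketch of one way to cover that, still without converting to floating point, is to drop the decimal point and compare the digit strings as integers. The helper names `frac_len` and `last_digit_close` are hypothetical, and the sketch assumes plain decimal fields with fewer than about 15 digits in total so that the integer arithmetic stays exact.

```perl
use strict;
use warnings;

# Number of digits after the decimal point in a plain decimal string.
sub frac_len {
    my ($s) = @_;
    return $s =~ /\.(\d+)$/ ? length($1) : 0;
}

# Hypothetical helper: true if $x and $y have the same number of decimals
# and differ by at most $tol units in the last decimal place.
sub last_digit_close {
    my ( $x, $y, $tol ) = @_;
    $tol = 1 unless defined $tol;

    return 0 unless frac_len($x) == frac_len($y);

    # Remove the decimal point so that "0.000010" becomes the integer 10.
    ( my $ix = $x ) =~ tr/.//d;
    ( my $iy = $y ) =~ tr/.//d;

    return abs( $ix - $iy ) <= $tol ? 1 : 0;
}

print last_digit_close( '0.000010', '0.000009' ) ? "close\n" : "different\n";
print last_digit_close( '0.000012', '0.000010' ) ? "close\n" : "different\n";
```

The first call reports the pair as close, the second as different. Fields with differing numbers of decimals are treated as a mismatch here, which is a design choice rather than something the thread specifies.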

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: A small question on file comparison
by toolic (Bishop) on Mar 14, 2010 at 02:04 UTC
    If you know that you want a specific precision, such as 5 decimal places, try something like this:
    use strict;
    use warnings;

    my $n1 = 35.152000;
    my $n2 = 35.152001;

    if (abs($n1 - $n2) < 0.000_009) {
        print 'equal'
    }
    else {
        print 'not equal'
    }

    __END__
    equal
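The subtraction happens in binary floating point, so the difference between the two sample values is close to, but not exactly, 0.000001. For the one-unit tolerance the question asks about, a threshold that sits between one and two units of the last digit absorbs that noise. A small sketch, where the helper name `within` and the value 1.5e-6 are illustrative choices for six decimal places:

```perl
use strict;
use warnings;

# Hypothetical helper: true if two numbers differ by less than $eps.
sub within {
    my ( $x, $y, $eps ) = @_;
    return abs( $x - $y ) < $eps ? 1 : 0;
}

my $diff = 35.152001 - 35.152000;
printf "difference: %.20g\n", $diff;    # prints the digits beyond 1e-6

# 1.5e-6 sits halfway between one and two units of the sixth decimal.
print within( 35.152001, 35.152000, 1.5e-6 ) ? "equal\n" : "not equal\n";
print within( 35.152002, 35.152000, 1.5e-6 ) ? "equal\n" : "not equal\n";
```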

      Here is a version (admittedly much more complicated, and there is probably a more efficient approach) that doesn't require you to know beforehand how many significant figures are expected in the comparison.

      #!/usr/bin/perl
      use strict;
      use warnings;

      my $n1 = 35.152000;
      my $n2 = 35.152001;
      my $tol = 1;

      if(doTest($n1,$n2,$tol)){
          print "Strings are the same\n";
      }
      else {
          print "Strings are not the same\n";
      }
      exit(0);

      sub doTest {
          my($n1,$n2,$tol) = @_;

          # allow an optional leading sign so that negative values
          # are split into integer and fractional parts as well
          $n1 =~ /^-?(\d*)\.?(\d*)/;
          my ($intPart1,$fracPart1) = ($1,$2);
          $n2 =~ /^-?(\d*)\.?(\d*)/;
          my ($intPart2,$fracPart2) = ($1,$2);

          # negative count of fractional digits, or 0 if there are none
          my $numSigFigs1 = $fracPart1 ne '' ? -length($fracPart1) : 0;
          my $numSigFigs2 = $fracPart2 ne '' ? -length($fracPart2) : 0;

          # ensure that the highest precision number controls whose
          # 'last digit' we are comparing
          my $sigFigs = $numSigFigs1;
          $sigFigs = $numSigFigs2 if($numSigFigs2 < $numSigFigs1);

          my $test = $tol * 10**($sigFigs);

          # use the sprintf() function to be sure that there aren't any stray
          # digits way out beyond the number of significant figures you're
          # interested in that interfere with the test using
          # abs($n1-$n2) <= $test.
          if(sprintf("%.*f",abs($sigFigs),abs($n1 - $n2)) <= $test){
              return 1;  # strings are the same within tolerance $tol
          }
          else {
              return 0;  # strings are not the same within tolerance $tol
          }
      } # end sub doTest()

      I have tested this with a variety of values, ranging from integers to floating-point numbers (some with the same number of significant figures and some with mixed numbers of significant figures), and it appears to work in all such cases. It does not, however, work with hex or octal numbers. Also note that the variable $tol sets how far apart the two 'last digits' can be and still be considered to 'match' (e.g., the OP set this value to 1, that is, +/- 1).

      I hope this helps.

      ack Albuquerque, NM
      Thank you very much for the reply. I had the same idea too, but could you please tell me how I can compare two such complete files using your idea with less time complexity?
        less time complexity
        I do not understand what you mean by that. At this point, I think you should show what Perl code you have tried, what output you expect, and your actual output. Please show several examples.
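On the complexity point: reading both files once, side by side, already costs time proportional to the total file size, and it is hard to do better since every field has to be looked at. A minimal sketch of such a single pass, assuming every field is a plain number and that the tolerance is passed in as `$eps`; the function name `files_match` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the two files have the same shape and
# every pair of corresponding fields differs by less than $eps, else 0.
sub files_match {
    my ( $path1, $path2, $eps ) = @_;

    open my $fh1, '<', $path1 or die "Cannot open $path1: $!";
    open my $fh2, '<', $path2 or die "Cannot open $path2: $!";

    while ( defined( my $line1 = <$fh1> ) ) {
        my $line2 = <$fh2>;
        return 0 unless defined $line2;    # second file is shorter
        next if $line1 eq $line2;          # identical lines need no parsing

        my @f1 = split ' ', $line1;
        my @f2 = split ' ', $line2;
        return 0 unless @f1 == @f2;        # different number of fields

        for my $i ( 0 .. $#f1 ) {
            return 0 if abs( $f1[$i] - $f2[$i] ) >= $eps;
        }
    }

    return eof($fh2) ? 1 : 0;              # second file must not be longer
}
```

A call such as `files_match( 'file1', 'file2', 1.5e-6 )` would then accept a difference of one unit in the sixth decimal place and reject two units. Only one line from each file is held in memory at a time.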
Re: A small question on file comparison
by bluestar (Novice) on Mar 16, 2010 at 12:21 UTC

    In this case, equality is determined by two values being approximately equal, with a difference no greater than 0.000001 between the two values.

    0.000001 is a threshold.

    Since you may be dealing with numbers of arbitrary precision it's best to use Math::Decimal for performing arithmetic.
    use strict;
    use warnings;
    use File::Slurp;
    use Math::Decimal;

    my $THRESHOLD = '0.000001';
    my ($FILE_1, $FILE_2) = ('file1.txt', 'file2.txt');

    my $contents = { };
    foreach my $file ($FILE_1, $FILE_2) {
        foreach ( (File::Slurp::read_file($file)) ) {
            chomp;
            push @{ $contents->{$file} }, (split /\s+/, $_)
        }
    }

    foreach my $val_1 ( @{ $contents->{$FILE_1} } ) {
        my $diff = Math::Decimal::dec_sub(
            $val_1,
            ( shift @{ $contents->{$FILE_2} } ),
        );
        my $abs_diff = Math::Decimal::dec_abs($diff);
        my $cmp = Math::Decimal::dec_cmp( $abs_diff, $THRESHOLD );
        die "Files NOT equal !!\n" if ($cmp > 0);
    }

    print "Files ARE equal !!\n";