ganilmohan has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I have two files, Source.txt and Expected.txt. Both contain data, meaning multiple lines of text. Now I have to compare these two files. Is the code below correct?

open(DAT_Source, $Source_data_file) || die("Could not open file!");
@raw_data_source=<DAT_Source>;
open(DAT_Expected, $Expected_data_file) || die("Could not open file!");
@raw_data_Expected=<DAT_Expected>;
if (@raw_data_source == @raw_data_Expected) {
    print "DATA MATCHED!";
}
else {
    print "DATA NOT MATCHED!";
}
close(DAT_Source);
close(DAT_Expected);

Replies are listed 'Best First'.
Re: Two files comparison
by bart (Canon) on Aug 01, 2008 at 09:32 UTC
    To answer your immediate question, instead of trying to guess what you really want to do: no, it's not quite correct. When you use == on arrays, you are comparing them in scalar context, and thus, you're numerically comparing the number of entries in each array. So yes, it'll return true if the files are the same, but not only then! All that is needed is that they contain the same number of lines.
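To make the scalar-context point concrete, here is a minimal sketch. The two arrays are made-up stand-ins for the lines of two files that differ in content but not in length.

```perl
use strict;
use warnings;

# Hypothetical data: same number of lines, different content.
my @source   = ("alpha\n", "beta\n");
my @expected = ("alpha\n", "gamma\n");

# == puts both arrays in scalar context, so this compares 2 == 2.
my $same_count = (@source == @expected) ? 1 : 0;

# A string comparison of the second lines shows the content differs.
my $same_second_line = ($source[1] eq $expected[1]) ? 1 : 0;

print "line counts equal:  $same_count\n";
print "second lines equal: $same_second_line\n";
```

So the original test reports a match for these two arrays even though their second lines differ.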

    Since perl 5.10.x, Perl has a new operator, the smart match operator, ~~ (see this tutorial), which might actually work in your case, and compare line by line. I don't know, I've not used it much, yet. Looking at that tutorial, it looks like it ought to.

    There must be other solutions if you need it to work on an older perl. For example, take a look at how some Test modules do it, such as Test::Deep, where you can simply compare arrays for equality with the function cmp_deeply.
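For an older perl without any extra modules, a line-by-line comparison can also be written by hand. This is only a sketch: lines_equal is a hypothetical helper name, not something taken from Test::Deep or any other module, and the arrays are made-up sample data.

```perl
use strict;
use warnings;

# lines_equal is a hypothetical helper, not part of any module.
# Returns 1 if both array refs hold the same strings in the same order.
sub lines_equal {
    my ($left, $right) = @_;
    return 0 unless @$left == @$right;    # different number of lines
    for my $i (0 .. $#$left) {
        return 0 if $left->[$i] ne $right->[$i];
    }
    return 1;
}

# Made-up sample data standing in for the lines of the two files.
my @source   = ("one\n", "two\n");
my @expected = ("one\n", "two\n");
my @changed  = ("one\n", "TWO\n");

for my $other (\@expected, \@changed) {
    my $message = lines_equal(\@source, $other) ? "DATA MATCHED!" : "DATA NOT MATCHED!";
    print "$message\n";
}
```

The count check comes first so that the loop never indexes past the end of the shorter array.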

    A really simple solution is to load the whole files into two scalars, instead of into arrays, and compare them as strings. Just set $/ to undef and you read the whole file as one line.

    local $/; # sets $/ to undef for the current scope
    open(DAT_Source, $Source_data_file) || die("Could not open file!");
    $raw_data_source=<DAT_Source>;
    open(DAT_Expected, $Expected_data_file) || die("Could not open file!");
    $raw_data_Expected=<DAT_Expected>;
    if ($raw_data_source eq $raw_data_Expected) {
        print "DATA MATCHED!";
    }
    else {
        print "DATA NOT MATCHED!";
    }
    That really requires very little change in your code.

    That's one of the things I really love about Perl: you can often completely change how a piece of code works, by just changing a few thingies here and there.
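The whole-file comparison described above can also be written under strict and warnings, with lexical filehandles and three-argument open. This is a sketch, not the code from the post: slurp is a hypothetical helper name, and the temporary files only exist so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# slurp is a hypothetical helper: returns a file's whole content as one string.
sub slurp {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Could not open $path: $!";
    local $/;    # undef for this scope, so one read returns the whole file
    my $content = <$fh>;
    close($fh);
    return defined $content ? $content : '';
}

# Demo setup only: temporary files standing in for Source.txt and Expected.txt.
my ($src_fh, $source_file)   = tempfile(UNLINK => 1);
my ($exp_fh, $expected_file) = tempfile(UNLINK => 1);
print {$src_fh} "line 1\nline 2\n";
print {$exp_fh} "line 1\nline 2\n";
close($src_fh);
close($exp_fh);

my $matched = slurp($source_file) eq slurp($expected_file);
my $message = $matched ? "DATA MATCHED!" : "DATA NOT MATCHED!";
print "$message\n";
```

Keeping local $/ inside the helper limits the change to that one read, so the rest of the program still reads line by line.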

      Changing even less, he could have replaced
      if( @raw_data_source == @raw_data_Expected )
      with
      if( "@raw_data_source" eq "@raw_data_Expected" )
      and gotten the (presumably) desired comparison :-)
      Please, use strict; use warnings; !!! :-)
      []s, HTH, Massa (κς,πμ,πλ)
        if( "@raw_data_source" eq "@raw_data_Expected" )

        That does work, but if $" eq $/ there could be cases where you get false positives. Luckily that's not the default, but I feel it's worth mentioning nonetheless.

        Update: Uhm, not entirely sure. But there's no need to split the lines into arrays unless you process it line by line.

Re: Two files comparison
by Corion (Patriarch) on Aug 01, 2008 at 08:24 UTC

    What did you try? What problems did you encounter? Does Perl return any errors?

    Maybe you want to use Algorithm::Diff?

Re: Two files comparison
by dHarry (Abbot) on Aug 01, 2008 at 08:35 UTC

    As usual, it depends on what *exactly* you want to do. For example, what does "equal" mean, i.e. how do you compare? Do you need an exact match, or can you ignore whitespace, etc.?

    On Unix/Linux you could use the diff command :-) But based on your file names I guess you run Windows?

    You might want to give Comparing arrays with text contents a look too.

    Hope this helps.

Re: Two files comparison
by Anonymous Monk on Aug 01, 2008 at 08:34 UTC
    perldoc File::Compare
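For reference, a minimal sketch of File::Compare in use. The compare function returns 0 if the files are equal, 1 if they differ, and -1 on error. The temporary files are an assumption made only so the example is self-contained.

```perl
use strict;
use warnings;
use File::Compare qw(compare);
use File::Temp qw(tempfile);

# Demo setup only: temporary files standing in for Source.txt and Expected.txt.
my ($src_fh, $source_file)   = tempfile(UNLINK => 1);
my ($exp_fh, $expected_file) = tempfile(UNLINK => 1);
print {$src_fh} "same\ncontent\n";
print {$exp_fh} "same\ncontent\n";
close($src_fh);
close($exp_fh);

# compare() returns 0 (equal), 1 (different) or -1 (error).
my $result = compare($source_file, $expected_file);

if ($result == 0) {
    print "DATA MATCHED!\n";
}
elsif ($result == 1) {
    print "DATA NOT MATCHED!\n";
}
else {
    die "Could not compare files: $!";
}
```

Note that this only answers whether the files are the same; it does not report where they differ.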
Re: Two files comparison
by moritz (Cardinal) on Aug 01, 2008 at 08:42 UTC
Re: Two files comparison
by swampyankee (Parson) on Aug 01, 2008 at 17:13 UTC

    If you're checking to see if two files match exactly, there's no need to read either into memory; you can use something like Digest::file, and compare the checksums for equality [1]. In the (rather likely) event that you need to determine the actual differences between the files (as in diff or even Windows' fc), your first route should be to use a module like File::Compare.


    [1] Some of the checksum algorithms will give false positives, in that identical checksums may be produced by different files. I believe "Algorithm 1" used by sum may be especially prone to this.
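A sketch of the checksum approach with Digest::file follows. The choice of SHA-256 is an assumption made here, not something the post specifies, and the temporary files exist only so the example runs on its own.

```perl
use strict;
use warnings;
use Digest::file qw(digest_file_hex);
use File::Temp qw(tempfile);

# Demo setup only: two temporary files with different content.
my ($src_fh, $source_file)   = tempfile(UNLINK => 1);
my ($exp_fh, $expected_file) = tempfile(UNLINK => 1);
print {$src_fh} "line 1\nline 2\n";
print {$exp_fh} "line 1\nline two\n";
close($src_fh);
close($exp_fh);

# SHA-256 is an assumed choice; any algorithm name Digest supports would do.
my $source_sum   = digest_file_hex($source_file,   "SHA-256");
my $expected_sum = digest_file_hex($expected_file, "SHA-256");

my $message = ($source_sum eq $expected_sum) ? "DATA MATCHED!" : "DATA NOT MATCHED!";
print "$message\n";
```

Digest::file reads each file through the digest object, so neither file's content has to be held in a Perl variable.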

