Ms.Ranjan has asked for the wisdom of the Perl Monks concerning the following question:

Hi. My problem is to read two tab-delimited files, compare their contents, and write the lines they share and the lines that differ to two separate files. I am new to Perl and have written a few lines of code. Please kindly direct me.
#!/usr/bin/perl
open FH1, "F1.txt";
open FH2, "F2.txt";
my @f1= <FH1>;
my @f2= <FH2>;
for my $x(0..4)
{
    for each($f1[$x1] eq $f2[$x2])
    {
        print $f1[$x];
    }
}
close(FH1);
close(FH2);
exit;

F1.txt
name value
n1 001
n2 002
n3 003
n4 004

F2.txt
name value
n5 005
n2 002
n3 003
n8 008

Re: comparing contents of the file
by pc88mxer (Vicar) on Jun 04, 2008 at 03:21 UTC
Re: comparing contents of the file
by graff (Chancellor) on Jun 04, 2008 at 04:09 UTC
    Have you learned about hashes yet? You are in need of an algorithm, and the one that you need will most likely involve the use of a hash. Something like this:
    declare a hash (e.g. "my %lines;")
    open the first input file
    while reading the first input one line at a time
        use the line as a hash key and assign "1" as the hash value
    close the file
    open output.file1 to hold "matching lines"
    open output.file2 to hold "distinct lines"
    open the second input file
    while reading the second input one line at a time
        if the current line exists as a hash key and the hash value is "1"
            print this line to the "matching lines" file
            increment the hash value
        otherwise
            print this line to the "distinct lines" file (maybe include the file name)
    having read all input, now loop over all the keys of the hash
        if the hash value assigned to this key is still "1"
            print this hash key to the "distinct lines" file (maybe include the file name)
    I think you'll find that the actual perl code for that will be somewhat shorter than what I've written, but (since I assume this is homework) you should write the code. There are lots of places to read about hashes in Perl. Have you checked the Tutorials here at the Monastery?
      Thank you. I have written the code as per your algorithm:

      #!/usr/bin/perl
      use strict;
      use warnings;
      my %lines;
      open F1, "File.txt";
      while(my $result= <F1>)
      {
          $lines{$result}=1;
      }
      close(F1);
      open (OF1, ">match.txt");
      open (OF2, ">diff.txt");
      open F2,"File2.txt";
      while(<F2>)
      {
          if ($lines{$_}==1)
          {
              print OF1 $lines;
              $lines++;
          }
          els
          {
              print OF2 $lines;
          }
      }
      close OF1;
      close OF2;

      and I am getting this error: Global symbol "$lines" requires explicit package name at line.pl line 21,22,26. I am going through the tutorials.
        You have declared a hash, my %lines;, but then you are trying to use that variable as if it were a scalar in this line:
        print OF1 $lines;

        You probably want:

        print OF1 $lines{$_};

        Same for OF2.

        Also, "els" is a typo: should be "else".

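        $lines++; is also a problem: $lines is an undeclared scalar, not the hash entry. To bump the count for the current line, use $lines{$_}++; instead.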
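Putting the fixes above together, a minimal sketch of the hash-based outline might look like the following. The helper name compare_lines and the in-memory sample data are assumptions made for illustration; a real script would read the lines from File.txt and File2.txt and print the two lists to match.txt and diff.txt. Note that printing $lines{$_} writes the stored count rather than the text, so this sketch keeps the line itself. It also walks the first file's lines instead of the hash keys at the end, only to keep the output in file order.

```perl
use strict;
use warnings;

# Hypothetical helper: takes two array refs of lines and returns
# (matching lines, distinct lines), following the outline in this thread.
sub compare_lines {
    my ($first, $second) = @_;

    # 1 means "seen in the first file, not yet matched in the second".
    my %lines;
    $lines{$_} = 1 for @$first;

    my (@match, @diff);
    for my $line (@$second) {
        if (exists $lines{$line} && $lines{$line} == 1) {
            push @match, $line;
            $lines{$line}++;        # mark as matched
        }
        else {
            push @diff, $line;      # not in the first file (or already matched)
        }
    }

    # Lines of the first file that never matched are distinct too.
    push @diff, grep { $lines{$_} == 1 } @$first;

    return (\@match, \@diff);
}

# Sample data from the question, assumed to be tab-delimited.
my @f1 = ("name\tvalue\n", "n1\t001\n", "n2\t002\n", "n3\t003\n", "n4\t004\n");
my @f2 = ("name\tvalue\n", "n5\t005\n", "n2\t002\n", "n3\t003\n", "n8\t008\n");

my ($match, $diff) = compare_lines(\@f1, \@f2);
print "Match: $_" for @$match;
print "Diff: $_"  for @$diff;
```

Using exists before comparing the value avoids the "uninitialized value" warning that $lines{$_}==1 would raise for lines missing from the first file.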

Re: comparing contents of the file
by GrandFather (Saint) on Jun 04, 2008 at 04:01 UTC

    We strongly recommend that you use strictures (use strict; use warnings;) at the start of all the Perl you write. They give you early warning of a wide range of issues.
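As a small illustration of that early warning, the sketch below compiles a snippet containing the same kind of mistake seen earlier in this thread (a hash %lines used as a scalar $lines). The string eval is used here only so the compile-time error can be captured and printed instead of stopping the script; the variable names are illustrative.

```perl
use strict;
use warnings;

# Compile the faulty snippet inside a string eval so the error lands in $@.
my $ok = eval q{
    use strict;
    my %lines = (a => 1);
    my $copy  = $lines;   # mistake: scalar $lines was never declared
    1;
};

print "strict caught: $@" unless $ok;
```

Without use strict, the same snippet would compile quietly and $copy would simply be undefined.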

    We also like it if you tell us up front if you are doing homework or are doing this as a learning exercise (this smells like a homework question) so that we can tailor our replies to help you learn.

    Your first attempt is almost OK for small files, although the syntax is not correct and there is a logic problem: comparing every line of one file with every line of the other reports each line many times. A syntax-corrected version would look somewhat like:

    use strict;
    use warnings;

    my $F1 = <<F1;
    name value
    n1 001
    n2 002
    n3 003
    n4 004
    F1

    my $F2 = <<F2;
    name value
    n5 005
    n2 002
    n3 003
    n8 008
    F2

    open FH1, '<', \$F1 or die "Failed to open F1: $!";
    open FH2, '<', \$F2 or die "Failed to open F2: $!";

    my @f1= <FH1>;
    my @f2= <FH2>;

    for my $f1Line (@f1) {
        for my $f2Line (@f2) {
            if ($f1Line eq $f2Line) {
                print "Same: $f1Line";
            } else {
                print "Diff: $f1Line";
            }
        }
    }

    close(FH1);
    close(FH2);

    which prints:

    Diff: name value
    Diff: name value
    Diff: name value
    Diff: name value
    Diff: name value
    Diff: n1 001
    Diff: n1 001
    Diff: n1 001
    Diff: n1 001
    Diff: n1 001
    Diff: n2 002
    Diff: n2 002
    Same: n2 002
    Diff: n2 002
    Diff: n2 002
    Diff: n3 003
    Diff: n3 003
    Diff: n3 003
    Same: n3 003
    Diff: n3 003
    Diff: n4 004
    Diff: n4 004
    Diff: n4 004
    Diff: n4 004
    Diff: n4 004

    which is not quite what you want and becomes horribly slow when the files get big.

    Generally, however, this is a tricky problem to solve because good solutions require knowledge of the size and nature of the contents of the files. A common key to the solution involves using hashes (see perldata) to keep information about data that has been seen before.
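One common way to "keep information about data that has been seen before" is a hash that counts each line as it goes by. The sketch below is only an illustration with made-up sample lines (assumed tab-delimited); the names %seen and @unique are not from the thread.

```perl
use strict;
use warnings;

# Illustrative input with one repeated line.
my @lines = ("n2\t002\n", "n3\t003\n", "n2\t002\n");

# $seen{$line} counts occurrences; grep keeps a line only the first time.
my %seen;
my @unique = grep { !$seen{$_}++ } @lines;

print @unique;
```

Each hash lookup takes roughly constant time, so one pass over the data is enough, unlike the nested loops above, which compare every line with every other line.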

    At this point you can either play with your current code, or learn about hashes and possibly come back for more instruction.


    Perl is environmentally friendly - it saves trees