Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to compare three files and print the lines that are common to all three files. The columns in the files are separated by tabs. For example:
file 1:

    fji01dde AIDJFMGKG
    dlp02sle VMCFIJGM
    cmr03lsp CKEIFJ

and so on...

file 2:

    fji01dde 25 30
    dlp02sle 40 50
    cmr03lsp 60 70

and so on...

file 3:

    AIDJFMGKG
    CKEIFJ

output needs to be:

    fji01dde AIDJFMGKG 25 30
    cmr03lsp CKEIFJ 60 70

and so on...

I only want lines that are common to all three files.

The code below works well for the first two files, but I need to incorporate the third one. Any ideas?

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my %file1;

    ## Open the 1st file
    open(my $A, '<', "file1") or die "could not open file1: $!";
    while (<$A>) {
        chomp;
        ## Split the current line on tabs into the @F array.
        my @F = split(/\t/);
        push @{ $file1{ $F[0] } }, @F[1 .. $#F];
    }
    close $A;

    ## Open the 2nd file
    open(my $B, '<', "file2") or die "could not open file2: $!";
    while (<$B>) {
        chomp;
        ## Split the current line on tabs into the @F array.
        my @F = split(/\t/);
        if (defined($file1{ $F[0] })) {
            foreach my $col (@{ $file1{ $F[0] } }) {
                print "$F[0]\t$col\t@F[1..$#F]\n";
            }
        }
    }
    close $B;
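A minimal sketch of one way to bring the third file into the code above, assuming (as in the samples) that file3 lists one sequence per line and that those sequences must match the second column of file1. It reads file3 first into a lookup hash, keeps only the wanted sequences from file1, then joins on the ID from file2 as before. The sub name join_three, the hash %wanted, the script name join3.pl and the tab-joined output are illustrative choices, not part of the original post.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sketch: join file1 and file2 on column 1 (the ID), keeping only rows
# whose file1 sequence (column 2) is also listed in file3.
# join_three and %wanted are hypothetical names, not from the thread.
sub join_three {
    my ($fh1, $fh2, $fh3) = @_;

    # file3: one sequence per line; remember each one as a hash key.
    my %wanted;
    while (my $line = <$fh3>) {
        chomp $line;
        $wanted{$line} = 1 if length $line;
    }

    # file1: ID -> list of its sequences that also appear in file3.
    my %file1;
    while (my $line = <$fh1>) {
        chomp $line;
        my @F = split /\t/, $line;
        next unless @F >= 2;
        push @{ $file1{ $F[0] } }, grep { $wanted{$_} } @F[1 .. $#F];
    }

    # file2: one tab-separated output row per surviving sequence.
    my @out;
    while (my $line = <$fh2>) {
        chomp $line;
        my @F = split /\t/, $line;
        next unless @F && $file1{ $F[0] };
        push @out, join("\t", $F[0], $_, @F[1 .. $#F]) for @{ $file1{ $F[0] } };
    }
    return @out;
}

# Usage (hypothetical script name): perl join3.pl file1 file2 file3
if (@ARGV == 3) {
    my @fh = map { open(my $fh, '<', $_) or die "could not open $_: $!"; $fh } @ARGV;
    print "$_\n" for join_three(@fh);
}
```

Reading file3 first means that rows whose sequence is missing from it are never stored, so the file2 loop keeps the same single lookup as the original code.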

Re: Compare 3 files and print matches in Perl
by Laurent_R (Canon) on Oct 26, 2015 at 00:18 UTC
    Sure, if you can do it with two files, just repeat with a third file.

    Store file 1 in a hash whose keys are the field you compare between files. Then read file 2 and update the hash value for each item you find in it. From there you can either keep in the hash only the items common to the first two files and use it as the test for file 3, or implement each hash value as a counter and output every hash key whose counter reaches 3.
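    A minimal sketch of the first option (keep only the items common so far and use them as the test for the next file), written for any number of files. It assumes every file has the comparison key in its first tab-separated column, which matches the general description here but not the question's actual samples; read_keys and common_keys are hypothetical names.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sketch of the "keep only common items" option. Assumes the comparison
# key is the first tab-separated column of every file (an assumption).
# read_keys and common_keys are hypothetical helper names.
sub read_keys {
    my ($fh) = @_;
    my %keys;
    while (my $line = <$fh>) {
        chomp $line;
        my ($key) = split /\t/, $line;
        $keys{$key} = 1 if defined $key && length $key;
    }
    return \%keys;
}

sub common_keys {
    my ($first, @rest) = @_;
    # Start with every key of the first file...
    my %common = %{ read_keys($first) };
    for my $fh (@rest) {
        my $seen = read_keys($fh);
        # ...and drop each key that this file does not contain.
        delete @common{ grep { !$seen->{$_} } keys %common };
    }
    return sort keys %common;
}

if (@ARGV) {
    my @fh = map { open(my $fh, '<', $_) or die "could not open $_: $!"; $fh } @ARGV;
    print "$_\n" for common_keys(@fh);
}
```

    Because each file's keys are collected in a hash before the comparison, a key repeated within one file does no harm in this version.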

      I am new to Perl. Could you show me an example of the code I would need to add to my existing code?

        Hi,

        this is a simplified version that checks 3 files:

            use strict;
            use warnings;

            my %result;

            open my $A, "<", "file1" or die "could not open file1 $!";
            while (<$A>) {
                chomp;
                my $key = (split /\t/, $_)[0];
                $result{$key} = 1;
            }
            close $A;

            open my $B, "<", "file2" or die "could not open file2 $!";
            while (<$B>) {
                chomp;
                my $key = (split /\t/, $_)[0];
                $result{$key}++;
            }
            close $B;

            open my $C, "<", "file3" or die "could not open file3 $!";
            while (<$C>) {
                chomp;
                my $key = (split /\t/, $_)[0];
                # this key has been seen in both previous files
                # (the defined test avoids an "uninitialized value" warning)
                if (defined $result{$key} && $result{$key} == 2) {
                    print "Line with $key is present in all three files\n";
                }
            }
            close $C;
        The value in the %result hash is basically a counter of how many times you've seen the key. If you find in file3 a key that has already been seen twice, the key is present in all 3 files. Please note that this assumes a key cannot appear more than once in file2; only rather small changes would be required to handle that case.
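        One possible form of those small changes, as a sketch: each file gets its own %seen hash so that a key is counted at most once per file, and a key is reported when its counter equals the number of files, which generalizes the == 2 test. Like the script above, it assumes the key is the first tab-separated column of every file; keys_in_all is a hypothetical name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sketch: counter approach that tolerates repeated keys within a file.
# Assumes the key is the first tab-separated column of every file.
# keys_in_all is a hypothetical helper name.
sub keys_in_all {
    my @handles = @_;
    my %count;
    for my $fh (@handles) {
        my %seen;    # count each key at most once per file
        while (my $line = <$fh>) {
            chomp $line;
            my ($key) = split /\t/, $line;
            next unless defined $key && length $key;
            $count{$key}++ unless $seen{$key}++;
        }
    }
    # A key is common when it was counted once in every file.
    my @common = grep { $count{$_} == @handles } keys %count;
    return sort @common;
}

if (@ARGV) {
    my @fh = map { open(my $fh, '<', $_) or die "could not open $_: $!"; $fh } @ARGV;
    print "$_\n" for keys_in_all(@fh);
}
```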

        Update: when replying, I reread only the narrative of your original post, without looking at the file samples. I assumed above that all three of your files had the same structure, and only now see that they don't. A few minor changes are needed to cope with the actual structure of your files, but I guess I still gave you the general idea of the solution.

Re: Compare 3 files and print matches in Perl
by Preceptor (Deacon) on Oct 26, 2015 at 11:58 UTC

    It's generally good form to indicate that you've cross-posted to Stack Overflow and Unix & Linux Stack Exchange, to prevent duplication of effort.

    The answer I posted on Stack Overflow was:

        #!/usr/bin/env perl
        use strict;
        use warnings;
        use Data::Dumper;

        #read file1 into a hash - but invert it so it's value => key instead:
        # 'CKEIFJ' => 'cmr03lsp',
        # etc.
        open( my $file1, '<', "file1.txt" ) or die $!;
        my %file1_content = map { reverse split } <$file1>;
        close($file1);
        print Dumper \%file1_content;

        #read file 2 - read keys, store the values.
        #split _2_ fields, so we keep both numbers as a substring:
        #e.g.:
        # 'cmr03lsp' => '60 70
        #',
        open( my $file2, '<', "file2.txt" ) or die $!;
        my %file2_content = map { split( " ", $_, 2 ) } <$file2>;
        close($file2);
        print Dumper \%file2_content;

        #then iterate file 3, checking if:
        #file1 has a matching 'key' (but inverted - as a value)
        #file2 has a cross reference.
        open( my $file3, '<', "file3.txt" ) or die $!;
        while ( my $line = <$file3> ) {
            chomp $line;
            if ( $file1_content{$line} and $file2_content{ $file1_content{$line} } ) {
                print "$file1_content{$line} $line $file2_content{$file1_content{$line}}";
            }
        }
        close($file3);

    Which prints (aside from Dumper diag content):

        fji01dde AIDJFMGKG 25 30
        cmr03lsp CKEIFJ 60 70

    as requested.
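    To see what the two map lines in this answer build, here is a small sketch that runs them on in-memory copies of the sample rows (opening a filehandle on a scalar reference is standard Perl, but it is an addition here, not part of the answer). It shows that the file1 hash is keyed by sequence and that each file2 value keeps its trailing newline, which is why the final print in the answer needs no "\n".

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sketch: what the two map lines above build, run on in-memory copies
# of the sample data (these data strings are assumptions).
my $file1_text = "fji01dde\tAIDJFMGKG\ndlp02sle\tVMCFIJGM\ncmr03lsp\tCKEIFJ\n";
my $file2_text = "fji01dde 25 30\ndlp02sle 40 50\ncmr03lsp 60 70\n";

open my $file1, '<', \$file1_text or die $!;
# split with no arguments splits $_ on whitespace; reverse turns each
# (id, sequence) pair into (sequence, id), so the hash is sequence => id.
my %file1_content = map { reverse split } <$file1>;
close $file1;

open my $file2, '<', \$file2_text or die $!;
# A limit of 2 keeps "25 30\n" (newline included) as one value.
my %file2_content = map { split( " ", $_, 2 ) } <$file2>;
close $file2;

for my $seq (sort keys %file1_content) {
    my $id = $file1_content{$seq};
    # No "\n" needed: the file2 value still ends with its newline.
    printf "%s -> %s -> %s", $seq, $id, $file2_content{$id};
}
```

    One consequence of inverting file1: if two IDs shared a sequence, the later line would overwrite the earlier one in %file1_content, whereas the push-based hash in the question keeps every ID.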

Re: Compare 3 files and print matches in Perl
by vinoth.ree (Monsignor) on Oct 26, 2015 at 03:25 UTC
    Hi,

    If your files are sorted, you can use the Linux command comm to find the lines common to all three files:

    comm -12 fileA fileB | comm -12 - fileC
        -1 : suppress lines unique to FILE1
        -2 : suppress lines unique to FILE2
        -3 : suppress lines that appear in both files
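    comm compares whole lines, so it only finds lines that are identical in every file; in the question's samples no line appears verbatim in all three files, so a key-based join is still needed there. For files that do share whole lines, a Perl sketch of the same intersection follows; unlike the comm pipeline it does not need sorted input, and it reports each common line once, in the order of the first file. common_lines is a hypothetical helper name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sketch: lines that appear verbatim in every file, similar in spirit to
# `comm -12 a b | comm -12 - c` but without requiring sorted input.
# common_lines is a hypothetical helper name.
sub common_lines {
    my ($first, @rest) = @_;

    # Build one lookup set per additional file.
    my @sets;
    for my $fh (@rest) {
        my %set;
        while (my $line = <$fh>) {
            chomp $line;
            $set{$line} = 1;
        }
        push @sets, \%set;
    }

    # Walk the first file in order, reporting each common line once.
    my (@out, %done);
    while (my $line = <$first>) {
        chomp $line;
        next if $done{$line}++;
        push @out, $line unless grep { !$_->{$line} } @sets;
    }
    return @out;
}

if (@ARGV) {
    my @fh = map { open(my $fh, '<', $_) or die "could not open $_: $!"; $fh } @ARGV;
    print "$_\n" for common_lines(@fh);
}
```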
    
