arunsriniv has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am trying to compare specific columns in two files to see whether the contents of one file (File1.txt), which is the master file, exactly match those of another file (File2.txt), which is a subset of File1.

For Example: File1.txt (Master file)

Status1,Name1,Source1,Destination1

Status2,Name2,Source2,Destination2

Status3,Name3,Source3,Destination3

.....

.....

File2.txt (Subset of File1.txt)

Status1,Name1,Source1,Destination1

Status2,Name2,Source2,Destination2

Now, I am trying to compare only specific columns in File1 with File2: say, ignore the Status column and compare only the remaining columns for exactness. Any ideas are appreciated. My current code compares all columns between the two files for exactness.

my @cur_data=<FILE1>;
close (FILE1);
my @org_data=<FILE2>;
close (FILE2);
foreach $org_data(@org_data) {
    $flag= 1;
    foreach $cur_data(@cur_data) {
        chomp ($cur_data);
        chomp ($org_data);
        if ( $cur_data eq $org_data ) {
            $flag= 0;
            last;
        }
    }
    if ($flag == 1) {
        print " \n $org_data -->failed\n";
        last;
    }
}
return $flag;

Re: Comparing specific columns from 2 files
by Laurent_R (Canon) on Jul 29, 2015 at 08:23 UTC
    Assuming you need to compare only the three last fields of your input files and that these three fields define a unique key for comparison, you could:
    1. read file 1 and store the 3 fields of interest into a hash (the three fields concatenated as a key, with a value of, say, 1);
    2. close file 1;
    3. read file 2, grab the three fields and check if that key exists in the hash.
    Update:

    The following is Perl pseudo-code to do that:

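    use strict;
    use warnings;

    my %hash;
    # read file 1 and remember fields 1..3 (everything after Status) as a key
    open my $file1, "<", "file1.txt" or die "could not open file1.txt $!";
    while (my $line1 = <$file1>) {
        chomp $line1;    # keep the newline out of the last field
        my $key = join ";", (split /,/, $line1)[1..3];
        $hash{$key} = 1;
    }
    close $file1;

    # read file 2 and look each key up in the hash
    open my $file2, "<", "file2.txt" or die "could not open file2.txt $!";
    while (my $line2 = <$file2>) {
        chomp $line2;
        my $key = join ";", (split /,/, $line2)[1..3];
        if (exists $hash{$key}) {
            # do something
        } else {
            # do something else
        }
    }
    close $file2;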
Re: Comparing specific columns from 2 files
by 1nickt (Canon) on Jul 29, 2015 at 15:33 UTC

    Hi arunsriniv. You've already been given suggestions about how to accomplish your goal. What you have is not the right way to go about it. But here are some comments on the code you posted.

    First, you should always place

    use strict; use warnings;

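    at the top of your program. With these in place, Perl points out many errors, such as undeclared or misspelled variables, before the program even runs, and warns about questionable constructs while it runs. In your case you are not declaring your variables with my in the scope where they are used, which makes them package globals. This is bad practice.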
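    As a rough illustration of the point about declaring variables, the sketch below declares the loop variable with my and then shows strict rejecting an undeclared variable. The names @clean, $copy and $flagg and the sample data are made up for the example; the string eval is only there so the compile-time error can be printed instead of stopping the script.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the contents of a file.
my @org_data = ("Status1,Name1\n", "Status2,Name2\n");

# "my" in the loop header limits $line to the loop body, so it cannot
# collide with a variable of the same name elsewhere in the program.
my @clean;
foreach my $line (@org_data) {
    chomp(my $copy = $line);    # chomp a copy, leaving @org_data untouched
    push @clean, $copy;
}

# Under strict, an undeclared (here: misspelled) variable is a compile-time
# error. The string eval compiles the snippet at run time and captures the
# error in $@ so it can be shown.
my $ok    = eval 'use strict; $flagg = 1; 1';
my $error = $@;
print "strict rejected the snippet: $error" unless $ok;
```

    Without use strict, the assignment to $flagg would silently create a new global, which is how typos in names like $flag go unnoticed.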

    You correctly execute chomp() on $cur_data within the inner loop, but you don't need to chomp( $org_data ) each time; that should be in the outer loop. It's best to chomp() lines from a file as you read them.

    It would be better to read in the lines from the original file (and store them in a hash as others have said), and then read the current file in one line at a time, chomp()ing and comparing as you go. No need to put all the lines in an array and no need for a flag; just print an error and call last() or die() or whatever when a comparison fails.
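    A minimal sketch of that approach, combined with the hash suggestion from the other replies, might look like the following. It assumes the master file is the one loaded into the hash and that the first column (Status) is the only one to ignore. The sub names key_for and unmatched_lines, the "\x1F" separator and the sample data are all hypothetical; in-memory filehandles are used so it runs without File1.txt and File2.txt on disk.

```perl
use strict;
use warnings;

# Build a comparison key from every column except the first (Status).
# The "\x1F" separator is an assumption: any string that cannot occur
# inside a field would do.
sub key_for {
    my ($line) = @_;
    my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
    shift @fields;                        # drop the Status column
    return join "\x1F", @fields;
}

# Return the lines of the subset whose non-Status columns do not
# appear anywhere in the master.
sub unmatched_lines {
    my ($master_fh, $subset_fh) = @_;

    my %seen;
    while (my $line = <$master_fh>) {
        chomp $line;                      # chomp each line as it is read
        $seen{ key_for($line) } = 1;
    }

    my @unmatched;
    while (my $line = <$subset_fh>) {
        chomp $line;
        push @unmatched, $line unless $seen{ key_for($line) };
    }
    return @unmatched;
}

# Hypothetical sample data in place of the real files.
my $master = "Status1,Name1,Source1,Destination1\n"
           . "Status2,Name2,Source2,Destination2\n"
           . "Status3,Name3,Source3,Destination3\n";
my $subset = "Changed,Name1,Source1,Destination1\n"
           . "Status2,Name2,Source2,Elsewhere\n";

open my $master_fh, '<', \$master or die "cannot open master data: $!";
open my $subset_fh, '<', \$subset or die "cannot open subset data: $!";

my @failed = unmatched_lines($master_fh, $subset_fh);
print "$_ -->failed\n" for @failed;
```

    Here the first subset line still matches even though its Status differs, while the second is reported because its Destination does not occur in the master. To stop at the first mismatch instead of collecting them all, replace the push with a print followed by last.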

    The way forward always starts with a minimal test.
Re: Comparing specific columns from 2 files
by poj (Abbot) on Jul 29, 2015 at 07:24 UTC

    Is there a column (Name perhaps) or combination of columns that uniquely identifies each record in the files ?

    poj
Re: Comparing specific columns from 2 files
by Anonymous Monk on Jul 29, 2015 at 07:24 UTC