Compare Two Text Files Based on a String

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I know that there are a couple of other posts out there but I have been unable to decipher enough to help me on my issue. I basically have two text files that have fields in them and I want to match on those fields to generate a third text file when the fields match. So for example

File 01

School District State,School District Name, School District Key, Number of Students, Grade AK, Juneau School District, 21, 100, 4 AK, Juneau School District, 21, 98, 2

File 02

School District State,School District Name, School District Key, Grade, Race, CountofRace AK, Juneau School District, 21, 4, I, 12 AK, Juneau School District, 21, 2, P, 47

File 03 (Output File)

School District State,School District Name, School District Key, Number of Students, Grade, Race, CountofRace AK, Juneau School District, 100, 21, 4, I, 12 AK, Juneau School District, 98, 21, 2, P, 47

The following only loops through the first record of the first file and then loops through all the records in the second file. What I would like to do is have the first file loop through all the records in the second file and output to the third file and then do the second record in the first file and on until it goes through all the records and checks them against the second file:

#! /usr/bin/perl

$file1path = "c:/Documents and Settings/bob/My Documents/Research/";

$file1name = "AK_Schools_Student_Race_By_Grade_2010091001.csv";

$file1= $file1path.$file1name;

$file2path= "c:/Documents and Settings/bob/My Documents/Research/";

$file2name = "AK_School_Student_Pop_By_Grade_2010091001.csv";

$file2=$file2path.$file2name;

$file3path= "c:/Documents and Settings/bob/My Documents/Research/Testi
+ng";

$file3name = "Combined_File.csv";

$file3 = $file3path.$file3name;

$counter = 0;
$counter2 = 0;

open (f1, $file1) ;

open (f2, $file2) ;

open (f3 , ">>$file3");    

open( f4, ">>c:/Documents and Settings/bob/My Documents/Research/Testi
+ng/looping_file.csv");

        while (<f1>)
            {
            chomp;
            
            @f1_array=split (",") ;
    
            $f1_state = @f1_array[0];
            $f1_district_name =@f1_array[1];
            $f1_district_key = @f1_array[2];
            $f1_school_name=@f1_array[3];
            $f1_school_key=@f1_array[4];
            $f1_site_guid=@f1_array[5];
            $f1_student_grade=@f1_array[6];
            $f1_count_of_students=@f1_array[7];
            $counter=$counter+1;
            #print "$f1_state , \n";
            #print "$f1_district_name , \n";
            #print "$f1_district_key , \n";
            #print "File 01 - $counter, \n";
            
        while (<f2>)
            {
            chomp;
            
            @f2_array=split (",");
        
            $f2_state = @f2_array[0];
            $f2_district_name =@f2_array[1];
            $f2_district_key = @f2_array[2];
            $f2_school_name=@f2_array[3];
            $f2_school_key=@f2_array[4];
            $f2_site_guid=@f2_array[5];
            $f2_student_grade=@f2_array[6];
            $f2_student_race=@f2_array[7];
            $counter2=$counter2+1;
                                    $f2_student_race_count=@f2_array[8
+];
        
            #print f4 "$f2_state , ";
            #print f4 "$f1_state , \n";
            #print f4 "$f2_district_name , ";
            #print f4 "$f1_district_name , \n";
            #print f4 "File 02 District Key - $f2_district_key , ";
            #print f4 "File 01 District Key - $f1_district_key, \n";
            #print f4 "$f2_school_name, ";
            #print f4 "$f1_school_name, \n";
            #print f4 "File 01 - $counter,";
            #print f4 "File 02 - $counter2, \n";
            
            
            
            }
            if (($f1_state eq $f2_state) && ($f1_district_key eq $f2_d
+istrict_key))
            {
            
                print  f3 "$f1_state , ";
                print  f3 "$f1_district_name ,";
                print  f3 "$f2_district_name ,";
                print  f3 "$f1_district_key ,";
                print  f3 "$f2_district_key ,";
                print  f3 "$f1_school_name , ";
                print  f3 "$f1_school_key , ";
                print  f3 "$f1_site_guid ,";
                print  f3 "$f1_student_grade , ";
                print  f3 "$f1_count_of_students ,";
                print  f3 "$f2_student_race , \n";
            }    
            } 
                
                                
close (f1);
close (f2);
[download]

Comment on Compare Two Text Files Based on a String Download Code

Replies are listed 'Best First'.
Re: Compare Two Text Files Based on a String by CountZero (Bishop) on Sep 10, 2010 at 19:23 UTC
This really is a classical relational database problem: you have two tables and you must join them on a key-value. My take on this matter would be to use DBD::CSV. This module can work directly with regular (for any reasonable definition of "regular") CSV-file. CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James	[reply]
Re: Compare Two Text Files Based on a String by CountZero (Bishop) on Sep 11, 2010 at 11:00 UTC
As I am in a good mood, I have cobbled together a quick example. Given these two CSV-files: file1 `key1,key2,data1,data2,data3 A,1,A1d1,A1d2,A1d3 A,2,A2d1,A2d2,A2d3 B,1,B1d1,B1d2,B1d3` [download] and file2 `key1,key2,data4,data5 A,1,A1d4,A1d5 A,2,A2d4,A2d5 B,1,B1d4,B1d5 A,2,A2d4bis,A2d5bis A,2,A2d4ter,A2d5ter B,1,B1d4bis,B1d5bis` [download] The following program combines the matching lines from both files: `use strict; use warnings; use 5.012; use DBD::CSV; my $dbh = DBI->connect( "dbi:CSV:f_dir=.", undef, undef, { FetchHashKeyName => "NAME_lc", RaiseError => 1, PrintError => 1, + } ) or die $DBI::errstr; my $query = 'SELECT key1, key2, data1, data2, data3, data4, data5 FROM file1 JOIN +file2 WHERE file1.key1 = file2.key1 AND file1.key2 = file2.key2'; my $sth = $dbh->prepare($query); $sth->execute(); while ( my @row = $sth->fetchrow_array ) { say join '\|', @row; } $sth->finish();` [download] Output: `A\|1\|A1d1\|A1d2\|A1d3\|A1d4\|A1d5 A\|2\|A2d1\|A2d2\|A2d3\|A2d4\|A2d5 A\|2\|A2d1\|A2d2\|A2d3\|A2d4bis\|A2d5bis A\|2\|A2d1\|A2d2\|A2d3\|A2d4ter\|A2d5ter B\|1\|B1d1\|B1d2\|B1d3\|B1d4\|B1d5 B\|1\|B1d1\|B1d2\|B1d3\|B1d4bis\|B1d5bis` [download] CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James	[reply] [d/l] [select]