Re^3: Comparing text files

Okay, first: update your posts by adding <code> at the beginning of the perl script, and </code> at the end of the perl script. Just do it. Likewise for sample data.

Second: when you do this:

open( FILE1, ... );
open( FILE2, ... );
open( FILE3, ... );

while (<FILE1>) {
   ...
   while (<FILE2>) {
      ...
      while (<FILE3>) {
         ...
      }
   }
}

FILE2 and FILE3 will both reach EOF during the first iteration on (the first line read from) FILE1. So don't do that.
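To see that behavior in isolation, here is a minimal sketch with only two levels of nesting. The in-memory "files" and the variable names are made up for the demonstration; they are not taken from your script.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the real files.
my $outer_data = "a\nb\nc\n";
my $inner_data = "1\n2\n";

open( my $outer, '<', \$outer_data ) or die "open outer: $!";
open( my $inner, '<', \$inner_data ) or die "open inner: $!";

my @pairs;
while ( my $o = <$outer> ) {
    chomp $o;
    # $inner hits EOF during the first outer iteration and is never
    # rewound, so this inner loop does nothing for "b" and "c".
    while ( my $i = <$inner> ) {
        chomp $i;
        push @pairs, "$o$i";
    }
}
print "@pairs\n";    # prints "a1 a2"
```

Only the first outer line ever gets paired with anything, which is the same thing that happens to FILE2 and FILE3 in the nested loops above.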

(You could "seek( FILE3, 0, 0 );" at the end of the while loop that reads from FILE2, and also do "seek( FILE2, 0, 0 );" at the end of the loop that reads from FILE1, but this would mean that you re-read FILE2 too many times, and you re-read FILE3 way too many times. So don't do that.)

Since FILE2 is "small" and FILE3 is probably not too big either, read them both into hash structures first to keep them in memory while you read the "large" file. When loading the hashes with data in these two files, the hash keys should be the strings you need for linking data across files, and the values should be whatever you need to keep from each file for your final output.
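A rough sketch of that approach follows. The column order, the whitespace separators, the sample values and every variable name are assumptions made for illustration, since the real layouts have not been posted. It only shows the loading and lookup steps, not your final filter condition, and it assumes each key occurs once per file (use a hash of arrays if keys can repeat).

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the three files.
my $small_data   = "old1 NY\nold2 CA\n";              # old_id state_code2
my $address_data = "c10 NY 10001\nc20 TX 73301\n";    # city_code3 state_code3 zip_code3
my $large_data   = "new1 c10\nnew2 c20\nnew3 c99\n";  # new_id city_code1

# Small file: key on state_code2, keep old_id.
my %old_id_for_state;
open( my $small, '<', \$small_data ) or die "open small: $!";
while ( my $line = <$small> ) {
    my ( $old_id, $state ) = split ' ', $line;
    $old_id_for_state{$state} = $old_id;
}
close $small;

# Address file: key on city_code3, keep state_code3 and zip_code3.
my %address_for_city;
open( my $address, '<', \$address_data ) or die "open address: $!";
while ( my $line = <$address> ) {
    my ( $city, $state, $zip ) = split ' ', $line;
    $address_for_city{$city} = { state => $state, zip => $zip };
}
close $address;

# Large file: one pass, with hash lookups instead of nested reads.
my @report;
open( my $large, '<', \$large_data ) or die "open large: $!";
while ( my $line = <$large> ) {
    my ( $new_id, $city ) = split ' ', $line;
    my $addr   = $address_for_city{$city};
    my $zip    = $addr ? $addr->{zip} : '-';
    my $old_id = $addr ? $old_id_for_state{ $addr->{state} } : undef;
    push @report, join ' ', $new_id, $city, $zip, $old_id // '-';
}
close $large;

print "$_\n" for @report;
# prints:
#   new1 c10 10001 old1
#   new2 c20 73301 -
#   new3 c99 - -
```

Each file is read exactly once, and whatever match / no-match test you settle on becomes a plain "exists" or "eq" check against the two hashes inside the last loop.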

Third: you said

output file will have new_id from file1, old_id from file2, and zip_code3 from file3 only if state_code2 from file2 not equal to state_code3 from file3 and city_code3 from file 3 not equal to city_code1 from file1

This statement really does not make sense, unless you seriously want the "cartesian product" of all the lines in the three files. That is, supposing there are 100 lines in "largefile", 10 lines in "smallfile" and 20 lines in "addressfile", and there are some matches among the city_code and state_code values, then the condition as you phrased it would list about 99*9*19 lines of output.

Do you mean something like this instead?

OUTPUT file1:new_id, file2:old_id, file3:zip_code3
 IF file1:city_code1 DOES NOT MATCH ANY file3:city_code3
  OR ( file1:city_code1 MATCHES ONE file3:city_code3
      AND THIS file3:state_code3 DOES NOT MATCH ANY file2:state_code2 )

If that's not what you mean, then you really need to explain it better. Given just the snippets of sample data that you have shown, what should the output be? (If those snippets would not really produce any outputs, because everything matches up, add a row or two that would generate the intended output, and show us what the output should be.) And remember to use "code" tags.

In any case, it sounds like some sort of SQL problem, and it looks like your data came from a database (or could easily be put into a database). So maybe SQL would be the more prudent approach. (But proper use of hashes to store the relevant stuff from the two smaller files would do fine.)
