in reply to How compare two files

Part of the problem could be that you haven't specified the goal completely. You say you want to compare the second column of file1 with the third column of file2 and "get a new file" where those fields are the same. In your two data file examples, "2002" shows up in three rows of file1 (but they all have distinct values in the first column), and in six rows of file2 (but three of these rows are identical, and the other three have distinct values in the second column).

So what do you want the output to be? Do you want all three lines from file1 and all six lines from file2? Do you want just the lines with distinct information (maybe counting how many times each distinct line occurs)? Do you want just the distinct values from the "join" column that match in the two files (just "2002" in this case)? Or maybe, for each distinct matching value, how many times it occurs in each file (e.g. "2002 3 6")?
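For the last of those options, here is a minimal Python sketch of the counting. It is not part of cmpcol; the function name `key_counts` and the inline sample rows are just for illustration, and it assumes every line has at least as many columns as the one being compared.

```python
from collections import Counter

def key_counts(lines1, col1, lines2, col2, delim="|"):
    """Return {key: (count_in_file1, count_in_file2)} for keys found in both inputs.

    col1 and col2 are 1-based column numbers, as in "file1:2 file2:3".
    """
    c1 = Counter(line.rstrip("\n").split(delim)[col1 - 1] for line in lines1)
    c2 = Counter(line.rstrip("\n").split(delim)[col2 - 1] for line in lines2)
    # Keep only the keys that occur in both files (the intersection).
    return {key: (c1[key], c2[key]) for key in sorted(c1.keys() & c2.keys())}

# A few rows in the shape of the sample data from this thread.
file1 = ["1173|0040", "1189|2002", "1190|2002", "1191|2002"]
file2 = [
    "000|20019|0040|No Definida.",
    "210|72059|2002|SERGIO SUAREZ LLAMAS",
    "210|72059|2002|SERGIO SUAREZ LLAMAS",
    "210|72059|2002|SERGIO SUAREZ LLAMAS",
    "210|20023|2002|SERGIO SUAREZ LLAMAS",
    "210|72057|2002|SERGIO SUAREZ LLAMAS",
    "210|67013|2002|SERGIO SUAREZ LLAMAS",
]

for key, (n1, n2) in key_counts(file1, 2, file2, 3).items():
    print(key, n1, n2)
# prints:
# 0040 1 1
# 2002 3 6
```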

If you want the full lines from each file that have matching values, how do you want to organize them? This is tricky, because it looks like there will be variable numbers of lines from each file for the values that match.
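One possible way to organize variable numbers of lines is to group them by key, keeping a separate list per file. The Python sketch below is only an illustration of that idea (the names `group_by_key` and `matched_groups` are made up here, and dropping duplicate lines is an assumption about what you might want):

```python
from collections import defaultdict

def group_by_key(lines, col, delim="|"):
    """Map each key (1-based column `col`) to its distinct lines, in first-seen order."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        key = line.split(delim)[col - 1]
        if line not in groups[key]:      # skip exact duplicate lines
            groups[key].append(line)
    return groups

def matched_groups(lines1, col1, lines2, col2, delim="|"):
    """Return {key: (lines_from_file1, lines_from_file2)} for keys found in both."""
    g1 = group_by_key(lines1, col1, delim)
    g2 = group_by_key(lines2, col2, delim)
    return {key: (g1[key], g2[key]) for key in g1 if key in g2}

# Rows in the shape of the sample data from this thread.
file1 = ["1189|2002", "1190|2002", "1191|2002"]
file2 = [
    "210|72059|2002|SERGIO SUAREZ LLAMAS",
    "210|72059|2002|SERGIO SUAREZ LLAMAS",
    "210|20023|2002|SERGIO SUAREZ LLAMAS",
]

for key, (from1, from2) in matched_groups(file1, 2, file2, 3).items():
    print(key)
    for line in from1:
        print("  file1:", line)
    for line in from2:
        print("  file2:", line)
```

With one list per file under each key, it no longer matters that the two files contribute different numbers of lines.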

I wrote a simple utility script that compares specific columns in two files and prints the intersection, union, or difference of the column values. I posted it here: cmpcol. Maybe it will give you some ideas on how to tackle your specific task (or maybe it will do the task for you; I'm not sure).

I put your sample data into files as indicated, and here are some outputs from cmpcol using those two files as input:

    # first example: just print matching "key" values:
    $ cmpcol -d '\|' -i file1:2 file2:3
    0040
    052425
    052634
    053281
    055876
    2002

    # print full lines from file1 that match keys in file2
    $ cmpcol -d '\|' -i -l1 file1:2 file2:3
    1173|0040
    1174|052425
    1175|052634
    1176|053281
    1177|055876
    1189|2002
    1190|2002
    1191|2002

    # print full lines of file2 that match keys in file1:
    $ cmpcol -d '\|' -i -l2 file1:2 file2:3
    000|20019|0040|No Definida.
    000|20034|052425|No Definida.
    000|20014|052634|No Definida.
    000|20031|053281|No Definida.
    000|20044|055876|No Definida.
    210|72059|2002|SERGIO SUAREZ LLAMAS
    210|72059|2002|SERGIO SUAREZ LLAMAS
    210|72059|2002|SERGIO SUAREZ LLAMAS
    210|20023|2002|SERGIO SUAREZ LLAMAS
    210|72057|2002|SERGIO SUAREZ LLAMAS
    210|67013|2002|SERGIO SUAREZ LLAMAS

    # relate full matching lines from both files
    # (note extra lines from file2 at bottom, matching "2002"):
    $ cmpcol -d '\|' -i -lb file1:2 file2:3
    1173|0040:<>:000|20019|0040|No Definida.
    1174|052425:<>:000|20034|052425|No Definida.
    1175|052634:<>:000|20014|052634|No Definida.
    1176|053281:<>:000|20031|053281|No Definida.
    1177|055876:<>:000|20044|055876|No Definida.
    1189|2002:<>:210|72059|2002|SERGIO SUAREZ LLAMAS
    1190|2002:<>:210|72059|2002|SERGIO SUAREZ LLAMAS
    1191|2002:<>:210|72059|2002|SERGIO SUAREZ LLAMAS
    :<>:210|20023|2002|SERGIO SUAREZ LLAMAS
    :<>:210|72057|2002|SERGIO SUAREZ LLAMAS
    :<>:210|67013|2002|SERGIO SUAREZ LLAMAS

    # same as previous, but only use uniq lines from file2:
    $ sort -u file2 | cmpcol -d '\|' -i -lb file1:2 stdin:3
    1173|0040:<>:000|20019|0040|No Definida.
    1174|052425:<>:000|20034|052425|No Definida.
    1175|052634:<>:000|20014|052634|No Definida.
    1176|053281:<>:000|20031|053281|No Definida.
    1177|055876:<>:000|20044|055876|No Definida.
    1189|2002:<>:210|20023|2002|SERGIO SUAREZ LLAMAS
    1190|2002:<>:210|67013|2002|SERGIO SUAREZ LLAMAS
    1191|2002:<>:210|72057|2002|SERGIO SUAREZ LLAMAS
    :<>:210|72059|2002|SERGIO SUAREZ LLAMAS

I wrote cmpcol to allow a lot of flexibility in column delimiters: the string supplied with the "-d" option is passed directly as a regex to "split()", with all regex metacharacters active. That is why, in this case, I have to backslash the vertical bar character, so that it is treated as a literal character rather than as the regex alternation operator.
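To see why the backslash matters, here is a small Python analogue using `re.split` (cmpcol itself uses Perl's split(); this sketch only illustrates the general regex behavior, and `split_fields` is a made-up helper name):

```python
import re

def split_fields(delim_regex, line):
    """Split a line on a delimiter given as a regular expression."""
    return re.split(delim_regex, line.rstrip("\n"))

line = "210|72059|2002|SERGIO SUAREZ LLAMAS"

# Escaped bar: the pattern matches a literal "|" character.
print(split_fields(r"\|", line))
# ['210', '72059', '2002', 'SERGIO SUAREZ LLAMAS']

# re.escape() builds the same pattern from a plain string.
print(split_fields(re.escape("|"), line))

# An unescaped bar is alternation between two empty patterns,
# so it does not produce the four fields above.
print(split_fields("|", line) == split_fields(r"\|", line))
# False

# The payoff of a regex delimiter: it can absorb optional whitespace too.
print(split_fields(r"\s*\|\s*", "210 | 72059 |2002"))
# ['210', '72059', '2002']
```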

When full lines are output from both files, the string ":<>:" is used to mark the division between the two files, because that string is unlikely to occur in real data, which makes it easy to spot and to split on. (Maybe I should add an option to control that, but you can just pipe the output through "sed" or a perl one-liner to make it whatever you want.)
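If you would rather post-process that output in a script than with sed, a minimal Python sketch might look like the following. The helper names are hypothetical, and the only thing taken from cmpcol is the ":<>:" separator shown in the output above.

```python
SEP = ":<>:"

def split_pair(line, sep=SEP):
    """Return (file1_part, file2_part); either side may be an empty string."""
    left, _, right = line.rstrip("\n").partition(sep)
    return left, right

def change_separator(line, new_sep="\t", sep=SEP):
    """Replace the first occurrence of the separator with new_sep."""
    return line.rstrip("\n").replace(sep, new_sep, 1)

# Two lines in the shape of the "-lb" output shown above.
both = "1189|2002:<>:210|72059|2002|SERGIO SUAREZ LLAMAS"
only2 = ":<>:210|20023|2002|SERGIO SUAREZ LLAMAS"

print(split_pair(both))
# ('1189|2002', '210|72059|2002|SERGIO SUAREZ LLAMAS')
print(split_pair(only2))
# ('', '210|20023|2002|SERGIO SUAREZ LLAMAS')
print(change_separator(both, " <=> "))
# 1189|2002 <=> 210|72059|2002|SERGIO SUAREZ LLAMAS
```

An empty left-hand part marks a line that came only from the second file, such as the extra "2002" rows.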

Hope that helps.