How compare two files

Daredevil-- has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to compare two files and get the data they have in common.
So for example I have this file (file1.txt):

1173|0040
1174|052425
1175|052634
1176|053281
1177|055876
1188|2222
1189|2002
1190|2002
1191|2002

And this file (file2.txt):

000|20019|0040|No Definida.
000|20034|052425|No Definida.
000|20014|052634|No Definida.
000|20031|053281|No Definida.
000|20044|055876|No Definida.
000|67022|2000|No Definida.
000|67022|2000|No Definida.
000|00019|2000|No Definida.
000|67024|2000|No Definida.
210|72059|2002|SERGIO SUAREZ LLAMAS
210|72059|2002|SERGIO SUAREZ LLAMAS
210|72059|2002|SERGIO SUAREZ LLAMAS
210|20023|2002|SERGIO SUAREZ LLAMAS
210|72057|2002|SERGIO SUAREZ LLAMAS
210|67013|2002|SERGIO SUAREZ LLAMAS

The second column of file1.txt and the third column of file2.txt hold the data that could be equal. I want to compare them
and write the matching data to a new file, but I have no idea how to do it.
My original idea was to first extract the columns to compare from file1.txt and file2.txt:

foreach my $line (@fuente) {
    chomp(my @parts = split /\|/, $line);
    print "$parts[1]\n";    # second column of file1.txt (indexes start at 0)
}
foreach my $line (@fuente2) {
    chomp(my @parts = split /\|/, $line);
    print "$parts[2]\n";    # third column of file2.txt
}

But I don't know how to compare the files.
Best regards.


Replies are listed 'Best First'.
Re: How compare two files by Roy Johnson (Monsignor) on May 07, 2004 at 20:12 UTC
The short answer is: use a hash, keyed by the values that you want to test for equality. Something like:

my %seen;
for (@fuente) {
    chomp(my @parts = split /\|/);   # chomp so the last field has no newline
    $seen{$parts[1]} = $_;           # a repeated key keeps only its last line
}
for (@fuente2) {
    my @parts = split /\|/;
    if (exists $seen{$parts[2]}) {
        print $seen{$parts[2]};      # That's the matching line from @fuente
        print;                       # That prints the line from @fuente2
    }
}

The PerlMonk `tr///` Advocate
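As a slightly fuller illustration of the hash idea, here is a minimal sketch that keeps every line of the second file whose third column also appears in the second column of the first file. The sub names `column_values` and `matching_lines` are made up for this example, and the choice of output (only the lines from the second file) is an assumption, since the question does not say exactly what the new file should contain.

```perl
use strict;
use warnings;

# Collect the values found in one column (0-based) of a list of lines.
sub column_values {
    my ($col, @lines) = @_;
    my %seen;
    for my $line (@lines) {
        chomp(my @parts = split /\|/, $line);
        $seen{ $parts[$col] } = 1 if defined $parts[$col];
    }
    return \%seen;
}

# Keep the lines whose column $col2 value occurs as a key in %$wanted.
sub matching_lines {
    my ($wanted, $col2, @lines2) = @_;
    my @out;
    for my $line (@lines2) {
        chomp(my @parts = split /\|/, $line);
        push @out, $line
            if defined $parts[$col2] && exists $wanted->{ $parts[$col2] };
    }
    return @out;
}

# A few sample rows taken from the question.
my @fuente  = ("1173|0040\n", "1188|2222\n", "1189|2002\n");
my @fuente2 = (
    "000|20019|0040|No Definida.\n",
    "000|67022|2000|No Definida.\n",
    "210|72059|2002|SERGIO SUAREZ LLAMAS\n",
);

my $wanted = column_values(1, @fuente);
print matching_lines($wanted, 2, @fuente2);   # prints the 0040 and 2002 rows
```

With real files, the two arrays would presumably be filled by reading file1.txt and file2.txt line by line, and the result printed to a filehandle opened for writing on the new file instead of to STDOUT.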
Re: How compare two files by graff (Chancellor) on May 11, 2004 at 03:06 UTC
Part of the problem could be that you haven't specified the goal completely. You say you want to compare the second column of file1 with the third column of file2 and "get a new file" where those fields are the same. In your two data file examples, "2002" shows up in three rows of file1 (but they all have distinct values in the first column), and in six rows of file2 (but three of these rows are identical, and the other three have distinct values in the second column).

So what do you want the output to be? Do you want all three lines from file1 and all six lines from file2? Do you want just the lines with distinct information (maybe counting how many times each distinct line occurs)? Do you want just the distinct values from the "join" column that match in the two files (just "2002" in this case)? Or maybe, for each distinct matching value, how many times it occurs in each file (e.g. "2002 3 6")?

If you want the full lines from each file that have matching values, how do you want to organize them? This is tricky, because it looks like there will be variable numbers of lines from each file for the values that match.

I wrote a simple utility script to compare specific columns in two files, and print the intersection or union or difference of the column values -- I posted it here: cmpcol. Maybe it will give you some ideas on how to tackle your specific task (or maybe it will do the task you want -- I'm not sure...)

I put your sample data into files as indicated, and here are some outputs from cmpcol using those two files as input:

Read more... (3 kB)

Hope that helps.
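To make one of those possible outputs concrete, here is a rough sketch of the "how many times does each matching value occur in each file" variant. It is not cmpcol and makes no claim about how cmpcol works. The sub names `count_column` and `shared_counts` are invented for illustration, and the rows are a subset of the sample data from the question.

```perl
use strict;
use warnings;

# Count how often each value appears in one column (0-based) of the lines.
sub count_column {
    my ($col, @lines) = @_;
    my %count;
    for my $line (@lines) {
        chomp(my @parts = split /\|/, $line);
        $count{ $parts[$col] }++ if defined $parts[$col];
    }
    return \%count;
}

# For every value present in both count hashes, build "value n1 n2".
sub shared_counts {
    my ($c1, $c2) = @_;
    return map  { "$_ $c1->{$_} $c2->{$_}" }
           grep { exists $c2->{$_} }
           sort keys %$c1;
}

# Rows from the question: three 2002 rows in file1, six in file2.
my @fuente  = map { "$_\n" } qw(1188|2222 1189|2002 1190|2002 1191|2002);
my @fuente2 = map { "$_\n" } (
    '000|67024|2000|No Definida.',
    '210|72059|2002|SERGIO SUAREZ LLAMAS',
    '210|72059|2002|SERGIO SUAREZ LLAMAS',
    '210|72059|2002|SERGIO SUAREZ LLAMAS',
    '210|20023|2002|SERGIO SUAREZ LLAMAS',
    '210|72057|2002|SERGIO SUAREZ LLAMAS',
    '210|67013|2002|SERGIO SUAREZ LLAMAS',
);

my $counts1 = count_column(1, @fuente);    # second column of file1
my $counts2 = count_column(2, @fuente2);   # third column of file2
print "$_\n" for shared_counts($counts1, $counts2);   # prints "2002 3 6"
```

Counting first and intersecting afterwards keeps the two files independent, so the same counts could also feed the other variants (distinct values only, or values found in just one file).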
Re: How compare two files by Nkuvu (Priest) on May 07, 2004 at 20:03 UTC
~~Have you looked at Algorithm::Diff?~~ Update: Gah. Misread the question.