Daredevil-- has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to compare two files and extract the rows whose data match.
So for example I have this file (file1.txt):
1173|0040
1174|052425
1175|052634
1176|053281
1177|055876
1188|2222
1189|2002
1190|2002
1191|2002
And this is file2.txt:
000|20019|0040|No Definida.
000|20034|052425|No Definida.
000|20014|052634|No Definida.
000|20031|053281|No Definida.
000|20044|055876|No Definida.
000|67022|2000|No Definida.
000|67022|2000|No Definida.
000|00019|2000|No Definida.
000|67024|2000|No Definida.
210|72059|2002|SERGIO SUAREZ LLAMAS
210|72059|2002|SERGIO SUAREZ LLAMAS
210|72059|2002|SERGIO SUAREZ LLAMAS
210|20023|2002|SERGIO SUAREZ LLAMAS
210|72057|2002|SERGIO SUAREZ LLAMAS
210|67013|2002|SERGIO SUAREZ LLAMAS
The second column of file1.txt and the third column of file2.txt hold the data that may be equal. I want to compare them
and write the matching rows to a new file, but I have no idea how to do it.
My original idea was to first extract the columns to be compared from file1.txt and file2.txt:
foreach my $line (@fuente) {
    chomp $line;
    my @parts = split /\|/, $line;
    print "$parts[1]\n";    # second column of file1.txt (index 1)
}
foreach my $line (@fuente2) {
    chomp $line;
    my @parts = split /\|/, $line;
    print "$parts[2]\n";    # third column of file2.txt (index 2)
}
But I don't know how to compare the files.
Best regards.

Replies are listed 'Best First'.
Re: How compare two files
by Roy Johnson (Monsignor) on May 07, 2004 at 20:12 UTC
    The short answer is: use a hash, keyed by the values that you want to test for equality. Something like:
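    my %seen;
    for (@fuente) {
        chomp;                       # the key is the last field, so drop the newline
        my @parts = split /\|/;
        $seen{$parts[1]} = $_;       # a repeated key keeps only the last line seen
    }
    for (@fuente2) {
        chomp;
        my @parts = split /\|/;
        if (exists $seen{$parts[2]}) {
            print "$seen{$parts[2]}\n";   # That's the matching line from @fuente
            print "$_\n";                 # That prints the line from @fuente2
        }
    }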

    The PerlMonk tr/// Advocate
Re: How compare two files
by graff (Chancellor) on May 11, 2004 at 03:06 UTC
    Part of the problem could be that you haven't specified the goal completely. You say you want to compare the second column of file1 with the third column of file2 and "get a new file" where those fields are the same. In your two data file examples, "2002" shows up in three rows of file1 (but they all have distinct values in the first column), and in six rows of file2 (but three of these rows are identical, and the other three have distinct values in the second column).

    So what do you want the output to be? Do you want all three lines from file1 and all six lines from file2? Do you want just the lines with distinct information (maybe counting how many times each distinct line occurs)? Do you want just the distinct values from the "join" column that match in the two files (just "2002" in this case)? Or maybe, for each distinct matching value, how many times it occurs in each file (e.g. "2002 3 6")?
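
    To make the last option concrete, here is a minimal sketch (not cmpcol itself) that prints each value found in both files together with its count in each. The sub name count_column and the in-memory arrays @file1 and @file2 are assumptions made for illustration; the rows are a subset of the sample data from the question.

```perl
use strict;
use warnings;

# Count how often each value occurs in one column (0-based index)
# of a list of pipe-delimited lines. Hypothetical helper.
sub count_column {
    my ($lines, $col) = @_;
    my %count;
    for my $line (@$lines) {
        chomp(my $copy = $line);          # work on a copy, leave the input alone
        my @parts = split /\|/, $copy;
        next unless defined $parts[$col]; # skip short or empty lines
        $count{ $parts[$col] }++;
    }
    return \%count;
}

# A subset of the sample rows from the question
my @file1 = ("1173|0040\n", "1188|2222\n", "1189|2002\n",
             "1190|2002\n", "1191|2002\n");
my @file2 = (
    "000|20019|0040|No Definida.\n",
    "000|67022|2000|No Definida.\n",
    "210|72059|2002|SERGIO SUAREZ LLAMAS\n",
    "210|72059|2002|SERGIO SUAREZ LLAMAS\n",
    "210|72059|2002|SERGIO SUAREZ LLAMAS\n",
    "210|20023|2002|SERGIO SUAREZ LLAMAS\n",
    "210|72057|2002|SERGIO SUAREZ LLAMAS\n",
    "210|67013|2002|SERGIO SUAREZ LLAMAS\n",
);

my $c1 = count_column(\@file1, 1);   # second column of file1
my $c2 = count_column(\@file2, 2);   # third column of file2

# Print "value count_in_file1 count_in_file2" for values present in both
for my $key (sort keys %$c1) {
    next unless exists $c2->{$key};
    print "$key $c1->{$key} $c2->{$key}\n";
}
```

    With these rows the loop prints "0040 1 1" and "2002 3 6"; the values 2222 and 2000 are skipped because each occurs in only one of the two lists.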

    If you want the full lines from each file that have matching values, how do you want to organize them? This is tricky, because it looks like there will be variable numbers of lines from each file for the values that match.
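
    One possible way to organize such output, sketched here under assumptions: group the lines of each file by the join value in a hash of arrays, then print one block per matching value. The sub names group_by_column and matching_report, the "file1:"/"file2:" labels, and the "== value ==" header are all made up for this example.

```perl
use strict;
use warnings;

# Group lines by the value in one column (0-based index).
# Returns a hash ref: value => [ lines that have that value ].
sub group_by_column {
    my ($lines, $col) = @_;
    my %group;
    for my $line (@$lines) {
        chomp(my $copy = $line);
        my @parts = split /\|/, $copy;
        next unless defined $parts[$col];
        push @{ $group{ $parts[$col] } }, $copy;
    }
    return \%group;
}

# For every value present in both files, emit a header line followed by
# all file1 lines and then all file2 lines that carry that value.
sub matching_report {
    my ($file1, $file2) = @_;
    my $g1 = group_by_column($file1, 1);   # second column of file1
    my $g2 = group_by_column($file2, 2);   # third column of file2
    my @out;
    for my $key (sort keys %$g1) {
        next unless exists $g2->{$key};
        push @out, "== $key ==";
        push @out, map { "file1: $_" } @{ $g1->{$key} };
        push @out, map { "file2: $_" } @{ $g2->{$key} };
    }
    return @out;
}

# A few rows from the question's sample data
my @file1 = ("1177|055876\n", "1188|2222\n", "1189|2002\n", "1190|2002\n");
my @file2 = (
    "000|20044|055876|No Definida.\n",
    "210|20023|2002|SERGIO SUAREZ LLAMAS\n",
    "210|72057|2002|SERGIO SUAREZ LLAMAS\n",
);

print "$_\n" for matching_report(\@file1, \@file2);
```

    Keeping an array of lines per value handles the variable number of lines on each side; identical lines are kept as duplicates here, so removing them would be a separate step.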

    I wrote a simple utility script to compare specific columns in two files, and print the intersection or union or difference of the column values -- I posted it here: cmpcol. Maybe it will give you some ideas on how to tackle your specific task (or maybe it will do the task you want -- I'm not sure...)

    I put your sample data into files as indicated, and here are some outputs from cmpcol using those two files as input:

    Hope that helps.

Re: How compare two files
by Nkuvu (Priest) on May 07, 2004 at 20:03 UTC