how to find common and not common lines in 2 files?

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow monks!
I started using Perl a couple of weeks ago, and I am now faced with my first serious problem.
I have 2 files, let's say containing words, namely:

FILE1
one
two
three
four
five
six
seven
#############
FILE2
nine
two
eleven
twenty
one
thirty
forty

What I want is to create 3 files: the FIRST will store the words that appear only in FILE1,
the SECOND will store the words that appear only in FILE2, and the THIRD will store the common words.
I know how to open files and create arrays or hashes with the words they contain, and I have created 2 hashes, one for each file.
What I don't know is how to compare the 2 hashes and find the common and not-common words so that I can print them to the respective output files.
Thank you in advance!


Replies are listed 'Best First'.
Re: how to find common and not common lines in 2 files? by Roy Johnson (Monsignor) on Aug 21, 2007 at 21:08 UTC
So you want to go through hash1 and separate what's common from what's unique. Then you want to find the unique things in hash2.
`my (@common, @uniq1, @uniq2);
for (keys %hash1) {
    if (exists $hash2{$_}) {
        push @common, $_;
        delete $hash2{$_}; # All that will be left in hash2 is what wasn't in hash1
    }
    else { push @uniq1, $_ }
}
@uniq2 = keys %hash2;`
Caution: Contents may have been coded under pressure.
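A minimal, self-contained sketch of the same comparison, assuming the words are stored as hash keys. The helper name `partition_keys` is hypothetical (it does not appear in the thread), and unlike the loop above it leaves both hashes unmodified instead of deleting from `%hash2`. The sample words are the ones from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes two hash references and
# returns three array references: common keys, keys only in the first hash,
# and keys only in the second hash. Neither input hash is modified.
sub partition_keys {
    my ($h1, $h2) = @_;
    my @common = grep {  exists $h2->{$_} } sort keys %$h1;
    my @only1  = grep { !exists $h2->{$_} } sort keys %$h1;
    my @only2  = grep { !exists $h1->{$_} } sort keys %$h2;
    return (\@common, \@only1, \@only2);
}

# Sample data from the question, stored as hash keys.
my %hash1 = map { $_ => 1 } qw(one two three four five six seven);
my %hash2 = map { $_ => 1 } qw(nine two eleven twenty one thirty forty);

my ($common, $only1, $only2) = partition_keys(\%hash1, \%hash2);

print "common: @$common\n";
print "only in FILE1: @$only1\n";
print "only in FILE2: @$only2\n";
```

The keys are sorted only so that the output order is stable; `keys` on its own returns them in no particular order.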
Re: how to find common and not common lines in 2 files? by akho (Hermit) on Aug 21, 2007 at 22:47 UTC
Better done using a single hash (didn't test this, use at your own risk. Should work, though):
`use strict;
use warnings;

my %in_files;   # line => string of file tags, e.g. '1', '2' or '12'

open(my $f1, '<', 'FILE1') or die "can't open FILE1: $!\n";
while (<$f1>) {
    $in_files{$_} .= '1';
}
open(my $f2, '<', 'FILE2') or die "can't open FILE2: $!\n";
while (<$f2>) {
    $in_files{$_} .= '2';
}

open(my $common, '>', 'common_lines') or die "can't open common_lines: $!\n";
open(my $u1, '>', 'unique_1') or die "can't open unique_1: $!\n";
open(my $u2, '>', 'unique_2') or die "can't open unique_2: $!\n";

# Lines are not chomped, so each key still carries its newline when printed.
for (keys %in_files) {
    if ($in_files{$_} =~ m/12/) { print $common $_ }
    elsif ($in_files{$_} =~ m/1/) { print $u1 $_ }
    else { print $u2 $_ }
}`
Re: how to find common and not common lines in 2 files? by toolic (Bishop) on Aug 21, 2007 at 21:07 UTC
There was a recent node which addressed a similar problem. See Erase entries from files. If you use *nix, you could just use comm:
`sort -u FILE1 > FILE1.sorted
sort -u FILE2 > FILE2.sorted
comm -23 FILE1.sorted FILE2.sorted > FIRST
comm -13 FILE1.sorted FILE2.sorted > SECOND
comm -12 FILE1.sorted FILE2.sorted > common`
Re: how to find common and not common lines in 2 files? by dogz007 (Scribe) on Aug 21, 2007 at 21:27 UTC
Here's a one-liner (at least the real work is done in one line) that will get the job done. It ties all three files to arrays and then greps through them to find the common ones.
`use strict;
use Tie::File;

tie my @f1, 'Tie::File', 'file1.txt' or die;
tie my @f2, 'Tie::File', 'file2.txt' or die;
tie my @f3, 'Tie::File', 'file3.txt' or die;

@f1 = grep {
    my $word = $_;
    my $size = $#f2;
    @f2 = grep { if ($_ eq $word) { push @f3, $_; 0 } else { 1 } } @f2;
    $size == $#f2;
} @f1;`
Outputs the following for your example files:
file1.txt
`three
four
five
six
seven`
file2.txt
`nine
eleven
twenty
thirty
forty`
file3.txt
`one
two`
Re: how to find common and not common lines in 2 files? by misc (Friar) on Aug 21, 2007 at 21:45 UTC
Since you said you started with Perl a few weeks ago, I'd like to give you just a hint. Why don't you read the second file line by line and ...
Besides, as this was your question, you could find the common and uncommon keys of two hashes by using a foreach loop, iterating over the keys of hash a and testing whether each one is present in hash b. The hash category in perlfunc should also help you. Did you miss exists?
michael
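One possible reading of the hint above, sketched under assumptions: remember the lines of the first file in a hash, then read the second file line by line and test each line with `exists`. The helper name `compare_handles` and the in-memory filehandles in the usage lines are illustrative and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes two open filehandles and
# returns array references for (common, only-in-first, only-in-second) lines.
sub compare_handles {
    my ($fh1, $fh2) = @_;

    # Remember every line of the first file.
    my %seen1;
    while (my $line = <$fh1>) {
        chomp $line;
        $seen1{$line} = 1;
    }

    # Read the second file line by line and classify each line.
    my (%common, %only2);
    while (my $line = <$fh2>) {
        chomp $line;
        if (exists $seen1{$line}) { $common{$line} = 1 }
        else                      { $only2{$line}  = 1 }
    }

    # Lines of the first file that were never matched are unique to it.
    my @only1 = grep { !exists $common{$_} } sort keys %seen1;
    return ([sort keys %common], \@only1, [sort keys %only2]);
}

# Usage with in-memory filehandles, so the sketch runs without files on disk.
my $text1 = "one\ntwo\nthree\n";
my $text2 = "two\nfour\none\n";
open(my $in1, '<', \$text1) or die "can't open in-memory file: $!\n";
open(my $in2, '<', \$text2) or die "can't open in-memory file: $!\n";

my ($both, $first, $second) = compare_handles($in1, $in2);
print "common: @$both\n";
print "only in first: @$first\n";
print "only in second: @$second\n";
```

Only the first file is held in memory in full; from the second file, just the classified lines are kept. With real files, the two `open` calls would take file names instead of scalar references.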