Re: file compare and populate

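On Unix, the comm utility already does this, as long as both files are sorted and a matching record is the same complete line in both files (comm compares whole lines, not just the first column). The options: -1 suppresses lines unique to file 1 and -3 suppresses lines common to both, leaving only column 2 as output: the lines unique to file 2.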

comm -13 file1 file2 > file3
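
If the input is not already sorted, a minimal sketch of the same idea sorts copies first. The sample records, the temporary directory and the .sorted file names are made up for illustration and are not the OP's data; LC_ALL=C is set only so that sort and comm agree on the ordering.

```shell
# Hypothetical sample data, deliberately unsorted.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'carol 3\nalice 1\n' > file1
printf 'dave 4\nalice 1\nbob 2\n' > file2

# comm expects sorted input, so sort copies of both files first.
LC_ALL=C sort file1 > file1.sorted
LC_ALL=C sort file2 > file2.sorted

# -1 drops lines only in file1, -3 drops lines in both:
# what remains is the lines found only in file2.
LC_ALL=C comm -13 file1.sorted file2.sorted > file3

cat file3
```

Note that file3 comes out in sorted order, not in the original order of file2.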

To address the OP's Perl: use a regexp match to isolate the first column, which is the correct key for the hash. That also means you don't need to chop or chomp everything only to have to put the \n back. Also, @names doesn't function as an array coercion of %names; it is a completely separate piece of storage.

The following is now updated to sort the output.

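# Write to file3 every line of file2 whose first column is not in file1.
open FIRST, "<", "file1" or die "$!: file1\n";
open LAST, "<", "file2" or die "$!: file2\n";
open NEW, ">", "file3" or die "$!: file3\n";
my %names = ();   # first-column keys seen in file1
my %n2 = ();      # key => complete line, for keys found only in file2
while (<FIRST>) {
   next unless /^(\S+)/;   # skip blank lines: $1 is stale if the match fails
   $names{$1} = 1;
}
close FIRST;
while (<LAST>) {
   next unless /^(\S+)/;
   $names{$1} or $n2{$1} = $_;   # keep the whole line, newline included
}
close LAST;
print NEW $n2{$_} for sort keys %n2;   # sorted by first column
close NEW;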

-M

Free your mind

Re^2: file compare and populate by gu (Beadle) on Dec 05, 2005 at 11:03 UTC
This code only prints the new items from file2. The main problem with Anonymous Monk's code was that it used the complete lines as hash keys... Anonymous Monk probably needs something smarter, or more perlish, than the following snippet:

open FIRST,"file1" or die "Can't open file1: $!\n";
open LAST,"file2" or die "Can't open file2: $!\n";
my %names;
my $write_new = 0;
while (<FIRST>) {
   # assumes "name number" lines; skip anything that does not match
   next unless /^(\w+) (\d+)/;
   $names{$1} = $2;
}
close FIRST;
while (<LAST>) {
   next unless /^(\w+) (\d+)/;
   if (!defined($names{$1})) {
      $write_new++;
      $names{$1} = $2;
   }
}
close LAST;
if ($write_new) {
   open NEW,">","file3" or die "Can't open file3: $!\n";
   foreach (sort keys %names) {
      print NEW "$_ $names{$_}\n";
   }
   close NEW;
}

Gu

Updated to avoid unnecessary opening of new file.