Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I'm having trouble with the following. I'm relatively new to Perl, so any help would be appreciated.

I have two files e.g.

file 1
------
a 0
c 1
d 1
e 3
f 0

file 2
------
a 0
b 0
c 0
d 0
e 0
f 0

I need to compare column 1 of file2 with column 1 of file1.
If file2 has an extra record that isn't in file1, then a new file needs to be created which is a copy of file1 but also contains this new record (in sorted order),
e.g.
file3
-----
a 0
b 0
c 1
d 1
e 3
f 0

The code I have for this so far (which doesn't work) is
open FIRST,"file1" or die "Can't open file1: $!\n";
open LAST,"file2" or die "Can't open file2: $!\n";
open NEW,">file3" or die "Can't open file3: $!\n";

chomp (my @last=<LAST>);
my %names;
@names {<FIRST>} = ();

foreach my $name (@last) {
    next if exists $names { $name }or print NEW "$name\n";
}

close FIRST;
close LAST;
close NEW;

This is just creating file3 as a complete copy of file2.

How do I get it to compare just the first element of each record?
How do I get the new record inserted in the correct order in the new file?
Any help appreciated, I'm really struggling with the logic on this one.

Replies are listed 'Best First'.
Re: file compare and populate
by Moron (Curate) on Dec 05, 2005 at 10:52 UTC
    On Unix, the comm utility already does this: -1 suppresses the lines unique to file 1 and -3 suppresses the lines common to both, leaving only column 2 as output, i.e. the lines unique to file 2. Note that comm compares whole lines and expects both files to be sorted.
    comm -13 file1 file2 > file3
    To address the OP's Perl: use regexp matching to isolate the first column, which is the correct key for the hash. It also means you don't need to chop or chomp everything only to have to put the \n back. Also, @names{<FIRST>} = () is a hash slice: it does populate %names, but with the complete, unchomped lines of file1 as keys, so the chomped lines read from file2 never match.
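A minimal sketch of the difference between whole-line keys and first-column keys, using in-memory lines instead of the OP's files. The helper name first_column and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first whitespace-delimited field of a line,
# or undef if the line has none (e.g. a blank line).
sub first_column {
    my ($line) = @_;
    return $line =~ /^(\S+)/ ? $1 : undef;
}

my @file1 = ("a 0\n", "c 1\n");

# Whole lines as keys, which is what a hash slice over <FIRST> produces.
my %by_line;
@by_line{@file1} = ();

# First column as keys.
my %by_col;
for my $line (@file1) {
    my $key = first_column($line);
    $by_col{$key} = 1 if defined $key;
}

# Look up the file2 record "c 0" both ways.
print exists $by_line{"c 0"}                ? "found\n" : "missing\n";   # prints "missing"
print exists $by_col{ first_column("c 0\n") } ? "found\n" : "missing\n"; # prints "found"
```

The whole-line lookup fails twice over: the stored key still carries its newline, and the second column differs between the two files.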

    The following is now updated to sort the output.

    open FIRST,"file1" or die "$!: file1\n";
    open LAST,"file2" or die "$!: file2\n";
    open NEW,">file3" or die "$!: file3\n";

    my %names = ();
    my %n2 = ();

    # Remember every first-column key seen in file1.
    while (<FIRST>) {
        next unless /^(\S+)/;
        $names{$1} = 1;
    }
    close FIRST;

    # Keep the full line of each file2 record whose key is not in file1.
    while (<LAST>) {
        next unless /^(\S+)/;
        $names{$1} or $n2{$1} = $_;
    }
    close LAST;

    # Print the new records sorted by key.
    print NEW $n2{$_} for sort keys %n2;
    close NEW;

    -M

    Free your mind

      This code only prints the new items from file2. The main problem with Anonymous Monk's code was that it used the complete lines as hash keys...

      Anonymous Monk probably needs something smarter, or more perlish, than the following snippet:

      open FIRST,"file1" or die "Can't open file1: $!\n";
      open LAST,"file2" or die "Can't open file2: $!\n";

      my %names;
      my $write_new = 0;

      # Load file1: the first column is the key, the second the value.
      while (<FIRST>) {
          next unless /^(\w+) (\d+)/;
          $names{$1} = $2;
      }
      close FIRST;

      # Add any file2 record whose key is missing from file1.
      while (<LAST>) {
          next unless /^(\w+) (\d+)/;
          if (!defined($names{$1})) {
              $write_new++;
              $names{$1} = $2;
          }
      }
      close LAST;

      # Only create file3 if file2 contributed something new.
      if ($write_new) {
          open NEW,">","file3" or die "Can't open file3: $!\n";
          foreach (sort keys %names) {
              print NEW "$_ $names{$_}\n";
          }
          close NEW;
      }

      Gu

      Updated to avoid unnecessary opening of new file.
Re: file compare and populate
by vennirajan (Friar) on Dec 05, 2005 at 12:43 UTC
    Hi Perl soldiers!,

    This code creates a new file only if there are new records in file2.

    #!/usr/bin/perl -w
    use strict;

    open FIRST,"file1" or die "Can't open file1: $!\n";
    open LAST,"file2" or die "Can't open file2: $!\n";

    my ( $Flag, @Last, @First, %Hash );
    chomp ( @First = <FIRST> );
    chomp ( @Last = <LAST> );
    $Flag = 0;

    # Load file1: the first column is the key, the second the value.
    foreach my $record ( @First ) {
        next unless $record =~ /^(\S+) (\d+)$/;
        $Hash{$1} = $2;
    }

    # Add any file2 record whose key is not already present.
    foreach my $record ( @Last ) {
        next unless $record =~ /^(\S+) (\d+)$/;
        unless ( exists $Hash{$1} ) {
            $Flag++;
            $Hash{$1} = $2;
        }
    }

    # This check avoids unwanted creation of the new file.
    if ( $Flag ) {
        open NEW,">file3" or die "Can't open file3: $!\n";
        print NEW "$_ $Hash{$_}\n" for sort keys %Hash;
        close NEW;
    }

    close FIRST;
    close LAST;

    Regards,
    S.Venni Rajan.
    "A Flair For Excellence"
        - BK Systems.
      See my other comment. There are cases where file1 includes records that are not in file2. In that case I'd need to remove the record that isn't present in file2?
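One possible reading of this follow-up, sketched under assumptions: file3 should contain exactly the keys of file2, taking the value from file1 whenever file1 has that key. That interpretation, the helper names parse_records and merge_records, and the sample data are all assumptions made for illustration, not something confirmed in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "key value" lines into a hash reference,
# skipping lines that do not have two fields.
sub parse_records {
    my (@lines) = @_;
    my %rec;
    for my $line (@lines) {
        next unless $line =~ /^(\S+)\s+(\S+)/;
        $rec{$1} = $2;
    }
    return \%rec;
}

# Keep exactly the keys of $last (file2); prefer the value from $first (file1).
# Keys that exist only in $first are dropped.
sub merge_records {
    my ($first, $last) = @_;
    my %merged;
    for my $key (keys %$last) {
        $merged{$key} = exists $first->{$key} ? $first->{$key} : $last->{$key};
    }
    return \%merged;
}

# Sample data: "x" is only in file1, "b" is only in file2.
my $first  = parse_records("a 0\n", "c 1\n", "x 9\n");
my $last   = parse_records("a 0\n", "b 0\n", "c 0\n");
my $merged = merge_records($first, $last);

print "$_ $merged->{$_}\n" for sort keys %$merged;
```

With this sample data the output is a 0, b 0, c 1: the record for x disappears, b is added, and c keeps its value from file1.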
Re: file compare and populate
by Moron (Curate) on Dec 05, 2005 at 15:15 UTC
    In case this is a database replication requirement, where you have yesterday's dump of a table from, say, another company's system and today's dump of that table from the same place, and you want to extract and apply the changes that took place in between, the full story using comm alone would be (assuming the table dumps come sorted, which can usually be arranged at dump time):

    comm -23 table.yest table.today > table.deletes

    comm -13 table.yest table.today > table.inserts

    Each update is converted into one delete and one insert, so the deletes have to be processed first.

    -M

    Free your mind
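For readers without comm, the same two sets can be approximated in Perl with a hash lookup. This is a hash-based substitute rather than a reimplementation of comm: it does not need sorted input, and it ignores how often a duplicate line occurs. The helper name only_in_left and the sample rows are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: lines present in @$left but not in @$right, sorted.
sub only_in_left {
    my ($left, $right) = @_;
    my %seen = map { $_ => 1 } @$right;
    my @out  = sort grep { !$seen{$_} } @$left;
    return @out;
}

# Made-up sample dumps: row 2 was updated, row 3 deleted, row 4 inserted.
my @yest  = ("1 alice\n", "2 bob\n",    "3 carol\n");
my @today = ("1 alice\n", "2 robert\n", "4 dave\n");

my @deletes = only_in_left(\@yest,  \@today);   # roughly comm -23
my @inserts = only_in_left(\@today, \@yest);    # roughly comm -13

print "deletes:\n", @deletes;
print "inserts:\n", @inserts;
```

The updated row 2 shows up once in each list, which is why the deletes have to be applied before the inserts.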