in reply to Re^2: file merge
in thread file merge
But you really need to rethink the algorithm. Since you are creating a union of two sets of records, where some keys might be present in both sets, you want to build the union in a single hash, then when that's done, print the contents of the hash.
Contrary to davido's advice, I would read the old file into the hash first. Use the concatenation of the first three fields as the hash key (i.e. $key = join ",", @fields[0..2];), then use the fourth field as the hash value. (Are there more than four fields per line? If so, the hash value can be an array reference.)
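As a rough sketch of that keying scheme (the helper name parse_record and the sample line are made up for illustration, and it assumes plain comma-separated data with no quoted commas):

```perl
use strict;
use warnings;

# Hypothetical helper: split one line into a hash key (the first three
# fields joined by commas) and a value (array ref of whatever is left,
# so it also covers the "more than four fields" case).
sub parse_record {
    my ($line) = @_;
    chomp $line;
    my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
    my $key    = join ',', @fields[0..2];
    my @rest   = @fields[3..$#fields];
    return ( $key, \@rest );
}

my %old;
my ( $key, $val ) = parse_record("a,b,c,10.5\n");
$old{$key} = $val;
print "$key => @{ $old{$key} }\n";
```

With exactly four fields per line, the array ref holds a single element; you could just as well store the scalar directly.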
Then read the new file in the same way: break each record into fields and concatenate the first three to make a hash key; if the hash key already exists, you have to compare field 4 against the existing hash value, and keep or replace the old hash value as appropriate; otherwise, just add the novel key/value set into the hash.
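The keep-or-replace step might look something like the following. The function name merge_record and the $keep_old callback are assumptions of mine; what counts as "appropriate" depends on your data, so the rule shown here (keep the old value when it is the larger one) is only a placeholder:

```perl
use strict;
use warnings;

# Hypothetical merge step: %$union already holds the old file's records.
# $keep_old is a caller-supplied rule that returns true when the existing
# value should win over the new one.
sub merge_record {
    my ( $union, $key, $new_val, $keep_old ) = @_;
    if ( exists $union->{$key} && $keep_old->( $union->{$key}, $new_val ) ) {
        return;                     # key seen before and old value wins
    }
    $union->{$key} = $new_val;      # novel key, or new value wins
    return;
}

my %union = ( 'a,b,c' => 100 );

# Placeholder rule only: keep the old value when it is >= the new one.
my $rule = sub { my ( $old, $new ) = @_; return $old >= $new };

merge_record( \%union, 'a,b,c', 90, $rule );    # existing key, old value kept
merge_record( \%union, 'd,e,f', 5,  $rule );    # novel key, added
print "$_,$union{$_}\n" for sort keys %union;
```

Keeping the comparison in one small sub makes it easier to spot an inverted test.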
Once you reach the end of the second file, your hash is the complete and correct union, and you just print it.
Based on the code you tried, I'm assuming that you are confident about the distribution of commas in your data -- i.e. that every line of data contains exactly 3 commas (separating the four fields per line). If you really are confident that this is true and will never change, then using split is good enough.
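If you would rather verify that assumption than trust it, a small check along these lines could run over each line before splitting. The name has_four_fields is hypothetical, and the check does not understand quoted commas; if the data ever gains quoted fields, a CSV module such as Text::CSV would be the safer route.

```perl
use strict;
use warnings;

# Hypothetical check: true when a line contains exactly three commas,
# i.e. four comma-separated fields.
sub has_four_fields {
    my ($line) = @_;
    my $commas = ( $line =~ tr/,// );    # tr/// returns the match count
    return $commas == 3;
}

for my $line ( "a,b,c,1", "a,b,c", "a,b,c,1,2" ) {
    my $status = has_four_fields($line) ? "ok" : "unexpected field count";
    print "$line: $status\n";
}
```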
Um, your handling of command line args seemed a bit strange; here's an untested sample of how I would approach the task:
    #!/usr/bin/perl
    use strict;

    my $Usage = "Usage: $0 old_file new_file > union_file\n";
    die $Usage unless ( @ARGV == 2 and -f $ARGV[0] and -f $ARGV[1] );

    my %union;
    open IN, $ARGV[0] or die "$ARGV[0]: $!";
    while (<IN>) {
        chomp;
        my @flds = split /,/;
        my $val = pop @flds;  # assumes exactly 4 fields in every row
        my $key = join ',', @flds;
        $union{$key} = $val;
    }
    open IN, $ARGV[1] or die "$ARGV[1]: $!";
    while (<IN>) {
        chomp;
        my @flds = split /,/;
        my $val = pop @flds;
        my $key = join ',', @flds;
        next if ( exists( $union{$key} ) and
                  abs(($union{$key} - $val)/$union{$key}) * 100 > 1 );
        $union{$key} = $val;
    }
    # union is now complete
    print "$_,$union{$_}\n" for ( sort keys %union );

(You should probably check to see that the sense of the value comparison is what you intended. It's so easy to invert the logic when you don't mean to.)
(updated to move the close paren for the "abs()" call).
Re^4: file merge
by nraymond (Initiate) on Apr 11, 2005 at 15:02 UTC