
Well, it's really good that you gave that a try. Two things were glaringly bad: don't assign the output of "sort" to a hash (the first, third, etc. elements become keys, and the second, fourth, etc. become values), and don't ever use "shift" on a hash. You'd better study hashes vs. arrays some more.
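To make the first point concrete, here is a minimal sketch using made-up data (the list contents are purely illustrative, not taken from your files):

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only.
my @records = ( 'delta', 'alpha', 'charlie', 'bravo' );

# Wrong: a list assigned to a hash is consumed in pairs, so sorted
# items 1 and 3 become keys and items 2 and 4 become their values.
my %oops = sort @records;
# %oops now holds ( alpha => 'bravo', charlie => 'delta' )

# Right: keep sorted output in an array, where order is preserved
# and shift removes the first element.
my @sorted = sort @records;
my $first  = shift @sorted;    # 'alpha'
```

A hash has no defined order, which is why "shift" only makes sense on the array.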

But you really need to rethink the algorithm. Since you are creating a union of two sets of records, where some keys might be present in both sets, you want to build the union in a single hash, then when that's done, print the contents of the hash.

Contrary to davido's advice, I would read the old file into the hash first. Use the concatenation of the first three fields as the hash key (i.e. $key = join ",", @fields[0..2];), then use the fourth field as the hash value. (Are there more than four fields per line? If so, the hash value can be an array.)
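If there do turn out to be more than four fields, one way to handle it is to store an array reference as the hash value. This is only a sketch; the sample line and the variable names are invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical record: three key fields, then one or more value fields.
my $line = "east,widget,2004,10.5,11.0";

my @fields = split /,/, $line;
my $key    = join ',', @fields[ 0 .. 2 ];    # "east,widget,2004"
my @values = @fields[ 3 .. $#fields ];       # everything after the key

my %union;
$union{$key} = [@values];                    # store an array reference

# Recover the original line from the hash entry.
my $rebuilt = join ',', $key, @{ $union{$key} };
```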

Then read the new file in the same way: break each record into fields and concatenate the first three to make a hash key. If the hash key already exists, compare field 4 against the existing hash value, and keep or replace the old hash value as appropriate; otherwise, just add the new key/value pair to the hash.

Once you reach the end of the second file, your hash is the complete and correct union, and you just print it.

Based on the code you tried, I'm assuming that you are confident about the distribution of commas in your data -- i.e. that every line of data contains exactly 3 commas (separating the four fields per line). If you really are confident that this is true and will never change, then using split is good enough.
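If you are less than certain about that, a cheap safeguard is to count the fields before trusting a line. The sketch below assumes exactly four fields per record; parse_line is a hypothetical helper, not something in your code or mine:

```perl
use strict;
use warnings;

# Return the fields of a line, or an empty list if the line does not
# have exactly four comma-separated fields. (Hypothetical helper.)
sub parse_line {
    my ($line) = @_;
    chomp $line;
    # A limit of -1 keeps trailing empty fields, which split
    # otherwise drops silently.
    my @flds = split /,/, $line, -1;
    return @flds == 4 ? @flds : ();
}

my @good = parse_line("a,b,c,1.5\n");    # four fields
my @bad  = parse_line("a,b,c\n");        # empty list: only three fields
my @tail = parse_line("a,b,c,\n");       # four fields, the last one empty
```

Note that this still cannot cope with quoted fields that contain commas; for data like that you would need a real CSV parser rather than split.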

Um, your handling of command line args seemed a bit strange; here's an untested sample of how I would approach the task:

#!/usr/bin/perl

use strict;
use warnings;

my $Usage = "Usage: $0 old_file  new_file  > union_file\n";
die $Usage unless ( @ARGV == 2 and -f $ARGV[0] and -f $ARGV[1] );

my %union;

open my $old_fh, '<', $ARGV[0] or die "$ARGV[0]: $!";
while (<$old_fh>) {
    chomp;
    my @flds = split /,/;
    my $val = pop @flds;  # assumes exactly 4 fields in every row
    my $key = join ',', @flds;
    $union{$key} = $val;
}
open my $new_fh, '<', $ARGV[1] or die "$ARGV[1]: $!";
while (<$new_fh>) {
    chomp;
    my @flds = split /,/;
    my $val = pop @flds;
    my $key = join ',', @flds;

    next if ( exists( $union{$key} ) and
             abs(($union{$key} - $val)/$union{$key}) * 100 > 1 );

    $union{$key} = $val;
}

# union is now complete

print "$_,$union{$_}\n" for ( sort keys %union );

(You should probably check to see that the sense of the value comparison is what you intended. It's so easy to invert the logic when you don't mean to.)
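One way to check the sense of that test is to pull it out into a small function and try it on a few values by hand. This is a sketch only: differs_by_more_than is a hypothetical helper, and the 1 percent threshold is simply the one used in the sample script. It multiplies through by the old value instead of dividing, so an old value of zero cannot trigger a division-by-zero error:

```perl
use strict;
use warnings;

# True if $new differs from $old by more than $pct percent of $old.
# Equivalent to abs(($old - $new) / $old) * 100 > $pct for nonzero $old,
# but written without a division. (Hypothetical helper.)
sub differs_by_more_than {
    my ( $old, $new, $pct ) = @_;
    return abs( $old - $new ) * 100 > $pct * abs($old);
}

my $small = differs_by_more_than( 100, 100.5, 1 );    # false: 0.5% apart
my $large = differs_by_more_than( 100, 102,   1 );    # true:  2% apart
my $zero  = differs_by_more_than( 0,   5,     1 );    # true:  old value is 0
```

As written in the sample script, "next if" on a true result means a new value that differs by more than 1 percent is discarded and the old value is kept, while a closer value replaces it. If you meant the opposite, negate the test.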

(updated to move the close paren for the "abs()" call).

In reply to Re^3: file merge by graff
in thread file merge by nraymond
