in reply to Re: file merge
in thread file merge

Dave, Thank you for your response. I'm new to perl. I guess I need more than a nudge in the right direction. I tried to reuse a code example for comparing files to no avail.
handleArgs(); if($outfile) { open (STDOUT, ">$outfile") || die "Cannot redirect STDOUT\n"; } open(TEMP,$newFile) || die "Error opening $newFile\n"; open(FILE,$oldFile) || die "Error opening $oldFile\n"; while(<TEMP>) { push(@temp, $_); } while(<FILE>) { push(@file, $_); } %temp = sort(@temp); %file = sort(@file); OUTER: foreach $_ ($file) { next if /^\s*$/; #ignore empty lines next if /^\s*#/; #ignore comments chomp; # $_ = lc; while(%temp) { $x = shift %temp; next if ($x =~ /^\s*$/); next if ($x =~ /^\s*#/); chomp $x; @compX = split /,/, $x; @compY = split /,/, $_; $comparison = lc($compY[0..2]) cmp lc($compX[0..2]); print "$_ - $x = $comparison\n" if $DEBUG; if($comparison == 0) { print "Record match...checking pctChange \n"; $fpPctChange = abs(($compX[3] - $compY[3]) / $compX[3] * 1 +00); if ($fpPctChange > 1) { print " BUT pct. Change greater than one.for $_ +. Please Investigate.\n"; print "Here's what was Found $_ AND $x\n"; } next OUTER; } else { print "this record is not a match. $_ , $x\n" if $DEBUG; next OUTER; } } print " This oldFile record was not found in newFile file! $_\n"; } if($W) { foreach $temp (@temp) { print "Warning $temp not expected\n";} } sub handleArgs() { while(@ARGV) { print "In Handle args\n" if $DEBUG; #Enable warnings if($ARGV[0] =~ /-W/i) { $W = 1; shift(@ARGV); } elsif($ARGV[0] =~ /-t/i) { $oldFile = $ARGV[1]; splice(@ARGV, 0, 2); } elsif($ARGV[0] =~ /-d/i) { $newFile = $ARGV[1]; splice(@ARGV, 0, 2); } elsif($ARGV[0] =~ /-O/i) { $outfile = $ARGV[1]; splice(@ARGV, 0, 2); } else { print "Invalid Arg in FileCmp ", shift (@ARGV),"\n"; } } }

Replies are listed 'Best First'.
Re^3: file merge
by graff (Chancellor) on Apr 05, 2005 at 23:02 UTC
    Well, it's really good that you gave that a try. The only thing that was glaringly bad: don't assign the output of "sort" to a hash (first, third, etc become keys, second, fourth, etc become values), and don't ever use "shift" on a hash. You better study some more about hashes vs. arrays.

    But you really need to rethink the algorithm. Since you are creating a union of two sets of records, where some keys might be present in both sets, you want to build the union in a single hash, then when that's done, print the contents of the hash.

    Contrary to davido's advice, I would read the old file into the hash first. Use the concatenation of the first three fields as the hash key (i.e.  $key = join ",", @fields[0..2]; then use the fourth field as the hash value. (Are there more than four fields per line? If so, the hash value can be an array.)

    Then read the new file in the same way: break each record into fields and concatenate the first three to make a hash key; if the hash key already exists, you have to compare field 4 against the existing hash value, and keep or replace the old hash value as appropriate; otherwise, just add the novel key/value set into the hash.

    Once you reach the end of the second file, your hash is the complete and correct union, and you just print it.

    Based on the code you tried, I'm assuming that you are confident about the distribution of commas in your data -- i.e. that every line of data contains exactly 3 commas (separating the four fields per line). If you really are confident that this is true and will never change, then using split is good enough.

    Um, your handling of command line args seemed a bit strange; here's an untested sample of how I would approach the task:

    #!/usr/bin/perl use strict; my $Usage = "Usage: $0 old_file new_file > union_file\n"; die $Usage unless ( @ARGV == 2 and -f $ARGV[0] and -f $ARGV[1] ); my %union; open IN, $ARGV[0] or die "$ARGV[0]: $!"; while (<IN>) { chomp; my @flds = split /,/; my $val = pop @flds; # assumes exactly 4 fields in every row my $key = join ',', @flds; $union{$key} = $val; } open IN, $ARGV[1] or die "$ARGV[1]: $!"; while (<IN>) { chomp; my @flds = split /,/; my $val = pop @flds; my $key = join ',', @flds; next if ( exists( $union{$key} ) and abs(($union{$key} - $val)/$union{$key}) * 100 > 1 ); $union{$key} = $val; } # union is now complete print "$_,$union{$_}\n" for ( sort keys %union );
    (You should probably check to see that the sense of the value comparison is what you intended. It's so easy to invert the logic when you don't mean to.)

    (updated to move the close paren for the "abs()" call).

      Thank you very much. You have been very helpful. I went out this weekend and purchased the camel and llama books. so the journey begins...