
Thanks so much for the help again. I replaced sub input_data and sub output_data in the original script with this, and I get the following:

ACFX 28523 L 05/18/13 ABCCO
ACFX 28523 L 05/01/13 ABCCO-C

I am pasting the entire script in case I am doing something wrong. This is a little over my head, but hopefully I will begin to understand it better.

#!/usr/bin/perl
#
use strict;
use warnings;
use Date::Calc qw( Delta_Days );

my %hashC=();
my %hash=();

input_data(1,'out1.txt');
input_data(2,'out2.txt');
#input_data(1,'pcarry.txt');
#input_data(2,'rcarry.txt');
output_data('final.txt');

#sub input_data {
#  my ($ix,$filename) = @_;
#  open FILE1, "<", $filename or die "$filename : $!\n";
#  while ( <FILE1> ) {
#    chomp $_;
#    my ( $key, $le, $date, $company ) = split ',', $_;
#    my $pk = join "\t",$key,$le,$company;
#    push @{$hash{$pk}[$ix]},fmt_ymd($date);
#  }
#  close FILE1;
#}

sub input_data {
  my ($ix,$filename) = @_;
  open FILE1, "<", $filename or die "$filename : $!\n";
  while ( <FILE1> ) {
    chomp $_;
    my ( $key, $le, $date, $company ) = split ',', $_;
    my $pk = join "\t",$key,$le,$company;
    # remove -C from key and store
    if ($pk =~ s/-C$//){
      # print "-C removed $pk\n";
      $hashC{$pk} = '-C';
    }
    push @{$hash{$pk}[$ix]},fmt_ymd($date);
  }
  close FILE1;
}

#sub output_data {
#  my $filename = shift;
#  open OUTFILE, ">", $filename or die "$filename : $!\n";
#  # primary key
#  for my $pk (sort keys %hash){
#    my ($key,$le,$company) = split "\t",$pk;

sub output_data {
  my $filename = shift;
  open OUTFILE, ">", $filename or die "$filename : $!\n";

  # primary key
  for my $pk (sort keys %hash){
    my ($key,$le,$company) = split "\t",$pk;
    # add -C back if required
    $company .= $hashC{$pk} || '';

    # get multiple dates
    # print "$pk\n";
    # my @dates = @{$hash{$pk}[1]};
    # my @rdates = @{$hash{$pk}[2]};
    my @dates = (defined $hash{$pk}[1]) ? @{$hash{$pk}[1]} : ();
    my @rdates = (defined $hash{$pk}[2]) ? @{$hash{$pk}[2]} : ();

    # even up number of dates:
    while (@dates < @rdates) { push @dates,'1900-01-01'; }
    while (@rdates < @dates) { push @rdates,'1900-01-01'; }

    # print out multiple dates for each key
    for my $date (reverse sort @dates){
      # use match sub if more than 1
      if (@rdates > 1){
        @rdates = match($date,@rdates);
      }
      # rdates sorted so best match is first element
      my $rdate = shift @rdates;
      print join ' ',$key,$le,fmt_mdy($date),fmt_mdy($rdate),$company,"\n";
    }
  }
  close OUTFILE;
}

# match dates by calc days diff
# and sorting to get least diff
sub match {
  my ($date,@rdates) = @_;
  my @days=();

  # split date into y,m,d
  my @d1 = split /\D/,$date;

  # calc diff and store with date
  for my $rdate (@rdates){
    my @d2 = split /\D/,$rdate;
    push @days,[$rdate,abs Delta_Days(@d1,@d2)];
  }
  # sort array by days
  @days = sort {$a->[1] <=> $b->[1]} @days;

  # extract dates
  return map {$_->[0]} @days;
}

# change mm/dd/yy to yyyy-mm-dd
sub fmt_ymd {
  my $mdy = shift;
  $mdy =~ s/ //g;
  my ($m,$d,$y) = split /\D/,$mdy;
  if ($y < 99){ $y += 2000 };
  return sprintf "%04d-%02d-%02d",$y,$m,$d;
}

# change yyyy-mm-dd to mm/dd/yy
sub fmt_mdy {
  my $ymd = shift;
  $ymd =~ s/ //g;
  return ' 'x8 if $ymd eq '1900-01-01';
  my ($y,$m,$d) = split /\D/,$ymd;
  $y -= 2000;
  return sprintf "%02d/%02d/%02d",$m,$d, $y;
}

Re^3: issue with output of file matching
by poj (Abbot) on Jun 07, 2013 at 20:30 UTC
    I suggest you comment out this line temporarily:
    $company .= $hashC{$pk} || '';

    If you still get -C appearing in the output, then check the data carefully for trailing spaces. You can also temporarily add the primary key to the output, wrapped in separators like this, to reveal spaces or any other reason why the keys don't match up.

    print join ' ',$key,$le,fmt_mdy($date),fmt_mdy($rdate),$company,"|$pk|\n";
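A minimal sketch of why trailing whitespace would defeat the `s/-C$//` step, and one possible way to guard against it. The helper name `make_key` and the sample lines are assumptions made for illustration; they are not part of the posted script or the real data files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# make_key is a hypothetical helper (not in the posted script).
# It trims every field before building the key, so that a trailing
# space or carriage return cannot hide the -C suffix from the regex.
sub make_key {
    my ($line) = @_;
    my @fields = split /,/, $line;
    s/^\s+|\s+$//g for @fields;            # trim both ends of each field
    my ($key, $le, $date, $company) = @fields;
    $company = '' unless defined $company; # tolerate short lines
    my $had_c = ($company =~ s/-C$//) ? 1 : 0;
    return (join("\t", $key, $le, $company), $had_c, $date);
}

# Assumed sample lines: the second one carries trailing spaces and CRLF.
my ($pk1)         = make_key("ACFX 28523,L,05/18/13,ABCCO\n");
my ($pk2, $had_c) = make_key("ACFX 28523,L,05/01/13,ABCCO-C  \r\n");

print $pk1 eq $pk2 ? "keys match\n" : "keys differ\n";   # keys match
```

Without the trimming line, the second company field would still end in whitespace, `-C` would not sit at the end of the string, and the two records would land under different keys.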
    poj

      Thanks for that code. I was able to fix my formatting and space issues, so the appended file now has the same field widths. I am still not getting the fields to match up.

      ACFX 28523 L 05/18/13 ABCCO
      ACFX 28523 L 05/01/13 ABCCO-C
      ACFX 28526 L 05/28/13 ABCCO
      ACFX 28526 L 05/01/13 ABCCO-C
      ACFX 44866 L 05/28/13 ABCCO
      ACFX 44866 L 05/01/13 ABCCO-C
      ADMX 49266 L 05/03/13 05/16/13 PFGCO
      ADMX 63770 L 05/12/13 05/21/13 PFGCO
      ADMX 63975 L 05/12/13 05/30/13 PFGCO
      The first and second rows need to match (they do perfectly without the -C). I need them to look like this:
      ACFX 44866 L 05/01/13 05/28/13 ABCCO-C

      I made sure file1 and file2 both have the same column widths, so now I think the issue is just that the company names are not exactly the same. Thanks.

        What output do you get with $pk added and the separator changed to | like this?
        print join '|',$pk,$key,$le,fmt_mdy($date),fmt_mdy($rdate),$company,"\n";
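If the | separators alone do not show the difference, a small dump routine can make invisible characters explicit. This is only a sketch: `show_raw` is a hypothetical helper, and the sample strings are assumed rather than taken from the real files.

```perl
use strict;
use warnings;

# show_raw is a hypothetical helper (not in the posted script).
# It wraps the value in brackets so trailing spaces stand out, and
# replaces anything other than printable ASCII or a space with \xNN.
sub show_raw {
    my ($value) = @_;
    (my $shown = $value) =~ s/([^\x21-\x7e ])/sprintf '\\x%02X', ord $1/ge;
    return "[$shown]";
}

print show_raw("ABCCO-C"), "\n";      # [ABCCO-C]
print show_raw("ABCCO-C \r"), "\n";   # [ABCCO-C \x0D]
```

Passing $pk or $company through such a routine inside the output loop would show whether a stray space, tab, or carriage return follows the -C.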
        poj