rruser has asked for the wisdom of the Perl Monks concerning the following question:

I recieved some great help matching data from 2 files, I now need to append a carryover file to file1. When I append data to file1 (carry over data from previous month it contains -C) it does not match up with the car number. I am not sure if it is the formatting in the carryover file. Thanks for your help...

output file: ACFX 28523 L 05/18/13 ABCCO ACFX 28526 L 05/28/13 ABCCO ACFX 44866 L 05/28/13 ABCCO ACFX 28523 L 05/01/13 ABCCO-C ACFX 28526 L 05/01/13 ABCCO-C ACFX 44866 L 05/01/13 ABCCO-C ADMX 49266 L 05/03/13 05/16/13 PFGCO ADMX 63770 L 05/12/13 05/21/13 PFGCO ADMX 63975 L 05/12/13 05/30/13 PFGCO file1: ADMX 49266, L , 05/03/13, PFGCO ADMX 63770, L , 05/12/13, PFGCO ADMX 63975, L , 05/12/13, PFGCO file2: ADMX 49266, L , 05/16/13, PFGCO ACFX 28523, L , 05/18/13, ABCCO ADMX 63770, L , 05/21/13, PFGCO ACFX 28526, L , 05/28/13, ABCCO ACFX 44866, L , 05/28/13, ABCCO ADMX 63975, L , 05/30/13, PFGCO carry-over file (I append to bottom of file1) ACFX 28523, L, 05/01/13, ABCCO-C ACFX 28526, L, 05/01/13, ABCCO-C ACFX 44866, L, 05/01/13, ABCCO-C I need the output file to read like this: ACFX 28523 L 05/01/13 05/18/13 ABCCO-C ACFX 28526 L 05/01/13 05/28/13 ABCCO-C ACFX 44866 L 05/01/13 05/28/13 ABCCO-C ADMX 49266 L 05/03/13 05/16/13 PFGCO ADMX 63770 L 05/12/13 05/21/13 PFGCO ADMX 63975 L 05/12/13 05/30/13 PFGCO #!/usr/bin/perl # use strict; use warnings; use Date::Calc qw( Delta_Days ); my %hash=(); input_data(1,'out1.txt'); input_data(2,'out2.txt'); #input_data(1,'pcarry.txt'); #input_data(2,'rcarry.txt'); output_data('final.txt'); sub input_data { my ($ix,$filename) = @_; open FILE1, "<", $filename or die "$filename : $!\n"; while ( <FILE1> ) { chomp $_; my ( $key, $le, $date, $company ) = split ',', $_; my $pk = join "\t",$key,$le,$company; push @{$hash{$pk}[$ix]},fmt_ymd($date); } close FILE1; } sub output_data { my $filename = shift; open OUTFILE, ">", $filename or die "$filename : $!\n"; # primary key for my $pk (sort keys %hash){ my ($key,$le,$company) = split "\t",$pk; # get multiple dates # print "$pk\n"; # my @dates = @{$hash{$pk}[1]}; # my @rdates = @{$hash{$pk}[2]}; my @dates = (defined $hash{$pk}[1]) ? @{$hash{$pk}[1]} : (); my @rdates = (defined $hash{$pk}[2]) ? @{$hash{$pk}[2]} : (); # even up number of dates: while (@dates < @rdates) { push @dates,'1900-01-01'; } while (@rdates < @dates) { push @rdates,'1900-01-01'; } # print out multiple dates for each key for my $date (reverse sort @dates){ # use match sub if more than 1 if (@rdates > 1){ @rdates = match($date,@rdates); } # rdates sorted so best match is first element my $rdate = shift @rdates; print join ' ',$key,$le,fmt_mdy($date),fmt_mdy($rdate),$company, +"\n"; } } close OUTFILE; } # match dates by calc days diff # and sorting to get least diff sub match { my ($date,@rdates) = @_; my @days=(); # split date into y,m,d my @d1 = split /\D/,$date; # calc diff and store with date for my $rdate (@rdates){ my @d2 = split /\D/,$rdate; push @days,[$rdate,abs Delta_Days(@d1,@d2)]; } # sort array by days @days = sort {$a->[1] <=> $b->[1]} @days; # extract dates return map {$_->[0]} @days; }

Replies are listed 'Best First'.
Re: issue with output of file matching
by poj (Abbot) on Jun 07, 2013 at 19:44 UTC
    If you want for example ABCCO to match ABCCO-C then you must remove the -C from the matching key and then add it back into the report after the match. You need to ensure there are no trailing spaces in the data after the -C or modify the regex to deal with them.
    # Define a hash for pk that have -C removed my %hashC=(); . . sub input_data { my ($ix,$filename) = @_; open FILE1, "<", $filename or die "$filename : $!\n"; while ( <FILE1> ) { chomp $_; my ( $key, $le, $date, $company ) = split ',', $_; my $pk = join "\t",$key,$le,$company; # remove -C from key and store if ($pk =~ s/-C$//){ # print "-C removed $pk\n"; $hashC{$pk} = '-C'; } push @{$hash{$pk}[$ix]},fmt_ymd($date); } close FILE1; } sub output_data { my $filename = shift; open OUTFILE, ">", $filename or die "$filename : $!\n"; # primary key for my $pk (sort keys %hash){ my ($key,$le,$company) = split "\t",$pk; # add -C back if required $company .= $hashC{$pk} || ''; . . .
    Note : The spacing of the columns in the carry over file example is different from file 1 and 2 example. Only one space shown between column1 and 2 and no space after L. If the real data is like this you may have more problems.
    ACFX 28523, L, 05/01/13, ABCCO-C
    poj

      Thanks so much for the help again, I replace the sub input_data and sub output_data in original script with this and I get the following:

      ACFX 28523 L 05/18/13 ABCCO ACFX 28523 L 05/01/13 ABCCO-C

      I am pasting entire script, maybe I am doing something wrong, this is a little over my head, hopefully will begin understanding better

      #!/usr/bin/perl # use strict; use warnings; use Date::Calc qw( Delta_Days ); my %hashC=(); my %hash=(); input_data(1,'out1.txt'); input_data(2,'out2.txt'); #input_data(1,'pcarry.txt'); #input_data(2,'rcarry.txt'); output_data('final.txt'); #sub input_data { # my ($ix,$filename) = @_; # open FILE1, "<", $filename or die "$filename : $!\n"; # while ( <FILE1> ) { # chomp $_; # my ( $key, $le, $date, $company ) = split ',', $_; # my $pk = join "\t",$key,$le,$company; # push @{$hash{$pk}[$ix]},fmt_ymd($date); # } # close FILE1; #} sub input_data { my ($ix,$filename) = @_; open FILE1, "<", $filename or die "$filename : $!\n"; while ( <FILE1> ) { chomp $_; my ( $key, $le, $date, $company ) = split ',', $_; my $pk = join "\t",$key,$le,$company; # remove -C from key and store if ($pk =~ s/-C$//){ # print "-C removed $pk\n"; $hashC{$pk} = '-C'; } push @{$hash{$pk}[$ix]},fmt_ymd($date); } close FILE1; } #sub output_data { # my $filename = shift; # open OUTFILE, ">", $filename or die "$filename : $!\n"; # primary key # for my $pk (sort keys %hash){ # my ($key,$le,$company) = split "\t",$pk; sub output_data { my $filename = shift; open OUTFILE, ">", $filename or die "$filename : $!\n"; # primary key for my $pk (sort keys %hash){ my ($key,$le,$company) = split "\t",$pk; # add -C back if required $company .= $hashC{$pk} || ''; # get multiple dates # print "$pk\n"; # my @dates = @{$hash{$pk}[1]}; # my @rdates = @{$hash{$pk}[2]}; my @dates = (defined $hash{$pk}[1]) ? @{$hash{$pk}[1]} : (); my @rdates = (defined $hash{$pk}[2]) ? @{$hash{$pk}[2]} : (); # even up number of dates: while (@dates < @rdates) { push @dates,'1900-01-01'; } while (@rdates < @dates) { push @rdates,'1900-01-01'; } # print out multiple dates for each key for my $date (reverse sort @dates){ # use match sub if more than 1 if (@rdates > 1){ @rdates = match($date,@rdates); } # rdates sorted so best match is first element my $rdate = shift @rdates; print join ' ',$key,$le,fmt_mdy($date),fmt_mdy($rdate),$company, +"\n"; } } close OUTFILE; } # match dates by calc days diff # and sorting to get least diff sub match { my ($date,@rdates) = @_; my @days=(); # split date into y,m,d my @d1 = split /\D/,$date; # calc diff and store with date for my $rdate (@rdates){ my @d2 = split /\D/,$rdate; push @days,[$rdate,abs Delta_Days(@d1,@d2)]; } # sort array by days @days = sort {$a->[1] <=> $b->[1]} @days; # extract dates return map {$_->[0]} @days; } # change mm/dd/yy to yyyy-mm-dd sub fmt_ymd { my $mdy = shift; $mdy =~ s/ //g; my ($m,$d,$y) = split /\D/,$mdy; if ($y < 99){ $y += 2000 }; return sprintf "%04d-%02d-%02d",$y,$m,$d; } # change yyyy-mm-dd to mm/dd/yy sub fmt_mdy { my $ymd = shift; $ymd =~ s/ //g; return ' 'x8 if $ymd eq '1900-01-01'; my ($y,$m,$d) = split /\D/,$ymd; $y -= 2000; return sprintf "%02d/%02d/%02d",$m,$d, $y; }
        I suggest you comment out this line temporarily
        $company .= $hashC{$pk} || '';.

        If you still get -C appearing in the output then check the data carefully for trailing spaces. You can also add the primary key into the output temporarily with separators like this to see spaces or other reason why they don't match up.

        print join ' ',$key,$le,fmt_mdy($date),fmt_mdy($rdate),$company,"|$pk +|\n";
        poj
Re: issue with output of file matching
by JockoHelios (Scribe) on Jun 07, 2013 at 19:29 UTC
    In describing the issue, you mention that the carryover file needs to be appended to file 1. File 1 data, then carryover data added to it.

    In the example of what you need the output to look like, the carryover data is prepended to file 1. Carryover data, then file 1 data added to it.

    Is this part of the problem you're looking to solve, or is it a copy-and-paste error when composing the post ?
    Dyslexics Untie !!!

      I did figure out the formatting issue appending from carryover file to file1. So that seems to have helped some. I still do not match up because merging of the two files requires the company name be identical - however I must know which records were from the carryover file in the final output...any help is appreciated, thanks