combining 2 files with 4 columns need help

rruser has asked for the wisdom of the Perl Monks concerning the following question:

I need to be able to match the first field (car number) to the date in both files. Either file my have multiple listing of the date. If no date exists it still gets written out. If multiple dates I really need to match the closest dates to each other. (ex. 04/01/2013 04/05/2013 then 04/06/2013 04/30/2013) I hope this makes sense. Thanks for any help

I can't figure out how to combine my two files to get the desired output.

file 1:
AOKX 495408, L,  04/02/13, SWCOMP
AOKX 495408, L,  04/20/13, SWCOMP
BLHX    102, L,  04/01/13, WILDCOM
CRDX   7067, L,  04/05/13, TYCO
WW     9030, L,  04/02/13, HALLI

file2:
AOKX 495408, L,  04/15/13, SWCOMP
BLHX    102, L,  04/03/13, WILDCOM
BLHX    102, L,  04/30/13, WILDCOM
CRDX   7067, L,  04/20/13, TYCO
WW     9030, L,  04/30/13, HALLI

output (what it needs to look like)
$key        $le $date   $rdate       $company or $rcompany
AOKX 495408 L  04/02/13 04/15/2013 SWCOMP
AOKX 495408 L  04/20/13            SWCOMP
BLHX    102 L  04/01/13 04/03/2013 WILDCOM
BLHX    102 L           04/30/2013 WILDCOM
CRDX   7067 L  04/05/13 04/20/2013 TYCO
WW     9030 L  04/02/13 04/30/13   HALLI

#!/usr/bin/perl
# 
use strict;
use warnings;

open FILE1, "<", "out1.txt" or die "$!\n";

my %hash;

 while ( <FILE1> ) {
     chomp $_;
     my ( $key, $le, $date, $company ) = split ',', $_;
     $hash{$key} = [$le, $date, $company];
#    print "$key $date\n"; 
 }

 close FILE1;

 open FILE2, "<", "out2.txt" or die "$!\n";
 open OUTFILE, ">", "final.txt" or die "$!\n";

 while (<FILE2>) {
     my ( $rkey, $rle, $rdate, $rcompany ) = split ',', $_;
     $hash{$rkey} = [$rle, $rdate, $rcompany];
 #   print OUTFILE "\n"; #to write out once figure out how to combine

 }

 close FILE2;
 close OUTFILE;
[download]

Comment on combining 2 files with 4 columns need help Download Code

Replies are listed 'Best First'.
Re: combining 2 files with 4 columns need help by 2teez (Vicar) on May 29, 2013 at 13:43 UTC
Hi rruser, I can't figure out how to combine my two files to get the desired output. Since, you are reading both files the same way there is no need doing the same it over and over again. If I may give you a head up, something like so: `use warnings; use strict; use Data::Dumper; my %collector; foreach my $file (@ARGV) { open my $fh, '<', $file or die $!; while (<$fh>) { chomp; my @vals = split /,/, $_, 4; if ( !exists $collector{ $vals[0] } ) { $collector{ $vals[0] } = [ @vals[ 1 .. 3 ] ]; } else { push @{ $collector{ $vals[0] } }, $vals[2]; } } } print Dumper \%collector;` [download] Output: `$VAR1 = { 'CRDX 7067' => [ ' L', ' 04/05/13', ' TYCO', ' 04/20/13' ], 'AOKX 495408' => [ ' L', ' 04/02/13', ' SWCOMP', ' 04/20/13', ' 04/15/13' ], 'BLHX 102' => [ ' L', ' 04/01/13', ' WILDCOM', ' 04/03/13', ' 04/30/13' ], 'WW 9030' => [ ' L', ' 04/02/13', ' HALLI', ' 04/30/13' ] };` [download] Do the display as you wish, sir... :) If you tell me, I'll forget. If you show me, I'll remember. if you involve me, I'll understand. --- Author unknown to me	[reply] [d/l] [select]
Re^2: combining 2 files with 4 columns need help by rruser (Acolyte) on May 29, 2013 at 16:52 UTC
thanks so much just the data I need. not sure how to get it into columnar form..so much to learn	[reply]
Re^3: combining 2 files with 4 columns need help by rruser (Acolyte) on May 29, 2013 at 18:23 UTC
how would i create a regex to read the multiline record? I can't seem figure out how to do this. thanks for the help. `'CBTX 741274' => [ ' E ', ' 04/07/13', ' EOG ', ' 04/11/13'` [download]	[reply] [d/l]
Re: combining 2 files with 4 columns need help by Random_Walk (Prior) on May 29, 2013 at 13:36 UTC
You may want to read all the data in first then dump it out as needed. I would build a hash with the first key being the car number. The next key I would use company and then under there I would store a reference to an array holding all the dates. I am not sure what `$le` is but where is goes in the has depends on whether it is related to the car, the company or perhaps the car/company/date event If you converted the dates to European format yyyy-mm-dd then you can do a simple sort on them and print out sequential pairs of dates to get the 'close' ones. Cheers, R. Pereant, qui ante nos nostra dixerunt!	[reply] [d/l] [select]
Re: combining 2 files with 4 columns need help by poj (Abbot) on May 29, 2013 at 18:19 UTC
Here is my best guess at what you mean by closest matched dates. I've assumed all the fields except the date make up the primary key that you want to join on. Also assumed all the dates are on or after 01/01/2000; #!/usr/bin/perl # use strict; use warnings; use Date::Calc qw( Delta_Days ); my %hash=(); input_data(1,'out1.txt'); input_data(2,'out2.txt'); output_data('final.txt'); sub input_data { my ($ix,$filename) = @_; open FILE1, "<", $filename or die "$filename : $!\n"; while ( <FILE1> ) { chomp $_; my ( $key, $le, $date, $company ) = split ',', $_; my $pk = join "\t",$key,$le,$company; push @{$hash{$pk}[$ix]},fmt_ymd($date); } close FILE1; } sub output_data { my $filename = shift; open OUTFILE, ">", $filename or die "$filename : $!\n"; # primary key for my $pk (sort keys %hash){ my ($key,$le,$company) = split "\t",$pk; # get multiple dates my @dates = @{$hash{$pk}[1]}; my @rdates = @{$hash{$pk}[2]}; # even up number of dates while (@dates < @rdates) { push @dates,'1900-01-01'; } while (@rdates < @dates) { push @rdates,'1900-01-01'; } # print out multiple dates for each key for my $date (reverse sort @dates){ # use match sub if more than 1 if (@rdates > 1){ @rdates = match($date,@rdates); } # rdates sorted so best match is first element my $rdate = shift @rdates; print OUTFILE join ' ',$key,$le,fmt_mdy($date),fmt_mdy($rdate),$ +company,"\n"; } } close OUTFILE; } # match dates by calc days diff # and sorting to get least diff sub match { my ($date,@rdates) = @_; my @days=(); # split date into y,m,d my @d1 = split /\D/,$date; # calc diff and store with date for my $rdate (@rdates){ my @d2 = split /\D/,$rdate; push @days,[$rdate,abs Delta_Days(@d1,@d2)]; } # sort array by days @days = sort {$a->[1] <=> $b->[1]} @days; # extract dates return map {$_->[0]} @days; } # change mm/dd/yy to yyyy-mm-dd sub fmt_ymd { my $mdy = shift; $mdy =~ s/ //g; my ($m,$d,$y) = split /\D/,$mdy; if ($y < 99){ $y += 2000 }; return sprintf "%04d-%02d-%02d",$y,$m,$d; } # change yyyy-mm-dd to mm/dd/yy sub fmt_mdy { my $ymd = shift; $ymd =~ s/ //g; return ' 'x8 if $ymd eq '1900-01-01'; my ($y,$m,$d) = split /\D/,$ymd; $y -= 2000; return sprintf "%02d/%02d/%02d",$m,$d, $y; } [download] poj	[reply] [d/l]
Re^2: combining 2 files with 4 columns need help by rruser (Acolyte) on May 29, 2013 at 19:18 UTC
I am getting following message when running script "Can't use an undefined value as an ARRAY reference at script.pl line 34." `lines 33 & 34 my @dates = @{$hash{$pk}[1]}; my @rdates = @{$hash{$pk}[2]};` [download] thanks so much for your help	[reply] [d/l]
Re^3: combining 2 files with 4 columns need help by poj (Abbot) on May 29, 2013 at 19:24 UTC
Looks like you have a primary key in one file only. Add a print to confirm ; `print "$pk\n"; my @dates = @{$hash{$pk}[1]}; my @rdates = @{$hash{$pk}[2]};` [download] poj	[reply] [d/l]
Re^4: combining 2 files with 4 columns need help by rruser (Acolyte) on May 29, 2013 at 19:37 UTC
Re^5: combining 2 files with 4 columns need help by poj (Abbot) on May 29, 2013 at 19:39 UTC
Some notes below your chosen depth have not been shown here