rruser has asked for the wisdom of the Perl Monks concerning the following question:

I need to be able to match the first field (car number) to the date in both files. Either file my have multiple listing of the date. If no date exists it still gets written out. If multiple dates I really need to match the closest dates to each other. (ex. 04/01/2013 04/05/2013 then 04/06/2013 04/30/2013) I hope this makes sense. Thanks for any help

I can't figure out how to combine my two files to get the desired output.

file 1: AOKX 495408, L, 04/02/13, SWCOMP AOKX 495408, L, 04/20/13, SWCOMP BLHX 102, L, 04/01/13, WILDCOM CRDX 7067, L, 04/05/13, TYCO WW 9030, L, 04/02/13, HALLI file2: AOKX 495408, L, 04/15/13, SWCOMP BLHX 102, L, 04/03/13, WILDCOM BLHX 102, L, 04/30/13, WILDCOM CRDX 7067, L, 04/20/13, TYCO WW 9030, L, 04/30/13, HALLI output (what it needs to look like) $key $le $date $rdate $company or $rcompany AOKX 495408 L 04/02/13 04/15/2013 SWCOMP AOKX 495408 L 04/20/13 SWCOMP BLHX 102 L 04/01/13 04/03/2013 WILDCOM BLHX 102 L 04/30/2013 WILDCOM CRDX 7067 L 04/05/13 04/20/2013 TYCO WW 9030 L 04/02/13 04/30/13 HALLI #!/usr/bin/perl # use strict; use warnings; open FILE1, "<", "out1.txt" or die "$!\n"; my %hash; while ( <FILE1> ) { chomp $_; my ( $key, $le, $date, $company ) = split ',', $_; $hash{$key} = [$le, $date, $company]; # print "$key $date\n"; } close FILE1; open FILE2, "<", "out2.txt" or die "$!\n"; open OUTFILE, ">", "final.txt" or die "$!\n"; while (<FILE2>) { my ( $rkey, $rle, $rdate, $rcompany ) = split ',', $_; $hash{$rkey} = [$rle, $rdate, $rcompany]; # print OUTFILE "\n"; #to write out once figure out how to combine } close FILE2; close OUTFILE;

Replies are listed 'Best First'.
Re: combining 2 files with 4 columns need help
by 2teez (Vicar) on May 29, 2013 at 13:43 UTC

    Hi rruser,
    I can't figure out how to combine my two files to get the desired output.
    Since, you are reading both files the same way there is no need doing the same it over and over again.
    If I may give you a head up, something like so:

    use warnings; use strict; use Data::Dumper; my %collector; foreach my $file (@ARGV) { open my $fh, '<', $file or die $!; while (<$fh>) { chomp; my @vals = split /,/, $_, 4; if ( !exists $collector{ $vals[0] } ) { $collector{ $vals[0] } = [ @vals[ 1 .. 3 ] ]; } else { push @{ $collector{ $vals[0] } }, $vals[2]; } } } print Dumper \%collector;
    Output:
    $VAR1 = { 'CRDX 7067' => [ ' L', ' 04/05/13', ' TYCO', ' 04/20/13' ], 'AOKX 495408' => [ ' L', ' 04/02/13', ' SWCOMP', ' 04/20/13', ' 04/15/13' ], 'BLHX 102' => [ ' L', ' 04/01/13', ' WILDCOM', ' 04/03/13', ' 04/30/13' ], 'WW 9030' => [ ' L', ' 04/02/13', ' HALLI', ' 04/30/13' ] };
    Do the display as you wish, sir... :)

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me

      thanks so much just the data I need. not sure how to get it into columnar form..so much to learn

        how would i create a regex to read the multiline record? I can't seem figure out how to do this. thanks for the help.

        'CBTX 741274' => [ ' E ', ' 04/07/13', ' EOG ', ' 04/11/13'
Re: combining 2 files with 4 columns need help
by Random_Walk (Prior) on May 29, 2013 at 13:36 UTC

    You may want to read all the data in first then dump it out as needed. I would build a hash with the first key being the car number. The next key I would use company and then under there I would store a reference to an array holding all the dates. I am not sure what $le is but where is goes in the has depends on whether it is related to the car, the company or perhaps the car/company/date event

    If you converted the dates to European format yyyy-mm-dd then you can do a simple sort on them and print out sequential pairs of dates to get the 'close' ones.

    Cheers,
    R.

    Pereant, qui ante nos nostra dixerunt!
Re: combining 2 files with 4 columns need help
by poj (Abbot) on May 29, 2013 at 18:19 UTC
    Here is my best guess at what you mean by closest matched dates. I've assumed all the fields except the date make up the primary key that you want to join on. Also assumed all the dates are on or after 01/01/2000;
    #!/usr/bin/perl # use strict; use warnings; use Date::Calc qw( Delta_Days ); my %hash=(); input_data(1,'out1.txt'); input_data(2,'out2.txt'); output_data('final.txt'); sub input_data { my ($ix,$filename) = @_; open FILE1, "<", $filename or die "$filename : $!\n"; while ( <FILE1> ) { chomp $_; my ( $key, $le, $date, $company ) = split ',', $_; my $pk = join "\t",$key,$le,$company; push @{$hash{$pk}[$ix]},fmt_ymd($date); } close FILE1; } sub output_data { my $filename = shift; open OUTFILE, ">", $filename or die "$filename : $!\n"; # primary key for my $pk (sort keys %hash){ my ($key,$le,$company) = split "\t",$pk; # get multiple dates my @dates = @{$hash{$pk}[1]}; my @rdates = @{$hash{$pk}[2]}; # even up number of dates while (@dates < @rdates) { push @dates,'1900-01-01'; } while (@rdates < @dates) { push @rdates,'1900-01-01'; } # print out multiple dates for each key for my $date (reverse sort @dates){ # use match sub if more than 1 if (@rdates > 1){ @rdates = match($date,@rdates); } # rdates sorted so best match is first element my $rdate = shift @rdates; print OUTFILE join ' ',$key,$le,fmt_mdy($date),fmt_mdy($rdate),$ +company,"\n"; } } close OUTFILE; } # match dates by calc days diff # and sorting to get least diff sub match { my ($date,@rdates) = @_; my @days=(); # split date into y,m,d my @d1 = split /\D/,$date; # calc diff and store with date for my $rdate (@rdates){ my @d2 = split /\D/,$rdate; push @days,[$rdate,abs Delta_Days(@d1,@d2)]; } # sort array by days @days = sort {$a->[1] <=> $b->[1]} @days; # extract dates return map {$_->[0]} @days; } # change mm/dd/yy to yyyy-mm-dd sub fmt_ymd { my $mdy = shift; $mdy =~ s/ //g; my ($m,$d,$y) = split /\D/,$mdy; if ($y < 99){ $y += 2000 }; return sprintf "%04d-%02d-%02d",$y,$m,$d; } # change yyyy-mm-dd to mm/dd/yy sub fmt_mdy { my $ymd = shift; $ymd =~ s/ //g; return ' 'x8 if $ymd eq '1900-01-01'; my ($y,$m,$d) = split /\D/,$ymd; $y -= 2000; return sprintf "%02d/%02d/%02d",$m,$d, $y; }
    poj

      I am getting following message when running script "Can't use an undefined value as an ARRAY reference at script.pl line 34."

      lines 33 & 34 my @dates = @{$hash{$pk}[1]}; my @rdates = @{$hash{$pk}[2]};

      thanks so much for your help

        Looks like you have a primary key in one file only. Add a print to confirm ;
        print "$pk\n"; my @dates = @{$hash{$pk}[1]}; my @rdates = @{$hash{$pk}[2]};
        poj