Hello Monks, I'm a perl newbie and have the following requirement: 1) I need to do an outer join on 2 files which have about 20 columns each. Both files have headers
2) The join needs to happen on key based on 3 columns from each file, so both have to be sorted. The key is from column 1,4,5 in both files but key column headers don't all match (can we match the key based on column index instead of header names?)
3) If the key matches, I need the flexibility to add specific columns from any of the files to the output file.
4) If there is no key match, take the existing key/data from file and add it to the output leaving the other columns blank (outer join).
5) Need to generate a separate output file leaving inputs intact.
6) The input/output files need to use '|' separators
Here is an example with 2 input files and an output file using only small sample of columns:
File_Deal - the key here is parent_cusp,deal,tranche
File_ATT - the key here is Vendor, deal, trancheparent_cusp|cusp|isin|deal|tranche|det_date|col_type 38375U|36182D|36182D1|HMAG|HMBSWEE|20150416|mortgage 383333|361333|3618333|HABS|HABSDDE|20150330|mortgage2
File_OutputVendor|visp|barnembly|deal|tranche|Fund|subamt|colamt|basamt 38375U|3DD82D|36FF333|HMAG|HMBSWEE|9010|765423|364633|46566 38EE33|361DD3|36LLE33|H99S|HAOOODE|2330|377233|347433|34488
parent_cusp|cusp|isin|deal|tranche|det_date|col_type|Fund|subamt|colam +t|basamt 38375U|36182D|36182D1|HMAG|HMBSWEE|20150416|mortgage|9010|765423|36463 +3|46566 383333|361333|3618333|HABS|HABSDDE|20150330|mortgage2|||| 38EE33|||H99S|HAOOODE|||2330|377233|347433|34488
7) The output needs to contain all the columns/data from file1 (File_Deal) and for file2 (File_ATT) join only starting from column 6 (Fund) until the last column (basamt).
8) based on the output - row 1 is a match so I just join from both files
9) row 2 is a mismatch from File_deal, but since it's outer join I just copy row 2 from file_deal and just add blanks (since it's missing from file_Att
10) row 2 is a mismatch from File_att but once again I need outer join. But here I need to copy the key from file_detail to output as well, so I just write 3 key columns into 1st, 4th, and 5th column(leave others blank), then attach data from File_ATT
Here is some of the code I'm starting with from sample I found but need to come up with a solution quickly as deadline is approaching, can you please help. Thanks in advance
#! /bin/env perl my $File_Deal = $ARGV[0]; my $File_ATT = $ARGV[1]; open(F1, "<", $File_Deal); open(F2, "<", $File_ATT); my %hash = (); while( <F1> ) { chomp; my($c, $c2, $c4, @val1) = split/,/, $_, -1; $hash{$c1.$c2.$c4}[0] = $val1[0]; $hash{$key}[1] = $val1[1]; $hash{$key}[2] = $val1[2]; } while( <F2> ) { chomp; my($c1,$c5, $c7, @val2) = split/,/, $_, -1; $hash{$c1.$c5.$7}[3] = $val2[0]; $hash{$key}[4] = $val2[1]; $hash{$key}[5] = $val2[2]; } for my $key (sort keys %hash) { print "$key: $hash{$key}[0]:$hash{$key}[1]\n"; }
In reply to Outer Join on 2 files by healingtao
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |