Dear Monks,
Greetings. I have 10 input files in CSV format. I've written a method that
returns the data from each file as a hash in a pipe-delimited format, so for 10 CSV files I have 10 hashes populated with data.
Below is the format of all 10 CSV files (the data itself may vary):
# 69 CHANGES DAILY_FILE_EM 20060601
# 1 Status status S 20 0
# 2 Event ID event_id N 10 0
# 3 Effective Date ex_date D 8 0
# 4 Last updated date last_updated_date D 8 0
# 5 First entry date first_entry_date D 8 0
# 6 Current country current_country S 2 0
# 7 New country new_country S 2 0
# 8 Current security name current_sec_name S 24 0
# 9 New security name new_sec_name S 24 0
# 10 RR Security code rr_sec_code S 8 0
AAB>>>>>>>>>>>AAA>>>>>>>>>AAB>>>>>>>AAA>>>>>>>AAB>>>>>>>AAA>AAA>AAA>>>>>>>>>>>>AAB>>>>>>>>>AAA>>>>>>>>>>>
CONFIRMED 13336 20060602 20060601 20060601 TR TR AYGAZ AYGAZ 15675.01
CONFIRMED 12995 20060601 20060511 20060511 KR KR DAEWOO CO DAEWOO CO 15216.01
CONFIRMED 12995 20060601 20060511 20060511 KR KR WOORI WOORI 15262.01
CONFIRMED 12995 20060601 20060511 20060511 JO JO JORDAN SEC JORDAN SEC 22318.01
CONFIRMED 12995 20060601 20060511 20060511 JO CAIRO FIN 15178.01
CONFIRMED 12995 20060601 20060511 20060511 JO JO ILFS ILFS 15177.01
CONFIRMED 12995 20060601 20060511 20060511 JO JO PP PETROL. REFINERY PP PETROL. REFINERY 15194.01
CONFIRMED 12995 20060601 20060511 20060511 MX MX WALMART WALMART 15306.04
CONFIRMED 12995 20060601 20060511 20060511 MX VITRO A 15333.01
EXPECTED 13266 20060612 20060526 20060526 TR TR TURKCELL TURKCELL 23769.01
#EOD
*
I'm parsing the csv files using the following code:
use strict;
use warnings;
use Data::Dumper;

my $file_ss = "/tmp/ONE_ACE.csv";
my $file_sc = "/tmp/TWO_ACE.csv";
my $data_ss = get_csv_data($file_ss);
my $data_sc = get_csv_data($file_sc);
print Dumper($data_ss);    # data of the first csv file

my $out_file      = "/tmp/aggregate.csv";
my $output_header = "FileName,Status,Event ID,RR Security Code";
my $output_body   = "DATA returned from the matching eventids/rr_security_code";    # TODO

sub get_csv_data {
    my $open_file = shift;
    my ($curr_country_currency, $new_country_currency);
    my %data = ();
    #local $/=">>>\n";
    if (!-f $open_file) {
        report("Abort", "Could not find $open_file file");
        exit 1;
    }
    # Lexical handle and 3-argument open; DATA is a special handle in Perl
    open(my $fh, '<', $open_file) or die "Cannot open $open_file: $!";
    while (<$fh>) {
        # Actual data lies between the line "AAB>>>>" and "#EOD"
        if (/AAB>>>>/ .. /#EOD/) {
            chomp;
            s/\r//g;    # presumably meant to strip carriage returns
            s/^\s+//;
            next if /^$/;
            next if /AAA>>/;
            next if /AAB>>/;
            next if /^#EOD/;
            my ($status, $event_id, $effdate, $last_updated, $entry_date,
                $curr_country, $new_country, $curr_sec_name,
                $new_sec_name, $msci_sec_code) = split(/\|/);
            # Keep every row; a plain scalar assignment would retain
            # only the last row of the file
            push @{ $data{$open_file} },
                "$open_file|$status|$event_id|$msci_sec_code";
        }
    }    #while
    close $fh;
    return \%data;    # hash ref, because the callers store it in a scalar
}
Now the problem part. Repeated "Event ID" values within the same file are allowed. However, if a row repeats both
the "Event ID" and the "RR Security Code" of a row from a different CSV file, its data must not be included; instead
I need to concatenate its FileName to the old FileName. For example:
File A ('a') and File B ('b') - File A's data has already been scanned
Pseudo code:
If a.Event ID = b.Event ID {
    If a.FileName <> b.FileName {
        If a.RR Security code = b.RR Security code
            FileName = a.FileName + '/' + b.FileName
            (e.g. if a.FileName = 'ONE_ACE' and b.FileName = 'TWO_ACE', then FileName = 'ONE_ACE/TWO_ACE')
            Return nothing;
        Else
            Return data of row to be included in output file;
    }
    Else
        Return data of row to be included in output file;
}
Else
    Return data of row to be included in output file;
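To make the rule concrete, here is roughly how I imagine it in Perl. This is only a minimal sketch under assumptions, not code from my script: merge_rows is a hypothetical helper, it assumes each row arrives as a "FileName|Status|EventID|RRSecCode" string (the format get_csv_data builds) in scan order, and it uses short file names such as ONE_ACE rather than full paths.

```perl
use strict;
use warnings;

# Hypothetical helper: takes rows in scan order, each formatted as
# "FileName|Status|EventID|RRSecCode", and returns the rows to output.
sub merge_rows {
    my (%first, @out);
    for my $row (@_) {
        my ($file, $status, $event_id, $sec_code) = split /\|/, $row, -1;
        my $key  = join '|', $event_id, $sec_code;
        my $prev = $first{$key};

        if ($prev && $prev->{origin} ne $file) {
            # Same Event ID and security code seen earlier in a different
            # file: append this file name and drop the row itself.
            push @{ $prev->{files} }, $file
                unless grep { $_ eq $file } @{ $prev->{files} };
            next;
        }

        # New key, or a repeat inside the same file: keep the row.
        my $rec = {
            origin => $file,
            files  => [$file],
            rest   => "$status|$event_id|$sec_code",
        };
        $first{$key} ||= $rec;
        push @out, $rec;
    }
    return map { join('/', @{ $_->{files} }) . "|$_->{rest}" } @out;
}

# Sample rows taken from the data above, with assumed file names.
my @rows = (
    "ONE_ACE|CONFIRMED|13336|15675.01",
    "ONE_ACE|CONFIRMED|12995|15216.01",
    "ONE_ACE|CONFIRMED|12995|15262.01",
    "TWO_ACE|CONFIRMED|13336|15675.01",
    "TWO_ACE|EXPECTED|13266|23769.01",
);
print "$_\n" for merge_rows(@rows);
```

With these sample rows, the TWO_ACE row for event 13336 is dropped and the first output row carries the file name ONE_ACE/TWO_ACE, while the two ONE_ACE rows that share Event ID 12995 are both kept because their security codes differ.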
I need some suggestions or pseudo code to implement this.
Thanks in advance.