PerlMonks  

Matching and concatenating similar Eventids

by chanakya (Friar)
on Jun 02, 2006 at 14:33 UTC ([id://553303]=perlquestion)

chanakya has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

Greetings. I have 10 input files in CSV format. I've written a method that
returns the data of each file as a hash of pipe-delimited records, so for 10 CSV files I have 10 populated hashes.

Below is the format shared by all 10 CSV files (only the data varies):
    # 69 CHANGES DAILY_FILE_EM 20060601
    # 1 Status status S 20 0
    # 2 Event ID event_id N 10 0
    # 3 Effective Date ex_date D 8 0
    # 4 Last updated date last_updated_date D 8 0
    # 5 First entry date first_entry_date D 8 0
    # 6 Current country current_country S 2 0
    # 7 New country new_country S 2 0
    # 8 Current security name current_sec_name S 24 0
    # 9 New security name new_sec_name S 24 0
    # 10 RR Security code rr_sec_code S 8 0
    AAB>>>>>>>>>>>AAA>>>>>>>>>AAB>>>>>>>AAA>>>>>>>AAB>>>>>>>AAA>AAA>AAA>>>>>>>>>>>>AAB>>>>>>>>>AAA>>>>>>>>>>>
    CONFIRMED 13336 20060602 20060601 20060601 TR TR AYGAZ AYGAZ 15675.01
    CONFIRMED 12995 20060601 20060511 20060511 KR KR DAEWOO CO DAEWOO CO 15216.01
    CONFIRMED 12995 20060601 20060511 20060511 KR KR WOORI WOORI 15262.01
    CONFIRMED 12995 20060601 20060511 20060511 JO JO JORDAN SEC JORDAN SEC 22318.01
    CONFIRMED 12995 20060601 20060511 20060511 JO CAIRO FIN 15178.01
    CONFIRMED 12995 20060601 20060511 20060511 JO JO ILFS ILFS 15177.01
    CONFIRMED 12995 20060601 20060511 20060511 JO JO PP PETROL. REFINERY PP PETROL. REFINERY 15194.01
    CONFIRMED 12995 20060601 20060511 20060511 MX MX WALMART WALMART 15306.04
    CONFIRMED 12995 20060601 20060511 20060511 MX VITRO A 15333.01
    EXPECTED 13266 20060612 20060526 20060526 TR TR TURKCELL TURKCELL 23769.01
    #EOD *
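The `# N ...` header lines appear to describe one column each: label, field name, type (S, N or D), width and decimals. Assuming that reading of the format, here is a minimal sketch that turns those lines into an ordered column list, so the field names do not have to be hard-coded in the `split`. The sub name `parse_header` and the hash keys are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "# <n> <label> <field> <type> <width> <dec>" lines.
# Assumes type is S (string), N (number) or D (date); other lines are skipped,
# e.g. "# 69 CHANGES DAILY_FILE_EM 20060601" has too few fields to match.
sub parse_header {
    my @columns;
    for my $line (@_) {
        # The label may contain spaces ("Event ID"), so match it lazily and
        # anchor the remaining four fields at the end of the line.
        next unless $line =~ /^#\s+(\d+)\s+(.+?)\s+(\w+)\s+([SND])\s+(\d+)\s+(\d+)\s*$/;
        push @columns, {
            pos   => $1,
            label => $2,
            field => $3,
            type  => $4,
            width => $5,
            dec   => $6,
        };
    }
    # Order by column position in case the header lines are not sorted.
    return sort { $a->{pos} <=> $b->{pos} } @columns;
}

my @cols = parse_header(
    '# 69 CHANGES DAILY_FILE_EM 20060601',
    '# 1 Status status S 20 0',
    '# 2 Event ID event_id N 10 0',
    '# 10 RR Security code rr_sec_code S 8 0',
);
print join(',', map { $_->{field} } @cols), "\n";   # status,event_id,rr_sec_code
```

With the full header, `map { $_->{field} } @cols` gives the ten field names in order, which could then be zipped with the `split /\|/` result of each data row.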
I'm parsing the csv files using the following code:
    use strict;
    use warnings;
    use Data::Dumper;

    my $file_ss = "/tmp/ONE_ACE.csv";
    my $file_sc = "/tmp/TWO_ACE.csv";

    my $data_ss = get_csv_data($file_ss);
    my $data_sc = get_csv_data($file_sc);
    print Dumper($data_ss);    # data of the first csv file

    my $out_file      = "/tmp/aggregate.csv";
    my $output_header = "FileName,Status,Event ID,RR Security Code";
    my $output_body   = "";    # TODO: data returned from the matching event_id/rr_security_code rows

    sub get_csv_data {
        my $open_file = shift;
        my %data;

        die "Abort: could not find $open_file file\n" unless -f $open_file;
        open(my $fh, '<', $open_file) or die "Abort: could not open $open_file: $!\n";

        while (<$fh>) {
            # Actual data is between the "AAB>>>>" layout line and "#EOD"
            if (/AAB>>>>/ .. /#EOD/) {
                chomp;
                # Assumed intent: strip DOS carriage returns. An empty s///g
                # would silently reuse the last successfully matched regex.
                s/\r$//;
                s/^\s+//;
                next if /^$/;
                next if /AAA>>/ || /AAB>>/ || /^#EOD/;
                my ($status, $event_id, $effdate, $last_updated, $entry_date,
                    $curr_country, $new_country, $curr_sec_name, $new_sec_name,
                    $msci_sec_code) = split /\|/;
                # Keep every row: plain assignment to $data{$open_file}
                # would overwrite all rows except the last one.
                push @{ $data{$open_file} }, "$open_file|$status|$event_id|$msci_sec_code";
            }
        }
        close $fh;
        # Return a reference so "my $data_ss = get_csv_data(...)" gets the whole hash
        return \%data;
    }
Now the problem part. Repeated Event IDs are allowed when they come from the same file. But if a row repeats both the Event ID and the RR Security Code of a row from a different CSV file, that row's data must not be included; instead, its FileName should be concatenated to the existing FileName. For example:
File A ('a') and File B ('b'), where File A's data has already been scanned:
Pseudocode:

    If a.EventID = b.EventID
        If a.FileName != b.FileName
            If a.RRSecurityCode = b.RRSecurityCode
                a.FileName = a.FileName + '/' + b.FileName
                    (e.g. if a.FileName = 'ONE_ACE' and b.FileName = 'TWO_ACE',
                     then a.FileName = 'ONE_ACE/TWO_ACE')
                Return nothing
            Else
                Return data of row to be included in output file
        Else
            Return data of row to be included in output file
    Else
        Return data of row to be included in output file
I'd appreciate suggestions or pseudocode for implementing this.
Thanks in advance.
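One possible way to implement the rule in the pseudocode, as a minimal sketch: key a lookup hash on "Event ID|RR Security code", remember which output row first used that key and which files have contributed to it, and append the file name instead of emitting a duplicate row from a new file. The sub `merge_rows`, the row layout (hash refs with `file`, `status`, `event_id`, `rr_sec_code`) and the sample rows are assumptions for illustration. It also assumes that a second row with the same key from a file already listed counts as a same-file repeat and is kept, and that file names are short names such as `ONE_ACE` (no `/`).

```perl
use strict;
use warnings;

# Hypothetical merge step. Rows must arrive in scan order:
# all rows of file A, then all rows of file B, and so on.
sub merge_rows {
    my @output;   # rows destined for the output file, in scan order
    my %seen;     # "event_id|rr_sec_code" => { idx => row in @output, files => {...} }

    for my $row (@_) {
        my $key = "$row->{event_id}|$row->{rr_sec_code}";

        if (my $hit = $seen{$key}) {
            if (!$hit->{files}{ $row->{file} }) {
                # Same Event ID and RR Security code from a different file:
                # append this file's name to the first row and drop this row.
                $output[ $hit->{idx} ]{file} .= "/$row->{file}";
                $hit->{files}{ $row->{file} } = 1;
                next;
            }
            # Repeat from a file already listed for this key: keep the row.
        }
        else {
            $seen{$key} = { idx => scalar @output, files => { $row->{file} => 1 } };
        }
        push @output, { %$row };   # copy, so the caller's rows stay untouched
    }
    return @output;
}

# Made-up sample rows using values from the data above.
my @merged = merge_rows(
    { file => 'ONE_ACE', status => 'CONFIRMED', event_id => 12995, rr_sec_code => '15216.01' },
    { file => 'ONE_ACE', status => 'CONFIRMED', event_id => 12995, rr_sec_code => '15262.01' },
    { file => 'TWO_ACE', status => 'CONFIRMED', event_id => 12995, rr_sec_code => '15216.01' },
    { file => 'TWO_ACE', status => 'EXPECTED',  event_id => 13266, rr_sec_code => '23769.01' },
);

print "FileName,Status,Event ID,RR Security Code\n";
print join(',', @{$_}{qw(file status event_id rr_sec_code)}), "\n" for @merged;
```

With these rows the output has three data lines: `ONE_ACE/TWO_ACE,CONFIRMED,12995,15216.01`, `ONE_ACE,CONFIRMED,12995,15262.01` and `TWO_ACE,EXPECTED,13266,23769.01`. Keeping a separate set of files per key avoids re-splitting the joined FileName string on `/`.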

Re: Matching and concatenating similar Eventids
by MonkE (Hermit) on Jun 02, 2006 at 17:05 UTC
    If I understand your problem correctly, I recommend using a substitution with the binding operator (=~) when you find a duplicate:
        if (you-have-a-duplicate) {
            $data{$other_file} =~ s/^([^|]*)\|/$1\/$open_file|/;
            $data{$open_file} = "$open_file/$other_file|$status|$event_id|$msci_sec_code";
        }
    This will add the filename to both the open file and the "other" file's entry.
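A small self-contained check of that substitution, assuming each %data value is a pipe-delimited string whose first field is the file name (the sample value and file names are made up). It uses $1, the preferred form on the replacement side of s///, and shows that only the first field is extended.

```perl
use strict;
use warnings;

# Assumed layout of one %data value: "FileName|Status|Event ID|RR Security code"
my %data = ( ONE_ACE => 'ONE_ACE|CONFIRMED|12995|15216.01' );
my $other_file = 'TWO_ACE';

# ^([^|]*) captures the first field; $1 puts it back followed by "/$other_file".
# Inside a character class the pipe needs no backslash.
$data{ONE_ACE} =~ s/^([^|]*)\|/$1\/$other_file|/;

print "$data{ONE_ACE}\n";   # ONE_ACE/TWO_ACE|CONFIRMED|12995|15216.01
```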

Approved by wazoox