Maybe with a hash or array indexes? I'm not sure if your data is CSV compliant with quoted text blocks, or if it is safe to split on just ','. I'll assume that it is safe, but you may have to do some extra work to be sure.
Not clear if you mean JOIN in the DB sense. If you want to do DB stuff with flat files, look at DBD::CSV. Otherwise the code below will work; it only prints records which appear in both datasets. The comments mark the lines that are not necessary if you don't care about the data being in both datasets.
# DATAONE and DATATWO are assumed to be filehandles already opened on the two files
my %data;
# the first line of each file holds the field names
chomp(my $header1 = <DATAONE>);
my @fieldlist1 = split(/,/, $header1);
while (<DATAONE>) {
    chomp;   # strip the newline so it doesn't end up in the last field
    # a limit of -1 keeps trailing empty fields, so the count check below still works
    my ($key, @rest) = split(/,/, $_, -1);
    push @{$data{$key}}, @rest;
}
chomp(my $header2 = <DATATWO>);
my @fieldlist2 = split(/,/, $header2);
shift @fieldlist2; # throw away the first field b/c it is the id
while (<DATATWO>) {
    chomp;
    my ($key, @rest) = split(/,/, $_, -1);
    # if you want to JOIN these two files,
    # skip records whose key was not seen in DATAONE
    next unless $data{$key};
    push @{$data{$key}}, @rest;
}
my @fields = (@fieldlist1, @fieldlist2);
print join(',', @fields), "\n";
# assuming the PK is numeric
foreach my $id ( sort { $a <=> $b } keys %data ) {
    # if you want to skip the lines where there
    # was data in ONE but not in TWO,
    # you need a count of the number of fields you expect,
    # which is handily available in @fields - 1 (ignoring id)
    next unless scalar @{$data{$id}} == (scalar @fields - 1);
    print join(',', $id, @{$data{$id}}), "\n";
}
Update: Of course, the first line of each file has to be processed to get the field list; updated to do that.
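The split(/,/, ...) calls above assume that no field contains a comma. If your data does have quoted text blocks, here is a minimal sketch of a quote-aware split using parse_line from the core Text::ParseWords module. The sample line is made up for illustration. This only covers simple quoted fields; for fully CSV-compliant data (doubled quotes, embedded newlines) a real CSV parser such as Text::CSV is probably the safer choice.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical sample record: the second field has a comma inside quotes.
my $line = qq{42,"Smith, John",NY\n};
chomp $line;

# parse_line(delimiter, keep, line): a keep value of 0 strips the quotes.
my ($key, @rest) = parse_line(',', 0, $line);

print "$key\n";                # prints 42
print join('|', @rest), "\n";  # prints Smith, John|NY
```

A plain split(/,/, $line) on the same record would return four fields instead of three, which would throw off the field count check in the join code.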