Re: Joining to flat-files on primary-foreign key

Maybe with a hash or array indexes? I'm not sure whether your data is full CSV with quoted text blocks, or whether it is safe to split on just ','. I'll assume that it is safe, but you may have to do some extra work to be sure. It's not clear whether you mean JOIN in the DB sense; if you want to do DB stuff with flat files, look at DBD::CSV. Otherwise this will work, printing only records which appear in both datasets. I've commented the lines that are not necessary if you don't care about the data being in both datasets.

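# Assumes DATAONE and DATATWO are filehandles already opened for reading.
my %data;

# The first line of each file is the header row; chomp so the last
# field does not keep its trailing newline.
chomp(my $header1 = <DATAONE>);
my @fieldlist1 = split(/,/, $header1);
while(<DATAONE>) {
 chomp;
 # a limit of -1 keeps trailing empty fields, so the field count stays right
 my ($key,@rest) = split(/,/, $_, -1);
 push @{$data{$key}}, @rest;
}
chomp(my $header2 = <DATATWO>);
my @fieldlist2 = split(/,/, $header2);
shift @fieldlist2; # throw away the first field b/c it is the id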

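while(<DATATWO>) {
 chomp;
 # a limit of -1 keeps trailing empty fields, so the field count stays right
 my ($key,@rest) = split(/,/, $_, -1);

 # if you want to JOIN these two files,
 # skip records that have no matching key in <DATAONE>
 next unless exists $data{$key};

 push @{$data{$key}}, @rest;
}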

my @fields = (@fieldlist1,@fieldlist2);
print join(',',@fields),"\n";
# assuming the PK is numeric
foreach my $id ( sort { $a <=> $b } keys %data ) {
  # if you want to skip the lines where there
  # was data in ONE but not in TWO,
  # you need a count of the number of fields you expect,
  # which is handily available as @fields - 1 (ignoring the id)
  next unless scalar @{$data{$id}} == (scalar @fields - 1);
  print join(',', $id, @{$data{$id}}),"\n";
}
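
To see what the join prints, here is a minimal self-contained sketch of the same approach run against made-up data. The sample rows, the in-memory filehandles, and the sub name join_on_first_column are all assumptions for illustration, not part of the original question. It assumes each key appears at most once per file.

```perl
use strict;
use warnings;

# Hypothetical sample data; the first column of each file is the shared key.
my $one = "id,name,city\n2,Bob,Lyon\n1,Ann,Oslo\n3,Cy,Rome\n";
my $two = "id,score\n1,90\n2,75\n4,60\n";

# Inner join of two comma-separated filehandles on their first column.
sub join_on_first_column {
    my ($fh1, $fh2) = @_;
    my %data;

    # Header and rows of the first file.
    chomp(my $head1 = <$fh1>);
    my @fields = split /,/, $head1;
    while (my $line = <$fh1>) {
        chomp $line;
        my ($key, @rest) = split /,/, $line, -1;
        push @{ $data{$key} }, @rest;
    }

    # Header and rows of the second file.
    chomp(my $head2 = <$fh2>);
    my (undef, @extra) = split /,/, $head2;   # drop the duplicate id column
    push @fields, @extra;
    while (my $line = <$fh2>) {
        chomp $line;
        my ($key, @rest) = split /,/, $line, -1;
        next unless exists $data{$key};       # key 4 is only in file two
        push @{ $data{$key} }, @rest;
    }

    # Keep only the keys that collected fields from both files.
    my @out = (join ',', @fields);
    for my $id (sort { $a <=> $b } keys %data) {
        next unless @{ $data{$id} } == @fields - 1;   # key 3 is only in file one
        push @out, join ',', $id, @{ $data{$id} };
    }
    return @out;
}

open my $fh1, '<', \$one or die "cannot open in-memory file: $!";
open my $fh2, '<', \$two or die "cannot open in-memory file: $!";
my @joined = join_on_first_column($fh1, $fh2);
print "$_\n" for @joined;
```

With this data only ids 1 and 2 are printed, under the header id,name,city,score. Id 3 has no row in the second file and id 4 has none in the first, so both are dropped.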

Update: Of course you need to process the first line of each file to get the field list. The code is updated to do that.
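
On the caveat about quoted text blocks: a small sketch of why a plain split on ',' breaks for quoted fields, using parse_line from the core Text::ParseWords module as one possible alternative. The sample line is made up. Text::ParseWords is not a full CSV parser (it also treats single quotes and backslashes specially), so for real CSV data a dedicated module such as Text::CSV from CPAN is probably the safer choice.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical record whose second field contains a comma inside quotes.
my $line = '7,"Smith, John",Oslo';

# Plain split cuts the quoted field in two.
my @naive = split /,/, $line;

# parse_line(delimiter, keep_quotes, line) honours the quotes;
# keep_quotes = 0 strips them from the returned fields.
my @quoted = parse_line(',', 0, $line);

print scalar(@naive), " fields with split, ", scalar(@quoted), " with parse_line\n";
print "second field: $quoted[1]\n";
```

If the data does turn out to be quoted, the split calls in the join code would have to be swapped for something like this.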
