Joining to flat-files on primary-foregin key

data67 has asked for the wisdom of the Perl Monks concerning the following question:

Hey guys, i have two flat-files that i am working on. One file has a primary key identifier and some attributes.
id#, Name, Address, Phone.
1212, Bob, 113 Main St, 555-1212,
1231, Aron, 114 Main St, 555-1221,

File #2 uses this ID (as foregin key) to associate some relevant info to the id.
id#, class-grade, instructor, blah..blah...
1212, B, Dr. Gillo, ...
1231, A, Dr. Ross, ...

example:

ID | NAME | ADDRESS | PHONE | CLASS-GRADE | INSTRUCTOR | etc.
1212, Bob, 113 Main St, 555-1212, B, Dr. Gillo, ...
1231, Aron, 114 Main St, 555-1221, A, Dr. Ross, ...
etc.
[download]

Now how can i JOIN these to files together on the id.

Comment on Joining to flat-files on primary-foregin key Download Code

Replies are listed 'Best First'.
Re: Joining to flat-files on primary-foregin key by stajich (Chaplain) on Jun 18, 2002 at 19:46 UTC
Maybe with a hash or array indexes? I'm not sure if you data is CSV compliant with quoted text blocks or if it is safe to split on just ','. I'll assume that it is, but you may have to do some extra work to be sure. Not clear if you mean JOIN in the DB sense - if you want to do DB stuff with flatfiles look at DBD::CSV. Otherwise this will work only printing records which appear in both datasets. There are comments for lines which are not necessary if you don't care about the data being in both datasets. my %data; my @fieldlist1 = split(/,/,<DATAONE>); while(<DATAONE>) { my ($key,@rest) = split(/,/,$_); push @{$data{$key}}, @rest; } my @fieldlist2 = split(/,/,<DATATWO>); shift @fieldlist2; # throw away the first field b/c it is the id while( <DATATWO>) { my ($key,@rest) = split(/,/,$_); #if you want to JOIN these two files, # and skip records where there is not <DATAONE> next unless $data{$key}; push @{$data{$key}},@rest; } my @fields = (@fieldlist1,@fieldlist2); # assuming the PK is numeric print join(',',@fields),"\n"; foreach my $id ( sort { $a <=> $b } keys %data ) { # if you wanted to skip the lines where there # was data in ONE but not in TWO # you need a count of the number of fields you expect # which is handily avaialable in @fields -1 (ignoring id) next unless scalar @{$data{$id}} == (scalar @fields -1); print join(',', $id, @{$data{$id}}),"\n"; } [download] Update: Of course process the first line of each of the files to get the field list - updated to do that.	[reply] [d/l]

Replies are listed 'Best First'.

Re: Joining to flat-files on primary-foregin key
by stajich (Chaplain) on Jun 18, 2002 at 19:46 UTC

DBD::CSV

my %data;
my @fieldlist1 = split(/,/,<DATAONE>);
while(<DATAONE>) {
 my ($key,@rest) = split(/,/,$_);
 push @{$data{$key}}, @rest;
}
my @fieldlist2 = split(/,/,<DATATWO>);
shift @fieldlist2; # throw away the first field b/c it is the id

while( <DATATWO>) {
 my ($key,@rest) = split(/,/,$_);

 #if you want to JOIN these two files, 
 # and skip records where there is not <DATAONE>
 next unless $data{$key};

 push @{$data{$key}},@rest;
}

my @fields = (@fieldlist1,@fieldlist2);
# assuming the PK is numeric
print join(',',@fields),"\n";
foreach my $id ( sort { $a <=> $b } keys %data ) {
  # if you wanted to skip the lines where there 
  # was data in ONE but not in TWO
  # you need a count of the number of fields you expect
  # which is handily avaialable in @fields -1 (ignoring id)
   next unless scalar @{$data{$id}} == (scalar @fields -1);
 print join(',', $id, @{$data{$id}}),"\n";
}
[download]

Update:

[reply]
[d/l]