http://qs1969.pair.com?node_id=32621


in reply to RE: Merging Files: A Different Twist
in thread Merging Files: A Different Twist

Unfortunately, your initial example is flawed. I see no way to tell how to relate the two files. I'll assume that the "name" fields are really supposed to be identical, and you want the names and address of all the generals.

What it sounds like is you want to implement something like the SQL "select" statement:

SELECT file1.name,address FROM file1, file2 WHERE file1.name = file2.name AND rank = 'General';
If this is what you want, then the "use a real DB, like DBD::RAM" suggestion is probably the most general.

If you really wanted to do this by hand...

First, an insult to your intelligence... I had to hand-code this type of thing as a homework assigment for a database class I took to get my degree. Given how poorly specced the problem is, it sounds like the same type of thing, except you won't be writing it in perl4 like I did. As such, I won't give you the -best- solution, but I'll give you a working solution. Some parts are downright stupid for how to do it. This is intentional -- If you can fix it, you earned it.

A relational DB has tables consisting of rows and columns. A join operation basically creates the cartesian product of two tables, and a selection operation creates new table which has a subset of rows from the old, and a projection operation creates a new table which has a subset of columns.

What it sounds like is you want to join your two files, take a selection, then a projection. Let's solve it that way.

We first need to read in the two files into tables. You could do something like this to read in each file:

while (<FILE1>) { chomp push @table1,[ split /\s+/ ]; }
This effectively creates a 2-dimensional array, where each row is a record in the file, and each column is, well, a column.

I'm assuming you know which column is which; you aren't relying on things like the first line of the file having the column names. The tables then contain just pure data.

So now we have @table1 and @table2. Time to join them:

my @join; for my $i (@table1) { for my $j (@table2) { push @join,@$i,@$j; } }
Good, now we have joined the two tables. We now need to select from the tables the info we need.

Let's assume we are looking for things which meet the following criteria:

given that there are 3 columns in each table...
my @select; for $i (@join) { push @select,$i if ($i->[0] eq $i->[3]) and ($i->[1] eq "Joe"); }
Now, that wasn't hard. Final step now, the projection... We want just columns 1 and 3 from table 1, and column 2 from table 2:
my @project; for $i (@select) { push @project,[ $i->[0,2,4] ]; }
Now all you have to do is print out @project into file3, and you are done. That I'll leave as an exercise for you.

If I've done this right, you should be able to simplify this a lot, or at least, use it as a basis for what you are doing.

And for the curious: No this is not how I did it for my DB class... for that I cheated, and used persistant hashes, not arrays.