in reply to Merge two huge datasets by ID

"Merge" ... "datasets" ... "by (something)". Gee, that sounds like SQL.

select id, name from fileone where id in (select id from filetwo)
Unfortunately, I'm not having a lot of luck getting SQL::Statement (which underpins DBD::CSV) into accepting this. Could be something wrong with my syntax, but this doesn't seem that convoluted yet. SQL ERROR: Bad table or column name 'select id from f2' has chars not alphanumeric or underscore! (?) The full code I'm trying is:
#! /usr/bin/perl use strict; use warnings; use DBI; my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;csv_sep_char=\t"); unlink 'out'; $dbh->do("create table out (id integer, name char(64))"); $dbh->{csv_tables}{f1}{col_names} = [qw(id name)]; $dbh->{csv_tables}{f2}{col_names} = [qw(id)]; my $d = $dbh->selectall_arrayref('select id, name from f1 where id in +( select id from f2 )', {});
and something like this should work (assuming tab separators), but, like I said, I'm having some minor problems with it. Assuming we get the kinks worked out (could be my SQL, could be SQL::*), I would think this to be the simplest situation. And, if it's a problem with SQL::*, then moving your "huge" data to a real database may be wise anyway, and that'd solve the problem right out as their SQL parser/engine would probably handle it ;-)