Re: Merge two huge datasets by ID

"Merge" ... "datasets" ... "by (something)". Gee, that sounds like SQL.

  select id, name from fileone where id in (select id from filetwo)
[download]

Unfortunately, I'm not having a lot of luck getting SQL::Statement (which underpins DBD::CSV) into accepting this. Could be something wrong with my syntax, but this doesn't seem that convoluted yet. SQL ERROR: Bad table or column name 'select id from f2' has chars not alphanumeric or underscore! (?) The full code I'm trying is:

#! /usr/bin/perl
use strict;
use warnings;

use DBI;

my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;csv_sep_char=\t");
unlink 'out';
$dbh->do("create table out (id integer, name char(64))");
$dbh->{csv_tables}{f1}{col_names} = [qw(id name)];
$dbh->{csv_tables}{f2}{col_names} = [qw(id)];

my $d = $dbh->selectall_arrayref('select id, name from f1 where id in 
+( select id from f2 )', {});
[download]

and something like this should work (assuming tab separators), but, like I said, I'm having some minor problems with it. Assuming we get the kinks worked out (could be my SQL, could be SQL::*), I would think this to be the simplest situation. And, if it's a problem with SQL::*, then moving your "huge" data to a real database may be wise anyway, and that'd solve the problem right out as their SQL parser/engine would probably handle it ;-)

Comment on Re: Merge two huge datasets by ID Select or Download Code