in reply to Merge two huge datasets by ID
"Merge" ... "datasets" ... "by (something)". Gee, that sounds like SQL.
Unfortunately, I'm not having a lot of luck getting SQL::Statement (which underpins DBD::CSV) into accepting this. Could be something wrong with my syntax, but this doesn't seem that convoluted yet. SQL ERROR: Bad table or column name 'select id from f2' has chars not alphanumeric or underscore! (?) The full code I'm trying is:select id, name from fileone where id in (select id from filetwo)
and something like this should work (assuming tab separators), but, like I said, I'm having some minor problems with it. Assuming we get the kinks worked out (could be my SQL, could be SQL::*), I would think this to be the simplest situation. And, if it's a problem with SQL::*, then moving your "huge" data to a real database may be wise anyway, and that'd solve the problem right out as their SQL parser/engine would probably handle it ;-)#! /usr/bin/perl use strict; use warnings; use DBI; my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;csv_sep_char=\t"); unlink 'out'; $dbh->do("create table out (id integer, name char(64))"); $dbh->{csv_tables}{f1}{col_names} = [qw(id name)]; $dbh->{csv_tables}{f2}{col_names} = [qw(id)]; my $d = $dbh->selectall_arrayref('select id, name from f1 where id in +( select id from f2 )', {});
|
|---|