http://qs1969.pair.com?node_id=11135122


in reply to Re^2: efficient perl code to count, rank
in thread efficient perl code to count, rank

A database comes with significant overhead: You need to INSERT 14 millions of records before you can even start

That's not necessarily true: in PostgreSQL, for instance, one can run the SQL machinery against a so-called foreign table that uses a text file as its underlying data. That means no INSERT is necessary.

Making a foreign table takes no time (after all, it's really just configuration), but of course any reading or sorting of many GBs will take approximately as long as in any other programming language. The advantage would be access via SQL. (BTW, I'm not saying such database access is useful for the OP; he would have the overhead of learning this particular trick.)
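For anyone who wants to see what "just configuration" means, here is a minimal sketch, assuming the file_fdw extension from PostgreSQL's contrib modules is the wrapper in play. The server name csv_files, the file path, and the three-column table junk.smallcsv are hypothetical and chosen only for illustration; a real wide file would need one column definition per CSV column.

```sql
-- Sketch only: file_fdw is an assumption; all names and the path are hypothetical.
CREATE EXTENSION IF NOT EXISTS file_fdw;

-- The FDW machinery requires a server object, even for local files.
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

CREATE SCHEMA IF NOT EXISTS junk;

-- No data is loaded here; the table is only a description of the file.
CREATE FOREIGN TABLE junk.smallcsv (
    id     integer,
    name   text,
    score  numeric
)
SERVER csv_files
OPTIONS (
    filename '/tmp/small.csv',  -- hypothetical path; must be readable by the server process
    format   'csv',
    header   'true'
);

-- The file is read and parsed each time the table is queried.
SELECT name, score FROM junk.smallcsv ORDER BY score DESC LIMIT 10;
```

Setting this up typically needs elevated privileges (creating the extension, and reading server-side files). Since a foreign table on a flat file cannot be indexed, an ORDER BY like the one above has to scan the whole file, which is why the run time stays comparable to any other approach that reads the data once.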

(And yes, I did try it out: retrieving a few values from a foreign table that sits on top of a 27GB CSV file (1250 columns, 2M rows), via

SELECT column10 , column20 FROM junk.broadcsv ORDER BY column100 DESC LIMIT 10

took ~10 minutes on my old 8GB desktop)