Re^3: efficient perl code to count, rank

in reply to Re^2: efficient perl code to count, rank
in thread efficient perl code to count, rank

A database comes with significant overhead: You need to INSERT 14 millions of records before you can even start

That's not necessarily true: one can (for instance, in postgres) run SQL against a so-called foreign table that uses a text file as its underlying data. That means no INSERT is necessary.

Making a foreign table takes no time (after all, it's really just configuration), but of course any reading or sorting of many GBs will take approximately as long as in any other programming language. The advantage would be access via SQL. (BTW, I'm not saying such access via a database is useful for the OP; he may have the overhead of learning this particular trick.)
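For anyone who hasn't seen the trick, here is a minimal sketch of such a setup. It assumes PostgreSQL's contrib extension file_fdw, which is one way to do this and not necessarily the setup behind the timing in this post. The server name csv_files, the table junk.smallcsv, its three columns and the file path are all hypothetical. The column list has to match the columns in the file, so a small file is used here for brevity.

```sql
-- Assumption: the file_fdw contrib extension is installed; creating it
-- typically needs superuser rights.
CREATE EXTENSION IF NOT EXISTS file_fdw;

-- The server object is only a named handle; file_fdw needs no connection options.
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

CREATE SCHEMA IF NOT EXISTS junk;

-- Hypothetical three-column file; the column list must match the file's columns.
CREATE FOREIGN TABLE junk.smallcsv (
    column1 integer
  , column2 integer
  , column3 text
)
SERVER csv_files
OPTIONS (
    filename '/path/to/small.csv'   -- hypothetical path, must be readable by the server process
  , format 'csv'
  , header 'true'                   -- skip the first line of the file
);

-- Nothing has been loaded at this point; the file is scanned at query time.
SELECT
    column1
  , column3
FROM junk.smallcsv
ORDER BY column2 DESC
LIMIT 10;
```

The CREATE statements only record where the file lives and how to parse it, which is why they return immediately. Every query then reads the file from the start, so that is where the time goes.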

(And yes, I did try it out: retrieving a few values from a foreign table that sits on top of a 27GB CSV file (1250 columns, 2M rows), via

SELECT
    column10
  , column20
FROM junk.broadcsv
ORDER BY column100 DESC 
LIMIT 10

took ~10 minutes on my old 8GB desktop)
