
Re^3: efficient perl code to count, rank

by erix (Prior)
on Jul 17, 2021 at 23:06 UTC ( [id://11135122]=note )


in reply to Re^2: efficient perl code to count, rank
in thread efficient perl code to count, rank

A database comes with significant overhead: You need to INSERT 14 millions of records before you can even start

That's not necessarily true: one can (for instance, in postgres) use the SQL-machinery against a so-called foreign table, which uses a text file as its underlying data. That means no INSERT is necessary.

Making a foreign table takes no time (after all, it's really just configuration), but of course any reading or sorting of many GBs will take approximately as long as in any other programming language. The advantage would be access via SQL. (BTW, I'm not saying such access via a database is useful for the OP; he may face the overhead of learning this particular trick.)
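For readers who have not seen one, here is a minimal sketch of what such a foreign table definition might look like, assuming the file_fdw extension that ships with PostgreSQL. The server name, schema, table, column names and file path are all hypothetical and are not taken from the setup described in this post; the file has to be readable by the database server process.

```sql
-- Hypothetical sketch: every name and the path below are assumptions.
-- file_fdw is the contrib extension that exposes server-side files as tables.
CREATE EXTENSION IF NOT EXISTS file_fdw;

-- A foreign server is only a named handle for the wrapper; no data is loaded.
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

CREATE SCHEMA IF NOT EXISTS demo;

-- The column list must match the columns in the file, in order.
CREATE FOREIGN TABLE demo.smallcsv (
    column1 text,
    column2 text,
    column3 integer
) SERVER csv_files
  OPTIONS (filename '/path/to/data.csv', format 'csv', header 'true');

-- Each query reads the file again; nothing is INSERTed or indexed.
SELECT column1, column2
FROM demo.smallcsv
ORDER BY column3 DESC
LIMIT 10;
```

Because the definition is only metadata, creating it is immediate; the cost shows up when a query has to scan and sort the whole file.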

(And yes, I did try it out: retrieving a few values from a foreign table that sits on top of a 27 GB CSV file (1250 columns, 2M rows), via

SELECT column10 , column20 FROM junk.broadcsv ORDER BY column100 DESC LIMIT 10

took ~10 minutes on my old 8 GB desktop.)

Replies are listed 'Best First'.
Re^4: efficient perl code to count, rank
by haj (Vicar) on Jul 17, 2021 at 23:34 UTC
    one can (for instance, in postgres) use the SQL-machinery against a so-called foreign table,

    Fair enough! That would be especially convenient if the database offers some aggregation functions to perform the logic of the Perl code, so that you wouldn't even need to SELECT all the stuff back. That's beyond my postgres-fu, though.
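As a rough illustration of that idea, standard SQL aggregates and window functions can count and rank inside the database, so only the summary rows come back. This sketch reuses the table and column names from the query quoted in the parent post; whether grouping on column10 matches the logic of the OP's Perl code is purely an assumption.

```sql
-- Hypothetical sketch: count how often each value of column10 occurs
-- and rank the values by that count, most frequent first.
SELECT column10,
       count(*)                             AS occurrences,
       rank() OVER (ORDER BY count(*) DESC) AS frequency_rank
FROM junk.broadcsv
GROUP BY column10
ORDER BY occurrences DESC;
```

The file is still scanned in full, so this saves the round trip of selecting every row back into Perl, not the read itself.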
