in reply to sorting very large text files
Re^2: sorting very large text files
by BrowserUk (Patriarch) on Dec 21, 2009 at 15:02 UTC
No! It'll take longer just to import the data into the RDBMS than your local system sort utility will take to complete. And after that, you've still got to index the data, perform the sorting query and output the sorted data. Allow for anything from 5 to 10 times as long to achieve the final goal of sorted data in a single file on disk. Maybe more.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
by erix (Prior) on Dec 21, 2009 at 20:39 UTC
BrowserUk: "[...] from 5 to 10 times as long ... Maybe more."

Sort will obviously always be faster than using a database (here, PostgreSQL) for loading and querying for the same result, but I want to show that you exaggerate the overhead of loading and querying into an output file. Indexing would be redundant, but I bet even with an index it wouldn't take that much time.

I constructed a similar file and sorted it by the dna column in the two ways under consideration (sort versus database slurp+query).
So:
unix sort: real 59m48.641s
table ORDER BY: real 59m12.569s
the latter preceded by 6m20.430s overhead for loading table data
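To put the figures above side by side, here is a small arithmetic sketch in Python. The timings are the ones reported above; the helper name `to_seconds` and the variable names are made up for illustration and are not part of the original benchmark.

```python
def to_seconds(minutes, seconds):
    """Convert a `time`-style 'XmY.YYYs' reading into plain seconds."""
    return minutes * 60 + seconds

# Timings as reported in the post above.
sort_s = to_seconds(59, 48.641)    # unix sort
query_s = to_seconds(59, 12.569)   # table ORDER BY
load_s = to_seconds(6, 20.430)     # loading the table data

# The database route pays for the load and then for the query.
db_total_s = load_s + query_s
ratio = db_total_s / sort_s

print(f"database total: {db_total_s:.3f}s")
print(f"unix sort:      {sort_s:.3f}s")
print(f"ratio:          {ratio:.2f}")
```

On these numbers the database route comes out at roughly 1.1 times the sort time, not 5 to 10 times.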
So much for your (BrowserUk's) guess: "database takes 5 to 10 times as long as system sort ... Maybe more"... It almost amounts to slander :)

Update 1: Of course, I couldn't resist trying with an index as well, and sure enough it's useless / not used:
63m to create the index - that makes sense. But of course, Pg will not use it:
update 2: removed a useless use of cat, lest I receive another uuc-award...
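For anyone wanting to try the shape of this comparison on a small scale, a rough sketch follows. It is not the benchmark above: it assumes Python's built-in `sorted()` in place of unix sort and an in-memory SQLite table in place of PostgreSQL, so the absolute timings say nothing about either of those tools. The table name `reads`, the `id` column and the synthetic data generator are hypothetical; only the `dna` column name comes from the post.

```python
import random
import sqlite3
import time


def make_rows(n, seed=0):
    """Build n synthetic (id, dna) rows; stand-in data, not the original file."""
    rng = random.Random(seed)
    return [(i, "".join(rng.choice("ACGT") for _ in range(20))) for i in range(n)]


def sort_in_memory(rows):
    """Sort by the dna column, breaking ties on id (stand-in for unix sort)."""
    return sorted(rows, key=lambda row: (row[1], row[0]))


def sort_via_database(rows):
    """Load rows into SQLite, then ORDER BY dna. Returns (rows, load_s, query_s)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reads (id INTEGER, dna TEXT)")

    start = time.perf_counter()
    conn.executemany("INSERT INTO reads VALUES (?, ?)", rows)
    load_s = time.perf_counter() - start

    start = time.perf_counter()
    ordered = conn.execute("SELECT id, dna FROM reads ORDER BY dna, id").fetchall()
    query_s = time.perf_counter() - start

    conn.close()
    return ordered, load_s, query_s


if __name__ == "__main__":
    rows = make_rows(100_000)

    start = time.perf_counter()
    plain = sort_in_memory(rows)
    plain_s = time.perf_counter() - start

    ordered, load_s, query_s = sort_via_database(rows)

    # Both routes should produce the same ordering.
    assert plain == ordered
    print(f"in-memory sort: {plain_s:.3f}s")
    print(f"database load:  {load_s:.3f}s, ORDER BY: {query_s:.3f}s")
```

The point of the sketch is only the bookkeeping: time the load separately from the query, and check that both routes return the same rows in the same order before comparing numbers.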
by BrowserUk (Patriarch) on Dec 21, 2009 at 20:56 UTC
"It almost amounts to slander :)"

Nice one! A DB answer with real code and data. A rare animal.