If your 12 terabyte estimate comes from Re: Catching Cheaters and Saving Memory, then it is an overestimate. The view itself is likely to be somewhere north of 400 GB. The 12 terabytes is an estimate of how much data you need to throw around to sort and group that dataset. But you won't need all 12 terabytes at once. You only need a terabyte or so of it at any one time.
However, a database should do that sorting step somewhat faster than the kind of naive program that I'd expect someone to write for a one-off. For one thing, the first several sort and group steps can be done in chunks entirely in memory. Nothing needs to hit disk until the amount of data you are throwing around exceeds your memory limit, and databases are smart about switching to disk only when they need to. That may double the speed. (I would be unlikely to do that for a one-off because it is a lot of logic to write.)
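To make the "stay in memory until you can't" idea concrete, here is a minimal sketch in Python. It is not how any particular database does it, and it is not code from the thread. The function name `sort_and_group`, the row-count limit `max_in_memory` (a crude stand-in for a real memory limit), and the tab-separated run files are all my own assumptions for illustration.

```python
import heapq
import itertools
import os
import tempfile


def _spill(sorted_run):
    """Write one sorted run of (key, value) string pairs to a temp file."""
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as fh:
        for key, value in sorted_run:
            fh.write(f"{key}\t{value}\n")
    return path


def _read_run(path):
    """Stream (key, value) pairs back from a spilled run, in sorted order."""
    with open(path) as fh:
        for line in fh:
            key, value = line.rstrip("\n").split("\t", 1)
            yield key, value


def sort_and_group(pairs, max_in_memory=100_000):
    """Yield (key, [values]) in key order, touching disk only when needed.

    pairs: iterable of (key, value) strings; assumed free of tabs/newlines.
    max_in_memory: hypothetical limit, counted in rows rather than bytes.
    """
    buffer, run_paths = [], []
    for pair in pairs:
        buffer.append(pair)
        if len(buffer) >= max_in_memory:
            # Memory budget exceeded: sort this chunk and spill it to disk.
            buffer.sort()
            run_paths.append(_spill(buffer))
            buffer = []
    buffer.sort()
    # If nothing was spilled, run_paths is empty and no file is ever written.
    streams = [_read_run(p) for p in run_paths] + [buffer]
    try:
        merged = heapq.merge(*streams)  # lazy k-way merge of sorted runs
        for key, group in itertools.groupby(merged, key=lambda kv: kv[0]):
            yield key, [value for _, value in group]
    finally:
        for p in run_paths:
            os.remove(p)
```

With a small dataset the whole thing is one in-memory sort. Only when the input outgrows `max_in_memory` does it fall back to sorted runs on disk plus a merge, which is the slow path a naive one-off would take for everything.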
That said, databases aren't magic, and this is a problem that shows just how non-magical they are. First of all, you're likely to run out of disk if you're using commodity hardware. (Oops.) And as you pointed out elsewhere, pre-filtering your dataset so you can focus on only the people you're going to care about is a far bigger win than anything a database knows how to do.