This sounds like a pretty good idea! A simple sort-merge algorithm takes each "hunk" of data that fits in memory, sorts it, and writes it to a new place on disk. That costs one read and one write of the entire data set. In the scheme of things, 8 GB is not that "big".
Let's say that each "hunk" is just 500 MB, which my Windows machine can sort easily; we wind up with 16 hunks. The merge opens all 16 files at once, and the next part is easy: look at the top record of each of the 16 files, move the smallest one to the output, and read the next record from the file it came from. So for "small" data sets like 8 GB the total I/O is: 1) read once, 2) write once, 3) read again, 4) write again.
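The two passes described above can be sketched roughly as follows. This is only an illustration, not what any system utility actually does: the function name `external_sort` and the `lines_per_chunk` parameter are made up for the example, records are assumed to be newline-terminated text lines compared as plain strings, and the hunk size is counted in lines rather than megabytes to keep the code short.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(in_path, out_path, lines_per_chunk=1_000_000):
    """Sort the lines of in_path into out_path using bounded memory.

    Hypothetical helper for illustration; assumes every record ends in '\n'.
    """
    chunk_paths = []
    try:
        # Pass 1: read the input once, sort each hunk in memory,
        # and write it to its own temporary file.
        with open(in_path) as src:
            while True:
                chunk = list(islice(src, lines_per_chunk))
                if not chunk:
                    break
                chunk.sort()
                fd, path = tempfile.mkstemp(suffix=".hunk", text=True)
                with os.fdopen(fd, "w") as tmp:
                    tmp.writelines(chunk)
                chunk_paths.append(path)

        # Pass 2: open every hunk at once; heapq.merge repeatedly yields
        # the smallest of the current "top" records and advances that file.
        files = [open(p) for p in chunk_paths]
        try:
            with open(out_path, "w") as dst:
                dst.writelines(heapq.merge(*files))
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary hunk files whether or not the merge succeeded.
        for p in chunk_paths:
            os.remove(p)
```

Calling `external_sort("big.txt", "sorted.txt")` (file names are placeholders) reads and writes the data twice in total, which matches the four I/O steps listed above. With 16 hunks the merge keeps only 16 records in memory at a time.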
I think that the system utilities are faster than this. The very smart ones move data between several disks to speed up access, and their algorithms are smarter than what I described above. Anyway, "sort" on a big machine is heavily optimized. The Unix command-line sort will probably do much better than you think.