If the goal is to have everything in one big file ordered by key, and you already have several smaller 500MB files that are each ordered by key, then you just want a straight merge:
1. Open all of the files.
2. Read a key from each file.
3. Sort the filehandles by key.
4. Take the file corresponding to the smallest key.
5. Read that key's value.
6. Copy the key and value to the output.
7. Read the next key from the same file (if that file is exhausted, drop it from the list).
8. Move that filehandle to the place in the filehandle list corresponding to its new key (or just sort the list again, if it's really short).
9. Go back to step 4 and repeat until all of the files are exhausted.
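Here is a minimal Python sketch of those steps. The post gives no code, so the language and the record format are assumptions: each record is taken to be one hypothetical "key<TAB>value" line, with keys sorted by plain string comparison, so a key and its value come in together from a single readline() rather than in two separate reads. The pending list is kept sorted with bisect.insort, which corresponds to the "move that filehandle to its place" option in step 8.

```python
import bisect


def merge_sorted_files(input_paths, output_path):
    """K-way merge of files that are each already sorted by key.

    Assumes (hypothetically) one 'key<TAB>value' line per record and that
    keys compare correctly as plain strings.
    """
    # Step 1: open all of the files.
    handles = [open(p, "r", encoding="utf-8") for p in input_paths]
    try:
        # Step 2: read the first record from each file.
        # Entries are (key, file_index, value); a file has at most one
        # pending entry, so ties on key are broken by file index.
        pending = []
        for i, fh in enumerate(handles):
            line = fh.readline()
            if line:
                key, value = line.rstrip("\n").split("\t", 1)
                pending.append((key, i, value))
        # Step 3: sort by key.
        pending.sort()
        with open(output_path, "w", encoding="utf-8") as out:
            while pending:
                # Steps 4-6: take the smallest key and copy its record out.
                key, i, value = pending.pop(0)
                out.write(f"{key}\t{value}\n")
                # Step 7: read the next record from the same file.
                line = handles[i].readline()
                if line:
                    nkey, nvalue = line.rstrip("\n").split("\t", 1)
                    # Step 8: insert it at the position matching its key.
                    bisect.insort(pending, (nkey, i, nvalue))
                # A file that hits end-of-file simply drops out of the list.
    finally:
        for fh in handles:
            fh.close()
```

With a handful of files the list operations are negligible; for many files, Python's standard heapq.merge performs the same k-way merge lazily with a heap instead of a sorted list.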
This should use very little memory: you only ever keep n filehandles and n keys in memory at any given time, and every file is read sequentially, which is the fastest way to do things, diskwise.
On the other hand, I'm still not clear on why you'd want everything in one file; much depends on how you're going to use the file afterwards.
You might do just as well, instead of copying the value out in step 6, to call tell() before reading the value and record that disk position instead. That way you get a master file that associates every key with a disk position and a number from 1..n indicating which file it is in, and you don't have to copy any of the files at all.
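A minimal Python sketch of that index-only variant, under the same hypothetical "key<TAB>value" line format (the file layout and function names are assumptions, not from the post). Each file is opened in binary mode so tell() returns a plain byte offset that seek() can use later. Rather than a hand-maintained sorted list, this version lets heapq.merge combine the per-file streams, writing key, file number (1..n) and offset to the master index.

```python
import heapq


def records_with_offsets(path, file_number):
    """Yield (key, file_number, offset) for each 'key<TAB>value' line of one sorted file."""
    with open(path, "rb") as fh:  # binary mode: tell() is a plain byte offset
        while True:
            offset = fh.tell()  # where this record starts, taken before reading it
            line = fh.readline()
            if not line:
                return
            yield line.split(b"\t", 1)[0], file_number, offset


def build_master_index(input_paths, index_path):
    """Merge keys from all files into one index of key, file number, offset."""
    streams = [records_with_offsets(p, n) for n, p in enumerate(input_paths, start=1)]
    with open(index_path, "wb") as out:
        # heapq.merge lazily interleaves the already-sorted streams by key.
        for key, file_number, offset in heapq.merge(*streams):
            out.write(b"%s\t%d\t%d\n" % (key, file_number, offset))


def fetch_value(input_paths, file_number, offset):
    """Seek to a recorded position and return the value stored there."""
    with open(input_paths[file_number - 1], "rb") as fh:
        fh.seek(offset)
        line = fh.readline()
    return line.rstrip(b"\n").split(b"\t", 1)[1]
```

Only keys and positions are written out; a value is read from its original file, via seek(), when it is actually needed.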