Sorry, I don't understand the above comment at all. Would you mind explaining that?
Let's say you have 100 files to process:
- One with 2,250,000 lines
- One with 500,000 lines
- The remaining 98, each 10,000 lines long
Let's say you have enough memory to handle 1,000,000 lines. Let's ignore byte size for simplicity, but you'll see I placed a limit on that as well in the split example.
You have two problems:
- You have many files to merge.
- One of your files is too big.
The first problem is easy to fix: Just concatenate all the files. Then you're left with one file that's too big.
- 3,730,000 lines
The second problem is easy to fix as well: Just split the file into smaller pieces. Just keep those pieces as big as possible.
You end up with:
- 1,000,000 lines
- 1,000,000 lines
- 1,000,000 lines
- 730,000 lines
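As a minimal sketch of the concatenate-then-split idea (in Python, and not the split example mentioned above; the name `chunked_lines` is hypothetical), you can treat all the inputs as one stream and cut it every `max_lines` lines:

```python
from itertools import chain, islice

def chunked_lines(sources, max_lines):
    """Yield lists of at most max_lines lines, ignoring file boundaries.

    sources is any iterable of line iterables (open file handles, lists, ...).
    """
    stream = chain.from_iterable(sources)        # the "concatenate" step, done lazily
    while True:
        chunk = list(islice(stream, max_lines))  # the "split" step
        if not chunk:
            return
        yield chunk

# Scaled-down stand-in for the example: a limit of 4 lines instead of 1,000,000.
sources = [["a"] * 9, ["b"] * 2, ["c"], ["d"], ["e"]]  # 14 lines in 5 "files"
sizes = [len(chunk) for chunk in chunked_lines(sources, 4)]
print(sizes)  # [4, 4, 4, 2]
```

Each chunk would then be sorted in memory and written out as one sorted piece. Only the last chunk can come out smaller than the limit.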
If all you had done was split the large file, you would have 102 files to sort and merge (the 2,250,000-line file becomes three pieces), resulting in 101 merges. By concatenating first, you only have 4 files to sort and 3 (long) merges. It cuts down on overhead a little.
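To check the file counts, here is a rough sketch of the arithmetic. It assumes a file at or under the limit stays as one piece, and that merging k sorted files pairwise takes k - 1 merges; the helper name `pieces` is made up for illustration.

```python
def pieces(n_lines, max_lines):
    """Number of pieces of at most max_lines needed to hold n_lines (ceiling division)."""
    return -(-n_lines // max_lines)

files = [2_250_000, 500_000] + [10_000] * 98  # the 100 files from the example
limit = 1_000_000

split_only = sum(pieces(n, limit) for n in files)  # split each file on its own
concat_first = pieces(sum(files), limit)           # concatenate, then split

# Assumption: k sorted files merged two at a time need k - 1 merges.
print(split_only, split_only - 1)      # 102 101
print(concat_first, concat_first - 1)  # 4 3
```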