Sorry, I don't understand the above comment at all. Would you mind explaining that?
Let's say you have 100 files to process:
Let's say you have enough memory to handle 1,000,000 lines. Let's ignore byte size for simplicity, but you'll see I placed a limit on that as well in the split example.
You have two problems:
The first problem is easy to fix: Just concatenate all the files. Then you're left with one file that's too big.
The second problem is easy to fix as well: Just split the file into smaller pieces. Just keep those pieces as big as possible.
You end up with:
If all you had done was split the large file, you would have 101 files to sort and merge, resulting in 100 merges. By concatenating first, all you have to do is 4 (long) merges. It cuts down on overhead a little.
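For anyone who wants to see the idea end to end, here is a rough Python sketch of "concatenate, split into the biggest pieces that fit, sort each piece, merge". It is not the split example referred to above. The names `chunked_lines` and `sort_files` and the `max_lines` default are made up for illustration, it limits by line count only (no byte limit), and it uses a single k-way merge via `heapq.merge` instead of the pairwise merges counted above.

```python
import heapq
import itertools
import os
import tempfile


def chunked_lines(paths, max_lines):
    """Yield lists of at most max_lines lines, reading all files as one stream."""
    def all_lines():
        for path in paths:
            with open(path) as fh:
                for line in fh:
                    # A file without a final newline must not glue its last
                    # line onto the first line of the next file.
                    yield line if line.endswith("\n") else line + "\n"

    stream = all_lines()
    while True:
        chunk = list(itertools.islice(stream, max_lines))
        if not chunk:
            return
        yield chunk


def sort_files(paths, out_path, max_lines=1_000_000):
    """Sort the lines of all input files into out_path; return the chunk count."""
    chunk_paths = []
    try:
        # "Concatenate, then split": chunk boundaries ignore file boundaries,
        # so every chunk except possibly the last is as big as memory allows.
        for chunk in chunked_lines(paths, max_lines):
            chunk.sort()
            fd, tmp_path = tempfile.mkstemp(suffix=".chunk")
            with os.fdopen(fd, "w") as tmp:
                tmp.writelines(chunk)
            chunk_paths.append(tmp_path)

        # Merge the sorted chunks; only one line per chunk is held in memory.
        handles = [open(p) for p in chunk_paths]
        try:
            with open(out_path, "w") as out:
                out.writelines(heapq.merge(*handles))
        finally:
            for fh in handles:
                fh.close()
    finally:
        for p in chunk_paths:
            os.remove(p)
    return len(chunk_paths)
```

The number of chunks returned depends only on the total line count divided by `max_lines` (rounded up), not on how many input files there were, which is the point of concatenating first.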