The problem is the access pattern during the merge phase. Sometimes many records are read from one temporary file, then many from another, then another, and then (say) back to the first. At other points the access pattern skips from file to file for every single record.
This ugly pattern can easily be eliminated at the application layer by reading the records in blocks instead of one by one.
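For illustration, here is a minimal Perl sketch of that block-buffered merge. The block size, the file handling, and the plain lexicographic comparison are assumptions for the example, not details from the original post; the point is only that each refill issues a long burst of sequential reads from one file instead of hopping between files one record at a time.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Assumed block size: how many records to pull from a temp file per refill.
    my $BLOCK = 10_000;

    # Open the (already sorted) temporary files named on the command line.
    my @files = @ARGV;
    my @fhs   = map { open my $fh, '<', $_ or die "open $_: $!"; $fh } @files;

    # One in-memory buffer of records per file.
    my @bufs = map { [] } @fhs;

    # Refill buffer $i with up to $BLOCK records in one burst of reads,
    # so the disk sees long sequential runs instead of single-record hops.
    sub refill {
        my ($i) = @_;
        my ($fh, $buf) = ($fhs[$i], $bufs[$i]);
        while (@$buf < $BLOCK and defined(my $line = <$fh>)) {
            push @$buf, $line;
        }
        return scalar @$buf;
    }

    refill($_) for 0 .. $#fhs;

    # Simple k-way merge: repeatedly emit the smallest head record.
    while (1) {
        my $min;
        for my $i (0 .. $#bufs) {
            refill($i) unless @{ $bufs[$i] };    # top up an exhausted buffer
            next unless @{ $bufs[$i] };
            $min = $i if !defined $min
                      or $bufs[$i][0] lt $bufs[$min][0];
        }
        last unless defined $min;                # all inputs drained
        print shift @{ $bufs[$min] };
    }

The block size trades memory for seeks: larger blocks mean fewer refills and longer sequential runs per file, at the cost of keeping more records in RAM at once.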