Delegating it to sort, be it the shell command or the Perl built-in, is a waste of resources and doesn't scale well.
Counting is O(n), but sort is at best O(n log(n)), and you also suggest doubling the needed disc space, which involves costly disc access time.
The time needed for a) merging and b) reading again and c) writing the sorted file can easily take more time than just reading once and incrementing a hash.
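To make the "read once and increment a hash" idea concrete, here is a minimal sketch. It assumes each input line is one key to be counted; the sub name count_lines and the tab-separated output are only illustrative and not taken from the OP's code.

```perl
use strict;
use warnings;

# Count how often each line occurs across all given files in a single pass.
# Assumption: one key per line; adapt the key extraction to the real format.
sub count_lines {
    my @files = @_;
    my %count;
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open '$file': $!";
        while ( my $line = <$fh> ) {
            chomp $line;
            $count{$line}++;    # O(1) on average per line, no sorting needed
        }
        close $fh;
    }
    return \%count;
}

# The files are chosen at run time, here simply from the command line.
if (@ARGV) {
    my $count = count_lines(@ARGV);
    print "$_\t$count->{$_}\n" for keys %$count;
}
```

Memory grows with the number of distinct keys, not with the total number of lines, and every file is read exactly once with no temporary merged copy on disc.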
Additionally, you are not saving much code, because the OP wants to choose the files dynamically.
If this were a one-shot problem your approach could be OK'ish, but I doubt that's the case in bioinformatics.
If it's possible to use sort, why not just use wc (word count)?
Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Wikisyntax for the Monastery
In reply to Re^2: Out of Memory when generating large matrix
by LanX
in thread Out of Memory when generating large matrix
by cathyyihao