in reply to Dear Monks
If you have the option, it may be worth comparing a Unix shell solution:
sed -e 's/,/ /' < data | sort | uniq -c
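The simplest Perl solution, a hash keyed on each line, may take up an obnoxious amount of memory, since it holds one entry per distinct key. The system sort(1) on Unix spills sorted runs to temporary files when the input does not fit in memory and then merges them, so it may scale better. Much depends on the distribution of your keys: many duplicates keep the hash small, while mostly unique keys make it grow with the input.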
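A minimal sketch of the one-liner above with sort's memory use made explicit. It assumes GNU coreutils sort, where -S caps the in-memory buffer and -T names the directory for the temporary run files. The function name count_lines and the sample lines are hypothetical, and the 1M buffer is deliberately small only for illustration:

```shell
#!/bin/sh
set -eu

# count_lines FILE: turn the first comma on each line into a space, then
# print each distinct resulting line with its occurrence count.
# Assumes GNU coreutils sort for -S (buffer size) and -T (temp directory).
count_lines() {
  # LC_ALL=C: byte-wise comparison, usually faster than locale collation;
  #           used for both sort and uniq so they agree on equality.
  # -S 1M:    cap sort's in-memory buffer; once it fills, sorted runs are
  #           written under -T and merged, so memory stays bounded.
  sed -e 's/,/ /' < "$1" \
    | LC_ALL=C sort -S 1M -T "${TMPDIR:-/tmp}" \
    | LC_ALL=C uniq -c
}

# Usage on a tiny hypothetical sample; a real run would pass the big file.
sample=$(mktemp)
printf '%s\n' 'a,1' 'b,2' 'a,1' 'c,3' 'a,1' > "$sample"
count_lines "$sample"
rm -f "$sample"
```

Capping the buffer trades some speed for a predictable memory ceiling, which is the property that lets the sort-based approach cope with mostly unique keys.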
Update: Given the updated requirements, the shell one-liner is no longer appropriate. If memory is a constraint, look at something like DBM::Deep, which keeps the hash on disk rather than in RAM.
--MidLifeXis
Re^2: Sorting and counting 5 Million lines
by runrig (Abbot) on Mar 17, 2011 at 15:03 UTC