in reply to perl ST sort performance issue for large file?
As already identified, the ST works fine for smallish datasets, but on large datasets it consumes prodigious amounts of memory for the millions of small anonymous arrays it builds.

You are compounding the problem by creating multiple huge, stack-based lists in the following line:
```perl
for $key ( map  { $_->[0] }                                      # list1
           sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] }   # list2
           map  { [ $_, (split)[0], (split)[2] ] }               # list3 -- same split done twice
           keys %hash                                            # list4
)
```
You are also duplicating effort by performing the same split twice for every record.
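As an aside, the duplicated split is easy to eliminate even while staying with the ST: a single split plus a list slice builds the same anonymous array. A minimal sketch, assuming the same `%hash` and whitespace-separated fields as above:

```perl
for my $key ( map  { $_->[0] }
              sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] }
              map  { [ $_, (split)[0, 2] ] }   # one split; slice out fields 0 and 2
              keys %hash
) {
    # ... loop body as before ...
}
```

That halves the splitting work, but it is still an ST, so the per-record anonymous arrays (and their memory cost) remain.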
If you moved your code to a GRT (see "A brief tutorial on Perl's native sorting facilities"), it would probably use about 1/4 of the memory and run in about 1/10th of the time.
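For illustration, a minimal GRT sketch, assuming whitespace-separated records whose two sort fields are plain ASCII strings of at most 10 characters (the `A10` pack widths are illustrative assumptions, not taken from your data):

```perl
my @sorted = map  { substr $_, 20 }                 # strip the 20-byte packed key
             sort                                   # plain lexical sort: no callback, no arrays
             map  { pack( 'A10 A10', (split)[0, 2] ) . $_ }   # one split; fixed-width key prefix
             keys %hash;

for my $key ( @sorted ) {
    # ... loop body as before ...
}
```

The speed comes from `sort` running with no Perl callback at all, and the memory saving from never building the per-record anonymous arrays. For typical ASCII data, the fixed-width, space-padded keys sort in the same order as `cmp` on the raw fields, provided no field exceeds the packed width.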
Replies are listed 'Best First'.
Re^2: perl ST sort performance issue for large file?
  by AnomalousMonk (Archbishop) on Mar 30, 2012 at 12:06 UTC
  by BrowserUk (Patriarch) on Mar 30, 2012 at 12:28 UTC
  by AnomalousMonk (Archbishop) on Mar 30, 2012 at 13:10 UTC
  by BrowserUk (Patriarch) on Mar 30, 2012 at 13:57 UTC
  by AnomalousMonk (Archbishop) on Mar 30, 2012 at 14:15 UTC