Because the command-line solution has the side effect of translating line endings, I went with the scripted version from another reply. I did eventually come back to this command-line version to see how it behaves.
Good news: your $_ pre-allocation trick works, mostly.
Bad news: I had to guess at the right amount. Too low, and it still crawls at some point when it reads the huge record; too high, and it crawls at the start while pre-allocating $_. It ran without crawling only for values between 270*1024*1024 and 300*1024*1024 bytes (roughly 270 MB to 300 MB), which is a pretty narrow window for a general solution. The binmode script (in this thread) is the best general solution.
Still, if line-ending translation is OK and I'm NOT running into single records/lines that are hundreds of MB in size, this is a reasonably fast and easy way to sort the records. On the 900 MB source file, the command-line version took 2m15s, compared with 1m45s for the binmode script.
Thanks very much for your help and insight.