in reply to Speed and memory issue with large files

Tie::File does not work well on huge files. The following finds and prints the 20 millionth line of a 40-million-line, 3 GB file in 12 seconds:

c:\test>wc -l syssort
40000000 syssort

c:\test>dir syssort
19/12/2009  13:47     3,160,000,000 syssort

c:\test>perl -le"$t=time;scalar<>for 1..20e6;print scalar<>;print time()-$t" syssort
49_992_005_J1 chr9 97768833 97768867 ATTTTCTTCAATTACATTTCCAATGCTATCCCAAA 35
12
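For anyone who prefers it spelled out, here is the same approach as a short script rather than a one-liner (a rough sketch; the file name and the 20-million count are just the values from the run above):

    use strict;
    use warnings;

    my $file = shift || 'syssort';    # the big file from the run above
    open my $fh, '<', $file or die "Cannot open '$file': $!";

    my $start = time;
    for (1 .. 20_000_000) {           # read and discard the first 20 million lines
        my $discard = <$fh>;
    }
    print scalar <$fh>;               # the next line is the one we want
    print "took ", time() - $start, " seconds\n";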

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"I'd rather go naked than blow up my ass"

Re^2: Speed and memory issue with large files
by ikegami (Patriarch) on Mar 19, 2010 at 17:21 UTC

    Tie::File does not work well on huge files.

    Indeed. It memorizes the byte position of the start of every line it has encountered in order to jump to a specific line quickly. This adds up, and that functionality isn't needed here (since there's no need to jump back).

    Contrary to what the documentation implies, this memory usage cannot be limited.
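    To make that concrete, here is a rough sketch of what a Tie::File version would look like (the file name and line number are placeholders, not taken from the thread), with the memory cost noted where it arises:

        use strict;
        use warnings;
        use Tie::File;

        my $file   = 'syssort';        # placeholder: some huge line-oriented file
        my $target = 20_000_000;       # placeholder: 1-based line number wanted

        # Tie::File presents the file as an array, but to satisfy a read of
        # $lines[$target - 1] it must scan forward and remember the byte offset
        # of every line it passes. For tens of millions of lines that offset
        # index alone is large, and (per the point above) it cannot be capped.
        tie my @lines, 'Tie::File', $file or die "Cannot tie '$file': $!";
        print $lines[$target - 1], "\n";   # elements come back without the newline
        untie @lines;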

      You're at it again. Not only have you changed the content of this node without attribution, you've also changed the entire tone and meaning of it. You really are underhand.


Re^2: Speed and memory issue with large files
by firmament (Novice) on Mar 19, 2010 at 16:53 UTC
    Thanks a bunch!