esddew has asked for the wisdom of the Perl Monks concerning the following question:

I have an irritating issue, complicated by the fact that I don't have direct access to the box my Perl script will run on, and I was wondering if anyone had a suggestion for a workaround. A while ago I got a request to create a script that reads a flat file exported from a database and determines, based on its contents, whether a file that ought to exist actually does. This I did, and it worked just fine on my test data.

The problem comes in running the darn thing in production. It turns out that the flat file on the server is 8 GB, and the open command dies. This is on a Windows server that I don't have access to (someone else is running the script), and I don't have a database ID to work with, hence the flat file. I'm going to try sysopen and see if that works better, but while I'm waiting to hear back from the woman who actually has access to the box: does anyone have experience creating a filehandle and reading from very large files on a Windows NT server? I'm more of a Unix person myself...

Replies are listed 'Best First'.
Re: reading from a huge file
by BrowserUk (Patriarch) on Mar 21, 2011 at 18:59 UTC
    does anyone have experience creating a filehandle and reading from very large files on a Windows NT server?

    Yes. I've been manipulating huge (>4GB) files with Perl since 5.6.1 and never had a problem opening or reading them.

    So, whatever your problem is, it isn't fundamentally a problem with either Perl or Windows, which only leaves your code.

    If you posted (the relevant parts of) your code, we might be able to spot something.

    Better still, you say "the open dies"; how about posting the error message produced when it dies?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Actually, unless your Perl is built with large file support (USE_LARGE_FILES in perl -V), you may not be able to deal with files larger than 4 GB. I think the OP's basically stuffed unless he can get another program to chop the file into smaller pieces, or install a version of Perl with large file support.
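
      One quick way to check, assuming perl is on the PATH, is to ask the build configuration directly:

      C:\>perl -V:uselargefiles
      uselargefiles='define';

      If it prints 'define', large file support was compiled in; 'undef' means it wasn't.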
        I think the OP's basically stuffed ...

        This is FUD. Didn't you notice I said (>4GB)?

        Every version of Perl I've used on Windows in the last 9+ years has been built with USE_LARGE_FILES.

        Here is perl processing a 12 GB, 141-million-line file with ActiveState Perl 5.10.1 in 88 seconds:

        C:\test>dir dna.txt
         Volume in drive C has no label.
         Volume Serial Number is 8C78-4B42

         Directory of C:\test

        08/11/2010  16:35    12,831,000,000 dna.txt
                       1 File(s)  12,831,000,000 bytes
                       0 Dir(s)  281,938,862,080 bytes free

        C:\test>perl -nE"}{say $." dna.txt
        141000000
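
        (The -n switch wraps the code in an implicit while (<>) { ... } loop, so the unbalanced }{ closes that loop and say $. runs once at EOF, printing the final line number.)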

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        Thanks, that's helpful. Perl wasn't even on this server when this all started, so I've no idea what the systems guys put on it. As I said in the original message, I don't actually have access to the box myself, which complicates matters.

        For what it's worth, the error that the person running the script reported to me in email was: Unable to open input file UDB_sessions01252011

        That leads me to believe that the code is failing at this point:

        ## Open input file
        open(INFILE, "$infile") || die "unable to open input file $infile\n";
        The code as written is pretty darn elementary. I'm assuming that my user got the name of the file she was trying to use right, though I could double-check that.
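
        A safer version would at least report the underlying OS error. A minimal sketch (the variable name matches my code; the rest is illustrative):

        ## Three-arg open with a lexical filehandle; $! carries the OS
        ## error ("No such file or directory", "Invalid argument", etc.)
        open(my $infh, '<', $infile)
            or die "unable to open input file $infile: $!\n";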
Re: reading from a huge file
by ikegami (Patriarch) on Mar 21, 2011 at 19:06 UTC

    The output of perl -V (uppercase "V") might also be useful in determining the problem.

Re: reading from a huge file
by chilledham (Friar) on Mar 21, 2011 at 20:32 UTC

    You can always use Tie::File and tie to the file. (See especially the 'mode' option for finer control.)

    Something along the lines of (taken pretty much straight from the POD):

    use Tie::File;
    use Fcntl 'O_RDWR';

    # Assumes you pass the file name on the command line
    tie my @contents, 'Tie::File', $ARGV[0], mode => O_RDWR;

    You can now cycle through @contents like a typical array, each element being a line of your file. Be careful: the mode you pass determines what sort of changes (if any) you can make to the file, whether intentional or otherwise.

    Both Tie::File and Fcntl are core modules, so no CPAN necessary.

    While Fcntl isn't entirely necessary, it will help you maintain finer control over how the file is accessed on disk. Again, see the 'mode' section of the documentation.
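
    For instance, since you only need to read the file, a read-only tie guards against accidental changes. A sketch (untested at anything like 8GB):

    use Tie::File;
    use Fcntl 'O_RDONLY';

    # Read-only tie: accidental modifications of @lines will fail
    # rather than silently rewriting the production file.
    tie my @lines, 'Tie::File', $ARGV[0], mode => O_RDONLY
        or die "cannot tie $ARGV[0]: $!\n";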

    Notes: strict and warnings implied. I haven't tried this with a file of the size you are dealing with, but I have had success with other large files (>4GB) in the past.

      Please do not suggest Tie::File for files bigger than a few tens of megabytes.

      Using it makes processing such files tens or hundreds of times slower than normal line-by-line access, for no benefit.
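
      A plain sequential read keeps memory use constant no matter how big the file is. A minimal sketch (the file name is a placeholder; the processing is up to you):

      open(my $fh, '<', $file)
          or die "unable to open $file: $!\n";
      while (my $line = <$fh>) {      # reads one line at a time
          chomp $line;
          # ... process one record here ...
      }
      close $fh;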


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Very good to know, BrowserUK. Thanks for the clarification!