esddew has asked for the wisdom of the Perl Monks concerning the following question:

I have an irritating issue, complicated by the fact that I don't have direct access to the box my Perl script will run on, and I was wondering if anyone had a suggestion for a workaround. A while ago I got a request to create a script that reads a flat file exported from a database and determines, based on its contents, whether a file that ought to exist actually does. This I did, and it worked just fine on my test data.

The problem comes in running the darn thing in production. It turns out that the flat file on the server is 8 GB, and the open command dies. This is on a Windows server that I don't have access to (someone else is running the script), and I don't have a database ID to work with, hence the flat file. I'm going to try sysopen and see if that works better, but while I'm waiting to hear back from the woman who actually has access to the box: does anyone have experience creating a filehandle and reading from very large files on a Windows NT server? I'm more of a Unix person myself...

Replies are listed 'Best First'.
Re: reading from a huge file
by BrowserUk (Patriarch) on Mar 21, 2011 at 18:59 UTC
    does anyone have experience creating a filehandle and reading from very large files on a Windows NT server?

    Yes. I've been manipulating huge (>4GB) files with Perl since 5.6.1 and never had a problem opening or reading them.

    So, whatever your problem is, it isn't fundamentally a problem with either Perl or Windows, which only leaves your code.

    If you posted (the relevant parts of) your code, we might be able to spot something.

    Better still, you say "the open dies"; how about posting the error message produced when it dies?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Actually, unless your Perl is built with large file support (USE_LARGE_FILES in perl -V), you may not be able to deal with files larger than 4 GB. I think the OP's basically stuffed unless he can get another program to chop the file into smaller pieces, or install a version of Perl with large file support.
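
      One quick way to check, assuming perl is on the PATH, is to ask the build configuration directly:

      C:\>perl -V:uselargefiles
      uselargefiles='define';

      If it prints 'define', large file support was compiled in; 'undef' means it wasn't.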
        I think the OP's basically stuffed ...

        This is FUD. Didn't you notice I said (>4GB)?

        Every version of Perl I've used on Windows in the last 9+ years has been built with USE_LARGE_FILES.

        Here is perl processing a 12 GB, 141-million-line file with ActiveState Perl 5.10.1 in 88 seconds:

        C:\test>dir dna.txt
         Volume in drive C has no label.
         Volume Serial Number is 8C78-4B42

         Directory of C:\test

        08/11/2010  16:35    12,831,000,000 dna.txt
                       1 File(s)  12,831,000,000 bytes
                       0 Dir(s)  281,938,862,080 bytes free

        C:\test>perl -nE"}{say $." dna.txt
        141000000
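
        (The -n switch wraps the code in an implicit while (<>) { ... } loop, so the unbalanced }{ closes that loop and say $. runs once at EOF, printing the final line number.)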

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        Thanks, that's helpful. Perl wasn't even on this server when this all started, so I've no idea what the systems guys put on it. As I said in the original message, I don't actually have access to the box myself, which complicates matters.

        For what it's worth, the error that the person running the script reported to me in email was: Unable to open input file UDB_sessions01252011

        That leads me to believe that the code is failing at this point:

        ## Open input file
        open(INFILE, "$infile") || die "unable to open input file $infile\n";
        The code as written is pretty darn elementary. I'm assuming that my user got the name of the file she was trying to use right, though I could double-check that.
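
        A safer version would at least report the underlying OS error. A minimal sketch (the variable name matches my code; the rest is illustrative):

        ## Three-arg open with a lexical filehandle; $! carries the OS
        ## error ("No such file or directory", "Invalid argument", etc.)
        open(my $infh, '<', $infile)
            or die "unable to open input file $infile: $!\n";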
Re: reading from a huge file
by ikegami (Patriarch) on Mar 21, 2011 at 19:06 UTC

    The output of perl -V (uppercase "V") might also be useful in determining the problem.

Re: reading from a huge file
by chilledham (Friar) on Mar 21, 2011 at 20:32 UTC

    You can always use Tie::File and tie to the file. (See especially the 'mode' option for finer control.)

    Something along the lines of (taken pretty much straight from the POD):

    use Tie::File;
    use Fcntl 'O_RDWR';

    # Assumes you pass the file name on the command line
    tie my @contents, 'Tie::File', $ARGV[0], mode => O_RDWR;

    You can now cycle through @contents like a typical array, each element being a line of your file. Be careful: the mode you pass determines what sort of changes (if any) you can make to the file, whether intentional or otherwise.

    Both Tie::File and Fcntl are core modules, so no CPAN necessary.

    While Fcntl isn't entirely necessary, it will help you maintain finer control over how the file is accessed on disk. Again, see the 'mode' section of the documentation.
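
    For instance, since you only need to read the file, a read-only tie guards against accidental changes. A sketch (untested at anything like 8GB):

    use Tie::File;
    use Fcntl 'O_RDONLY';

    # Read-only tie: accidental modifications of @lines will fail
    # rather than silently rewriting the production file.
    tie my @lines, 'Tie::File', $ARGV[0], mode => O_RDONLY
        or die "cannot tie $ARGV[0]: $!\n";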

    Notes: strict and warnings implied. I haven't tried this with a file of the size you are dealing with, but I have had success with other large files (>4GB) in the past.

      Please do not suggest Tie::File for files bigger than a few tens of megabytes.

      Using it makes processing such files tens or hundreds of times slower than normal line-by-line access, for no benefit.
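
      A plain sequential read keeps memory use constant no matter how big the file is. A minimal sketch (the file name is a placeholder; the processing is up to you):

      open(my $fh, '<', $file)
          or die "unable to open $file: $!\n";
      while (my $line = <$fh>) {      # reads one line at a time
          chomp $line;
          # ... process one record here ...
      }
      close $fh;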


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Very good to know, BrowserUK. Thanks for the clarification!