
Large file problem

by JohnBrook (Acolyte)
on Dec 01, 2004 at 17:18 UTC ( [id://411505] )

JohnBrook has asked for the wisdom of the Perl Monks concerning the following question:

Good afternoon (in my time zone), fellow seekers. I am a complete novitiate here; this is my first post, so although I have read the guidelines, forgive me if in my ignorance I transgress in any way. I do have a few years' self-taught experience in Perl.

I am having a problem working with a large file with Perl 5.6.1 (build MSWin32-x86-multi-thread) under Windows XP. In addition to Googling for an answer, I have also read How can I process large files efficiently? in the Questions and Answers, but it was not sufficient to solve my problem. I am already processing the file line by line (I think!).

There was a second suggestion there to use Tie::File. I tried this, but it doesn't seem to be installed. Of course I can install it, but I'm wondering if there is a more obvious problem with what I'm doing. I'm not sure why I would need Tie::File if there is another way to process the file line by line beyond what I'm already doing.

Here is my code (stripped down to essentials, as the guidelines suggest, but which I had already done anyway):

use strict;
use warnings;

open IN, "test.txt" or die "Could not open 'test.txt'\n";

for (<IN>) {
    # do nothing
}

close IN;
The output is simply "Out of memory!" after the hard drive runs for about 2 minutes. The file is about 42 MB. What in this program could be gobbling up memory? Is this not the standard way to process a file line by line?

Lastly, it occurred to me to check whether the newlines in the file were something other than standard DOS newlines (CR/LF), but they are. So that ain't it.

Replies are listed 'Best First'.
Re: Large file problem
by sweetblood (Prior) on Dec 01, 2004 at 17:26 UTC
    Just change your 'for' to 'while'
    while (<IN>) {
        # do nothing
    }
    HTH

    Sweetblood

      Just change your 'for' to 'while'

      ...because for would slurp your file into memory,
      whereas while only reads it line by line.

      It's mentioned in perldoc perlintro
      Update: bummer, it isn't mentioned there. And so certain was I...

      Cheers, Sören

      DOH! Sorry. Works fine. Forget I asked! :-)

      (One of those things it takes someone else to see...)

Re: Large file problem
by radiantmatrix (Parson) on Dec 01, 2004 at 17:35 UTC

    Your problem lies in your loop. The angle operator (i.e. <IN>) works in two contexts: list and scalar. In list context, it returns all the lines of the file as a list; in scalar context, it reads and returns one line at a time.

    So:

    for (<IN>) {
        # Do some stuff;
    }

    Reads all the lines from IN into a list, then processes them one at a time. You want:

    while (<IN>) {
        # Do some stuff;
    }

    In the above, the angle operator is called in scalar context, so only one file line will be read during each loop iteration. That should (I hope) solve your memory and timing problems.
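
    To make the context difference concrete, here is a small sketch; the two snippets are alternatives rather than code to run back to back, IN is the bareword filehandle from the question, and the variable names are only illustrative:

    # List context: the angle operator returns every remaining line at
    # once, so the whole file ends up in memory.
    my @all_lines = <IN>;

    # Scalar context: one line per call. while() imposes scalar context,
    # so memory use stays roughly constant regardless of file size.
    while (my $line = <IN>) {
        # work with $line
    }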

    Update: I see that I type too slowly. ;-)

    radiantmatrix
    require General::Disclaimer;
    s//2fde04abe76c036c9074586c1/; while(m/(.)/g){print substr(' ,JPacehklnorstu',hex($1),1)}

      Thanks for the further clarification. Although I know "while" is what I meant (when's Perl going to have DWIM?), it's good to also understand how Perl was responding to exactly what I was telling it to do. All is made clear now. Thank you!
        when's Perl going to have DWIM?

        Lists will be evaluated lazily in Perl 6, which means Perl 6 will DWYM in this circumstance (I think!)
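
Pulling the advice in this thread together, a minimal sketch of the corrected program might look like the following (the filename comes from the question; the lexical filehandle, three-argument open, and $! in the die message are stylistic habits, not part of the fix):

use strict;
use warnings;

# Read test.txt one line at a time. while() evaluates <$in> in scalar
# context, so only the current line is held in memory.
open my $in, '<', 'test.txt' or die "Could not open 'test.txt': $!\n";
while (my $line = <$in>) {
    chomp $line;
    # process $line here
}
close $in;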

Re: Large file problem
by melora (Scribe) on Dec 01, 2004 at 20:11 UTC
    I just installed and tried Tie::File. I've been doing "while (<IN>)" myself for lo these many years, and I'm always reluctant to try something new.
    I now repent me of my idiotic foolishness. Tie-ing a file to an array is Very Cool and quite useful.
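
    For reference, a minimal sketch of the Tie::File idiom described above (the filename comes from the question; the line count and the edit are only illustrative), bearing in mind the speed caveat in the reply below:

    use strict;
    use warnings;
    use Tie::File;

    # Tie the file to an array: reading or writing an element reads or
    # rewrites the corresponding line on disk, without slurping the file.
    tie my @lines, 'Tie::File', 'test.txt'
        or die "Could not tie 'test.txt': $!\n";

    print "The file has ", scalar(@lines), " lines\n";
    $lines[0] = 'a new first line';   # rewrites line 1 in place

    untie @lines;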

      Tie-ing a file to an array is Very Cool and quite useful.

      Forgive me for noting that it is also very slow;
      it pays to keep that in mind.

      Having said that, I confess: oh yes, Tie::File is so cool that
      I have used it even knowing that I shouldn't have. =/

      Cheers, Sören

Re: Large file problem
by Anonymous Monk on Dec 03, 2004 at 02:57 UTC
    dude.. try:
    open(IN, "<test.txt") || die;
    while (<IN>) {
        chomp;  # the current line will be in $_ without a trailing newline
    }
