Viking has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a log parser that will be parsing log files that are many megabytes in size. When I open a file, is the file read into memory or is it read from disk?
eg:
open FILE, "somefile" or die $!;
while (<FILE>) {
    # do stuff
}
close FILE;

Replies are listed 'Best First'.
Re (tilly) 1: open, file handles and memory
by tilly (Archbishop) on Jan 13, 2001 at 20:03 UTC
    On every operating system, that pattern will work unless you are accumulating memory in your loop. Just be careful: operations that impose a list context on the filehandle will slurp the whole thing into an array. So if memory is an issue, avoid the following kinds of things:
    foreach my $line (<FILE>) {
        # etc
    }
    my @lines = sort <FILE>;
    print <FILE>;
    Also, it is a picky detail, but instead of just dying I find it very helpful to have the die message contain full context information, as perlstyle recommends.
    open(FILE, "< $file") or die "Cannot read $file: $!";
    (Or use Carp and confess() rather than die.)
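    For illustration, here is a minimal sketch (not from the original post) of the same check written with Carp's confess(), which dies with a full stack trace; the filename is hypothetical:

    use Carp;

    my $file = "somefile.log";    # hypothetical log file name
    open(FILE, "< $file") or confess "Cannot read $file: $!";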

    A probably useless tip. Occasionally you run across a situation where you want to process large files (eg 40 GB each) and Perl does not have large file support compiled in. In that case do your reads like this:

    open(FILE, "cat $file |") or die "Cannot read $file: $!";
    As long as cat understands large files this works smoothly, since Perl has no problem reading from an endless pipe.
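    A minimal sketch (not from the original post) of combining that pipe open with the usual line-by-line loop; the filename is hypothetical:

    my $file = "huge.log";    # hypothetical very large log file
    open(FILE, "cat $file |") or die "Cannot read $file: $!";
    while (<FILE>) {
        # do stuff with $_, one line at a time, exactly as before
    }
    close(FILE);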
Re: open, file handles and memory
by mwp (Hermit) on Jan 13, 2001 at 17:42 UTC
    As far as I know, that really depends on the operating system and how it handles open file handles. However, most of the time it will buffer the file--read in one line at a time. I wouldn't worry about it.

    Just don't do something like this:

    open(FILE, "somefile") or die $!;
    my @log = <FILE>; # read entire file into RAM
    close(FILE);
    foreach my $line (@log) {
        # do stuff to $line
    }
    That will definitely clutter up your machine's memory!

    On the other hand, if you have a gigabyte of RAM and you WANT to load the entire file instead of using slow disk accesses, knock yourself out. =)

      That's what I want to avoid. Speed isn't an issue (well, within reason), but having my server grind to a halt because it ran out of memory would be!! :)
Re: open, file handles and memory
by repson (Chaplain) on Jan 13, 2001 at 18:04 UTC
    while (<FILE>) is fine; just make sure it's not for (<FILE>), which _would_ load the whole file into memory but otherwise seems mostly identical to while.

    It's unlikely to be significant for parsing logfiles, but sometimes it's not optimal to go line by line, since that increases the number of I/O operations, which are generally slow. On the other hand, slurping the WHOLE file into memory requires lots of RAM. You can strike a compromise by doing reads of maybe half a megabyte (I don't know what's optimal, but that seems sensible enough to me :). That way you process the file in larger chunks with less frequent disk access, so there is a good chance of a faster runtime. However, it will make the code marginally more complicated to write; a sketch of the idea follows below.
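    A minimal sketch of that chunked approach (not from the original post; the filename and buffer size are arbitrary assumptions), reading roughly half a megabyte at a time and peeling complete lines out of the buffer:

    my $file       = "somefile.log";    # hypothetical log file
    my $chunk_size = 512 * 1024;        # roughly half a megabyte per read

    open(FILE, "< $file") or die "Cannot read $file: $!";
    my $buffer = '';
    while (read(FILE, my $chunk, $chunk_size)) {
        $buffer .= $chunk;
        # peel off complete lines; keep any partial line for the next read
        while ($buffer =~ s/^(.*\n)//) {
            my $line = $1;
            # do stuff with $line
        }
    }
    # do stuff with whatever is left in $buffer here,
    # in case the file did not end in a newline
    close(FILE);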

      When you instruct the Perl interpreter to read the next line of the file, it can't know in advance how long the line will be, so it's logical that Perl itself reads larger chunks at a time and buffers the input, just like the OS does (YMMV).
        Performing an 'strace' on this code:
        open(F, "<README"); # arbitrary
        $|=1;
        while(<F>) {
            print "line $.\n";
        }
        Results in this:
        open("README", O_RDONLY|O_LARGEFILE)    = 3
        read(3, "\t\t GNU GENERAL PUBLIC LICENSE"..., 4096) = 4096  # first block
        write(1, "line 1\n", 7line 1
        write(1, "line 2\n", 7line 2
        ...
        write(1, "line 80\n", 8line 80
        read(3, " and appropriately publish on ea"..., 4096) = 4096 # second block
        write(1, "line 81\n", 8line 81
        So it does appear to buffer the data in chunks, but they seem to be manageably sized. This too may differ depending upon OS or build of Perl.