file processing, while and foreach: a n00b experience

by Ryszard (Priest)
on Aug 27, 2003 at 08:04 UTC (#286970=perlmeditation)

During the course of my day I had to parse and process a 2m+ line text file. No worries, I thought:
open(FH, 'someLargeFile.txt') || die "Unable to open file: $!";
foreach (<FH>) {
    # do something
}

I thought this would loop through each line of the file and #do something. I was wrong, and of course I felt like a bit of a n00b.

What I saw happening on my little sparc was the entire file being loaded into memory before any of it was processed. Because it was so large, it didn't fit, and the program abended.

So being an "experienced n00b" I tried a couple of things and found out that if I s/foreach/while/ it iterates over everything without loading it all into memory first.
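A minimal sketch of the s/foreach/while/ fix described above. The helper name count_lines_streaming and the small temporary file are assumptions made up for illustration (they stand in for the real processing and the 2m+ line file); the lexical filehandle and three-argument open are a stylistic choice, not what the original code used.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: walk a file one line at a time and count the lines.
sub count_lines_streaming {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Unable to open $path: $!";
    my $count = 0;
    # The read happens in scalar context, so only one line is held at a time.
    while (my $line = <$fh>) {
        chomp $line;
        $count++;    # stand-in for "do something"
    }
    close($fh);
    return $count;
}

# A small temporary file standing in for the large input.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "line $_\n" for 1 .. 5;
close($tmp_fh);

print count_lines_streaming($tmp_path), "\n";    # prints 5
```

Memory use here should stay roughly constant regardless of file size, because each line is discarded before the next one is read.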

When I think about it, it kinda makes sense: how can foreach know when to stop when it doesn't know where the end is? while, on the other hand, just keeps going until it finds an EOF marker.

Moral of the story: if something is not working as you expected, it's probably not designed to, _or_ you can think you've got experience in things, and you probably have, but the basics can still jump up and bite.

n00bily yours...

Update: as pointed out by gmax, jargon can sometimes be difficult. In this case n00b == newbie == someone new to doing something.

Replies are listed 'Best First'.
•Re: file processing, while and foreach: a n00b experience
by merlyn (Sage) on Aug 27, 2003 at 13:06 UTC
    This is actually such a common confusion that we address it specifically in Learning Perl.

    foreach always takes a list. A filehandle-read in a list context always reads the entire file into memory.

    while wants a boolean, which is a special kind of scalar. The solitary filehandle-read operation is macro-expanded to reading the filehandle in a scalar context (one chunk at a time) into $_, and verifying that we're not yet at end of file.

    So, while they may look similar, and in some ways act similar, they accomplish their task through completely different mechanisms.

    Another thing to notice is the value of $. during the loop. In the foreach case, we've read the entire file already so the value is the number of lines of the file. In the while case, it'll be the current line number.
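The difference in $. can be seen with a small experiment. This is only a sketch: the three-line in-memory "file" and the variable names are assumptions chosen to keep the example self-contained.

```perl
use strict;
use warnings;

my $data = "a\nb\nc\n";    # stand-in for a real file

# foreach: the whole file is read before the first iteration runs.
open(my $fh_list, '<', \$data) or die "Unable to open in-memory file: $!";
my @seen_foreach;
foreach my $line (<$fh_list>) {
    push @seen_foreach, $.;
}
close($fh_list);

# while: one line is read per iteration.
open(my $fh_scalar, '<', \$data) or die "Unable to open in-memory file: $!";
my @seen_while;
while (my $line = <$fh_scalar>) {
    push @seen_while, $.;
}
close($fh_scalar);

print "foreach: @seen_foreach\n";    # foreach: 3 3 3
print "while:   @seen_while\n";      # while:   1 2 3
```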

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

Re: file processing, while and foreach: a n00b experience
by Juerd (Abbot) on Aug 27, 2003 at 09:08 UTC

    It helps to know that

    while (<FH>) {
    is equivalent to
    while (defined($_ = readline *FH)) {
    and that foreach (LIST) flattens the list it gets. (Well, there are some special cases like foreach (ARRAY) and foreach (RANGE) that are more efficient.)

    See the documentation on readline to find out how it behaves differently when used in list context.
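As a rough illustration of the two behaviours of readline, the sketch below reads from an in-memory string (an assumption made so the example needs no real file): a scalar-context call returns one line, a list-context call returns everything that is left, and a further call at end of file returns undef.

```perl
use strict;
use warnings;

my $data = "one\ntwo\nthree\n";    # stand-in for a real file
open(my $fh, '<', \$data) or die "Unable to open in-memory file: $!";

my $first = readline($fh);    # scalar context: a single line
my @rest  = readline($fh);    # list context: all remaining lines at once
my $after = readline($fh);    # at end of file: undef
close($fh);

print "first: $first";                                              # first: one
print "remaining lines: ", scalar(@rest), "\n";                     # remaining lines: 2
print "after EOF: ", (defined $after ? "defined" : "undef"), "\n";  # after EOF: undef
```

The undef at end of file is what the defined() test in the expanded while condition is checking for.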

    In Perl 6, this will not be much of a problem, since it will have "lazy lists". A normal for (LIST) will do what you mean, in an efficient way.

    Juerd # { site => '', plp_site => '', do_not_use => 'spamtrap' }

      In fact I suspect that, true to the Perl optimization rule of not involving the dispatcher if at all possible, using for in Perl 6 to implicitly alias to the topic and letting the lazy behaviour implicitly determine EOFness for you will be more efficient than using while, explicitly assigning to a variable, and explicitly testing for definedness. </handwaving>

      Makeshifts last the longest.

      Hopefully, that will be a lazy list automatically in Perl 6 without having to do anything special or even worrying about it fitting in memory (or the file being finite/completed).
Re: file processing, while and foreach: a n00b experience
by broquaint (Abbot) on Aug 27, 2003 at 09:09 UTC
    They're two different things really - foreach takes a list and iterates over it, whereas while keeps looping until its condition expression evaluates to false. In the first instance, where you're using foreach, <FH> is evaluated in list context (foreach evaluates its arguments in list context), so it reads until it reaches eof and then returns all the lines that were read as a list. Whereas when you use a while, perl will auto-magically wrap your filehandle read with a check for definedness, e.g.
    perl -MO=Deparse -e 'while(<>) { }'
    while (defined($_ = <ARGV>)) {
        ();
    }
    -e syntax OK
    It then continues looping until the filehandle reaches eof, so it only ever reads in one line at a time. See perlsyn for more info on looping.


Re: file processing, while and foreach: a n00b experience
by wirrwarr (Monk) on Aug 27, 2003 at 12:20 UTC
    It should be emphasized that the argument to foreach is evaluated in list context, while the argument to while is evaluated in scalar context. This means that foreach (<FH>) is (roughly) equivalent to
    my @x = <FH>;
    foreach (@x) { ... }
    And this code does slurp the whole file into memory first.

