
FWIW, I suspected the control loop operator was probably the culprit, but I only changed the SECOND for loop to use while. (In my code, there's another loop. The example I gave was seriously trimmed back to illustrate the point.) It was a fluke that I happened not to bother changing the first control loop, and only the second.

Why I did it is simple: I'm reading a file that contains a formatted data set in ascii. If one of the records I'm reading doesn't start with the correct pattern, I skip ahead (in the second loop), looking for the next record.

I feel I'm pretty familiar with the ORA perl book, and nowhere did I ever see anything on for() reading the entire file into memory. Granted, I'm not saying it's not there. I just haven't seen it, and I checked the index everywhere for references to file handles, reading, and so on. It wouldn't occur to me to check for for.

Scaling up my soap box...
IMHO, whether for reads a whole file or not seems to me to be an artifact of implementation that should not affect the logic of the program. If perl feels it can be more efficient this way, then fine--it can do so--but it's a bug to have the kind of effect it has on programs such as the one I illustrated. Indeed, Duff has even pointed out that Perl 6 will "do the right thing", so I probably needn't even point out that the current behavior should be considered a bug. But my minor flame on the subject is what keeps me from getting into worse trouble elsewhere.

Re^3: nested <FILE> read returns undefined?
by rjray (Chaplain) on Mar 31, 2006 at 07:03 UTC

    It's not a matter of for reading the whole file or not. The for expression has a list-context within the parens, and in a list context <> reads the entire file.
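    A minimal sketch of the difference, assuming a hypothetical three-line input read from an in-memory filehandle (the names $data, @seen_for and @seen_while are made up for illustration). It only shows what a nested read sees inside each kind of loop:

```perl
use strict;
use warnings;

# Hypothetical three-line input, read from an in-memory filehandle.
my $data = "one\ntwo\nthree\n";

# for: the parenthesised expression is in list context, so <$fh> returns
# every remaining line at once and leaves the handle at end-of-file.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @seen_for;
for my $line (<$fh>) {
    my $next = <$fh>;    # nothing left to read here
    push @seen_for, defined $next ? 'defined' : 'undef';
}
close $fh;

# while: assigning to a scalar gives scalar context, so each <$fh> reads
# exactly one line and the nested read gets the line that follows.
open $fh, '<', \$data or die "cannot open in-memory file: $!";
my @seen_while;
while (my $line = <$fh>) {
    my $next = <$fh>;
    push @seen_while, defined $next ? 'defined' : 'undef';
}
close $fh;

print "for:   @seen_for\n";
print "while: @seen_while\n";
```

    In the for version every nested read comes back undefined, which matches the symptom in the thread title.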

    --rjray

Re^3: nested <FILE> read returns undefined?
by tirwhan (Abbot) on Mar 31, 2006 at 08:14 UTC
    Why I did it...I skip ahead (in the second loop), looking for the next record.

    I may be missing something (you haven't said what the second loop actually does, so I can't be sure), but couldn't this be better handled with next ?

    Regarding the soap box, as rjray pointed out, the important part here is the list context. I don't have the camel on me at the moment, but the behaviour of filehandles in list context is documented in perldoc perlintro as well as perldoc perltrap.


    All dogma is stupid.
      You said:

      I may be missing something (you haven't said what the second loop actually does, so I can't be sure), but couldn't this be better handled with next ?

      Imagine a text file that has this format:

      image: path/to/image.jpg
      attribute1: value
      attribute2: value
      [...]
      attributeN: value
      ----------
      image: path/to/next/image.jpg
      [...]

      Here, each "record" ends with a line of dashes. My script is reading the file, and when it reads an "image" line, it sees if that image is one that it knows about, and if not, it skips the rest of the record. So, the code looks like:

      while (my $line = <FILE>) {
          $line =~ /(.*?): (.*?)$/m;
          ($key, $value) = (lc $1, $2);
          next if !$key || !$value;
          if ($key eq "image") {
              if (!exists $db->{$value}) {
                  # not in the database, so skip this image
                  while ($_ = <FILE>) {
                      last if /^--/;
                  }
                  next;
              }

      The original code used "for" instead of "while" in each spot above, which is why it wasn't working... and why I posted my original message.... I'm just answering why the code is the way it is because you asked. Sure, I could have done it in a way that allowed me to continue using "for", but changing it to use while() was a better option.

        Thanks for taking the time to explain this. First, there's a bug in your code, because $1 and $2 will not be reset when the while loop starts over, but retain their old values, despite the fact that the match did not succeed (see the warning about this in perldoc perlre). So the line

        next if !$key || !$value;

        will not skip lines that don't contain a key-value pair.
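        One way to avoid relying on $1 and $2 at all is to take the captures from the match's return value. This is only a sketch: the sample lines and the @pairs array are hypothetical, not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the second one has no "key: value" shape.
my @lines = ("image: path/to/image.jpg\n", "----------\n", "Size: 10\n");

my @pairs;
for my $line (@lines) {
    # In list context a failed match returns the empty list, so the
    # condition is false and $key/$value are never set from stale captures.
    if (my ($key, $value) = $line =~ /^([^:]*): (.*)$/) {
        push @pairs, lc($key) . '=' . $value;
    }
}

print "$_\n" for @pairs;
```

        The line of dashes simply fails the match and is skipped, with no separate test on $key or $value.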

        Here's how I would write this, without using the inner while loop:

        my $curimage;
        while (my $line = <DATA>) {
            $line =~ /^([^:]*): (.*?)$/m or next;
            my ($key, $value) = (lc $1, $2);
            $curimage = $value if ($key eq "image");
            next if(!exists $db->{$curimage});
            # ..process key-value pairs
        }

        This has the obvious disadvantage that you need to look up $db more often, so if $db is not just a hash reference and lookups are more expensive this will be slow. But you can get around that by memoizing $db->{$curimage} and I think the code is clearer this way. YMMV of course.
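        A rough sketch of that memoizing idea, assuming $db is a plain hash reference and using made-up sample data (the names $known, $lookups and @kept are illustrative only). The lookup happens once per "image" line and the result is reused for the rest of the record:

```perl
use strict;
use warnings;

# Hypothetical database and input; only a.jpg is "known".
my $db   = { 'a.jpg' => 1 };
my $data = "image: a.jpg\nsize: 10\n----------\n"
         . "image: b.jpg\nsize: 20\n----------\n";

open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my $known   = 0;    # cached result of the last lookup
my $lookups = 0;    # counts how often $db is actually consulted
my @kept;

while (my $line = <$fh>) {
    my ($key, $value) = $line =~ /^([^:]*): (.*)$/ or next;
    $key = lc $key;

    if ($key eq 'image') {
        # Consult the database only when a new record starts.
        $lookups++;
        $known = exists $db->{$value};
    }
    next unless $known;

    push @kept, "$key=$value";    # stand-in for real processing
}
close $fh;

print "lookups: $lookups\n";
print "$_\n" for @kept;
```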


        All dogma is stupid.
        I gather you have reached a solution based on the replies already posted, but just for the sake of TMTOWTDI, your data would make a nice case for customizing the INPUT_RECORD_SEPARATOR variable, $/, like so:
        {
            # customize $/, but only inside this code block:
            local $/ = "----------\n";
            while (<FILE>) {
                chomp;  # this removes $/ from the end
                my ( $image, @attributes ) = split /\n/;
                $image =~ s/^image:\s+(.*)/\L$1/;
                next unless ( exists $db->{$image} );
                # do something with @attributes...
            }
        }
        # now $/ is back to normal
        # (updated to add chomp call)
        Note that if you are using modules or reading from other files while doing things with the attributes within that block, the altered value of $/ will be in effect -- you may need to create more block boundaries to localize it again back to its original value.
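        To illustrate that last caveat, here is a small sketch in which a helper called from inside the record loop localizes $/ again for its own reads. The helper first_line, the sample data and the @records array are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: reads one ordinary line, whatever $/ the caller set.
sub first_line {
    my ($text) = @_;
    local $/ = "\n";    # line-at-a-time again, for this sub only
    open my $in, '<', \$text or die "cannot open in-memory file: $!";
    my $line = <$in>;
    close $in;
    chomp $line;
    return $line;
}

my $db   = { 'path/to/image.jpg' => 1 };
my $data = "image: path/to/image.jpg\nattribute1: value\n----------\n"
         . "image: path/to/next/image.jpg\nattribute1: other\n----------\n";

open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @records;
{
    local $/ = "----------\n";    # one record per read inside this block
    while (my $record = <$fh>) {
        chomp $record;            # strips the dashes line
        my ($image, @attributes) = split /\n/, $record;
        next unless defined $image && $image =~ s/^image:\s+//;
        next unless exists $db->{$image};
        push @records, {
            image      => $image,
            attributes => \@attributes,
            note       => first_line("alpha\nbeta\n"),
        };
    }
}
close $fh;

my $separator_after = $/;    # back to the default newline here
print scalar(@records), " record(s) kept\n";
```

        Without the local inside first_line, its read would run under the caller's "----------\n" separator and return both lines at once.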
      You said:

      the important part here is the list context.

      As usual, retrospect is 20/20, and now I see it. However, it's easy to see how one can be confused in the first place.

      for my $line (<FILE>)

      sure looks like scalar context, because $line isn't an array. This is further underscored by the fact that the loop assigns the next "line" to the variable on each pass. Because it feels like <FILE> is being read line by line, it's easy to imagine that for is treating <FILE> as a one-item list on each iteration. Cap it all off with Duff's comment that perl 6 will "do the right thing" by... well, doing what I just described.

      So, which is the right(tm) way? The @array context that it's currently doing (that you described)? Or the line-at-a-time way that I described, and that will be part of perl6?

      I can see it both ways now, which is why I think the biggest problem is that this very thing isn't spelled out more explicitly in the perl book.

        I don't think there is a right/wrong way of doing things in this case, just the way a particular language (version) does it. Perl 6 is a very different beast from Perl 5 and I'd guess that what duff describes works because Perl 6 is able to lazily evaluate some things that Perl 5 can't (that's just an uninformed guess though and may be wrong).

        Just a note on the documentation, though: the Camel is a great book and invaluable when learning and programming Perl. But AFAIK it is not the reference documentation for Perl; that is in the perldocs, and you're really missing out if you don't read those.


        All dogma is stupid.