Why I did it...I skip ahead (in the second loop), looking for the next record.

I may be missing something (you haven't said what the second loop actually does, so I can't be sure), but couldn't this be better handled with next?

Regarding the soap box, as rjray pointed out, the important part here is the list context. I don't have the camel on me at the moment, but the behaviour of filehandles in list context is documented in perldoc perlintro as well as perldoc perltrap.
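To make the context difference concrete, here is a minimal sketch. It assumes an in-memory filehandle and made-up sample data (neither is from the original script) so that it runs on its own. The for loop evaluates <$fh> in list context and reads every line before the first iteration, so the nested read finds nothing left; the while loop reads one line per test, so the nested read still has input.

```perl
use strict;
use warnings;

# In-memory "file" so the sketch is self-contained (sample data is made up).
my $text = "one\ntwo\nthree\n";

# for: <$fh> is evaluated in list context, so every line is read
# before the first iteration and the nested read hits end-of-file.
open my $fh, '<', \$text or die "open failed: $!";
my @for_inner;
for my $line (<$fh>) {
    my $next = <$fh>;    # scalar-context read, but nothing is left
    push @for_inner, defined $next ? $next : 'undef';
}
close $fh;

# while: <$fh> is evaluated in scalar context, one line per test,
# so the nested read still has lines to consume.
open $fh, '<', \$text or die "open failed: $!";
my @while_inner;
while (my $line = <$fh>) {
    my $next = <$fh>;
    push @while_inner, defined $next ? $next : 'undef';
}
close $fh;

chomp(@for_inner, @while_inner);
print "for:   @for_inner\n";
print "while: @while_inner\n";
```

In the for version, every nested read comes back undefined, which matches the symptom in the thread title.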


All dogma is stupid.

Re^4: nested <FILE> read returns undefined?
by argv (Pilgrim) on Apr 01, 2006 at 06:39 UTC
    You said:

    I may be missing something (you haven't said what the second loop actually does, so I can't be sure), but couldn't this be better handled with next?

    Imagine a text file that has this format:

    image: path/to/image.jpg
    attribute1: value
    attribute2: value
    [...]
    attributeN: value
    ----------
    image: path/to/next/image.jpg
    [...]

    Here, each "record" ends with a line of dashes. My script is reading the file, and when it reads an "image" line, it sees if that image is one that it knows about, and if not, it skips the rest of the record. So, the code looks like:

    while (my $line = <FILE>) {
        $line =~ /(.*?): (.*?)$/m;
        ($key, $value) = (lc $1, $2);
        next if !$key || !$value;
        if ($key eq "image") {
            if (!exists $db->{$value}) {
                # not in the database, so skip this image
                while ($_ = <FILE>) {
                    last if /^--/;
                }
                next;
            }

    The original code used "for" instead of "while" in each spot above, which is why it wasn't working... and why I posted my original message.... I'm just answering why the code is the way it is because you asked. Sure, I could have done it in a way that allowed me to continue using "for", but changing it to use while() was a better option.

      Thanks for taking the time to explain this. First, there's a bug in your code, because $1 and $2 will not be reset when the while loop starts over, but retain their old values, despite the fact that the match did not succeed (see the warning about this in perldoc perlre). So the line

      next if !$key || !$value;

      will not skip lines that don't contain a key-value pair.

      Here's how I would write this, without using the inner while loop:

      my $curimage;
      while (my $line = <DATA>) {
          $line =~ /^([^:]*): (.*?)$/m or next;
          my ($key, $value) = (lc $1, $2);
          $curimage = $value if ($key eq "image");
          next if (!exists $db->{$curimage});
          # ..process key-value pairs
      }

      This has the obvious disadvantage that you need to look up $db more often, so if $db is not just a hash reference and lookups are more expensive this will be slow. But you can get around that by memoizing $db->{$curimage} and I think the code is clearer this way. YMMV of course.
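A rough sketch of the memoizing idea follows, under some loud assumptions: image_known() is a hypothetical stand-in for whatever expensive lookup the real $db needs, and the sample lines and the $lookups counter exist only for illustration. The lookup result is cached whenever an "image" line is seen, so the attribute lines that follow cost nothing extra.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: $db is a plain hash here, and image_known()
# represents whatever (possibly expensive) lookup the real $db needs.
my $db      = { 'a.jpg' => 1 };
my $lookups = 0;

sub image_known {
    my ($image) = @_;
    $lookups++;
    return exists $db->{$image};
}

# Made-up sample input: two records, only the first is in $db.
my @lines = (
    "image: a.jpg", "size: 10", "----------",
    "image: b.jpg", "size: 20", "color: red", "----------",
);

my ($curimage, $known, @kept);
for my $line (@lines) {
    # Skip anything that is not a key-value pair (e.g. the dashes).
    $line =~ /^([^:]*): (.*)$/ or next;
    my ($key, $value) = (lc $1, $2);
    if ($key eq 'image') {
        $curimage = $value;
        $known    = image_known($curimage);    # one lookup per image line
    }
    next unless $known;
    push @kept, "$key=$value";
}

print "kept: @kept (lookups: $lookups)\n";
```

With this input the lookup runs once per image line rather than once per input line.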


        Your notes were spot-on. Mea Culpa on the $1 and $2 part.

        One caveat to your corrected code comes from another oversight on my part, in my briefer explanation of the rationale for my code segment (which reminded me why I did it the way I did in the first place): each record "may" contain more than one "image: /path..." key-value pair, which is what requires reading ahead to the "----" line before returning to the outside control loop. Your version will just find the next "image:" key, which would have been the right thing to do if it weren't for that condition.

        Anyway, all's well now.

      I gather you have reached a solution based on the replies already posted, but just for the sake of TMTOWTDI, your data would make a nice case for customizing the INPUT_RECORD_SEPARATOR variable, $/, like so:
      {
          # customize $/, but only inside this code block:
          local $/ = "----------\n";
          while (<FILE>) {
              chomp;  # this removes $/ from the end
              my ( $image, @attributes ) = split /\n/;
              # strip the "image:" label; \L lowercases the captured path
              $image =~ s/^image:\s+(.*)/\L$1/;
              next unless ( exists $db->{$image} );
              # do something with @attributes...
          }
      }
      # now $/ is back to normal
      # (updated to add chomp call)
      Note that if you are using modules or reading from other files while doing things with the attributes within that block, the altered value of $/ will be in effect -- you may need to create more block boundaries to localize it again back to its original value.
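As a sketch of that last point, the helper below localizes $/ back to a newline before reading a second, line-oriented source from inside the record loop. The helper name read_lines, the in-memory filehandles, and the sample data are all assumptions made so the example runs on its own.

```perl
use strict;
use warnings;

# Made-up sample data, read through in-memory filehandles.
my $records = "image: a.jpg\nsize: 10\n----------\n"
            . "image: b.jpg\nsize: 20\n----------\n";
my $other   = "first\nsecond\n";

# Hypothetical helper: reads a line-oriented source regardless of
# what the caller has done to $/.
sub read_lines {
    my ($text_ref) = @_;
    local $/ = "\n";    # back to line mode for the duration of this sub
    open my $fh, '<', $text_ref or die "open failed: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

my (@images, @other_lines);
{
    local $/ = "----------\n";    # record mode, only inside this block
    open my $fh, '<', \$records or die "open failed: $!";
    while (my $record = <$fh>) {
        chomp $record;            # strips the separator line
        my ($image) = split /\n/, $record;
        push @images, $image;

        # Called while $/ is still the record separator; the helper
        # restores line mode for its own reads only.
        @other_lines = read_lines(\$other) unless @other_lines;
    }
    close $fh;
}

print "images: @images\n";
print "other:  @other_lines\n";
```

Without the local inside read_lines, the second source would come back as a single chunk, because the readline would still be looking for the line of dashes.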
Re^4: nested <FILE> read returns undefined?
by argv (Pilgrim) on Apr 01, 2006 at 06:30 UTC
    You said:

    the important part here is the list context.

    As usual, retrospect is 20/20, and now I see it. However, it's easy to see how one can be confused in the first place.

    for my $line (<FILE>)

    sure looks like scalar context, because $line isn't an array. This is further underscored by the fact that the loop will assign the next "line" to the variable. Because it feels like <FILE> is being read line-by-line, it's easy to see how for is treating <FILE> like a one-item array upon each iteration. Cap it all off with Duff's comment that perl 6 will "do the right thing" by... well, doing what I just described.

    So, which is the right(tm) way? The @array context that it's currently doing (that you described)? Or the line-at-a-time way that I described, and that will be part of perl6?

    I can see it both ways now, which is why I think the biggest problem is that this very thing isn't spelled out more explicitly in the perl book.

      I don't think there is a right/wrong way of doing things in this case, just the way a particular language (version) does it. Perl 6 is a very different beast from Perl 5 and I'd guess that what duff describes works because Perl 6 is able to lazily evaluate some things that Perl 5 can't (that's just an uninformed guess though and may be wrong).

      Just a note on the documentation, though: the Camel is a great book and invaluable when learning and programming Perl, but AFAIK it is not the reference documentation for Perl. That is in the perldocs, and you're really missing out if you don't read those.


        Just to bring closure on this: I wasn't necessarily trying to argue that there was a "right way" or "wrong way" so much as saying that it is ambiguous whether for $scalar (@array) would cause <FILE> to be read in its entirety or line by line. There are many cases where perl looks at the lvalue to determine what to do on the right side, and this seems like just such a case at first blush.

        $line = <FILE>;           # reads one line from FILE;
        @lines = <FILE>;          # slurps up the whole thing.
        for $line (<FILE>) ...    # which of the above applies?

        What really makes the visual assessment ambiguous is the placement of $line in the for statement. Granted, I only saw it as the scalar form, and I do realize that it'd be more like saying

        for $line (@lines)

        which clarifies how perl is reading it (and thus, what happens with <FILE>), but seeing both ways now makes the ambiguity more understandable.