GrandFather has asked for the wisdom of the Perl Monks concerning the following question:

While playing around with some code in answer to reading a line from file two times, I found the following odd behaviour:

use warnings;
use strict;

print "Using Perl $]\n\n";

print "Result using DATA:\n\n";
doRead (*DATA);

print "\nResult using file handle:\n\n";
open my $FH, '<', 'noname.pl' or die "Cannot open noname.pl: $!";
doRead ($FH);
close $FH;

sub doRead {
    my $FH = shift;

    while (! eof $FH) {
        my $lineStart = tell $FH;
        my $line = <$FH>;

        chomp $line;
        if ($line =~ /^NEWTABLE/) {
            print "Found at $lineStart: >$line<\n";
            # Jump back to the start of the matching line and read it again
            seek $FH, $lineStart, 0;
            $line = <$FH>;
            chomp $line;
            print "Reread as >$line<\n";
        }
    }
}

__DATA__
First line
Second line
NEWTABLE - third line
Fourth line

Prints:

Using Perl 5.008007

Result using DATA:

Found at 627: >NEWTABLE - third line<
Reread as ><
Found at 628: >NEWTABLE - third line<
Reread as ><
Found at 629: >NEWTABLE - third line<
Reread as >NEWTABLE - third line<

Result using file handle:

Found at 629: >NEWTABLE - third line<
Reread as >NEWTABLE - third line<

Notice that using a "conventional" file handle works as expected, but performing seek on DATA generates rather strange results! Can anyone account for this behaviour?

This was run on Windows XP using ActiveState Perl 5.8.7.


DWIM is Perl's answer to Gödel

Re: An odd thing happened on the way to answering a SoPW - seek DATA oddity (bugs)
by tye (Sage) on Dec 21, 2006 at 05:07 UTC

    This looks to me like someone is foolishly trying to count bytes from the point of view of the program (after CR/LF translation) instead of counting the bytes actually in the file.
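    A minimal sketch of the distinction being drawn, assuming a file whose lines end in CR/LF (written here to a temporary file via File::Temp so it runs anywhere). Reading through the :crlf layer, the program sees one character per line ending while the file holds two bytes, so any offset computed by summing what the program has read drifts by one per line. The file contents and variable names are illustrative only; this does not reproduce the DATA bug itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write two CR/LF-terminated lines as raw bytes, as a Windows text file holds them.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "ab\r\ncd\r\n";
close $tmp or die "close: $!";

# Read the file back through the :crlf layer, which turns each "\r\n" into "\n".
open my $fh, '<:crlf', $path or die "open $path: $!";
my $chars_seen = 0;
while (my $line = <$fh>) {
    $chars_seen += length $line;    # each line ending counts as ONE character here
}
close $fh;

# -s reports the size on disk, where each line ending is TWO bytes.
my $bytes_on_disk = -s $path;
printf "characters seen by the program: %d\n", $chars_seen;     # 6
printf "bytes actually in the file:     %d\n", $bytes_on_disk;  # 8
```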

    Looking at the Perl 5.8.0 source code (win32.c), I don't see it doing that. But I do see that tell uses ftell() while seek uses fsetpos(). Mixing the two may be a bug; it is certainly not documented to work. If fgetpos() were used instead of ftell() (or fseek() instead of fsetpos()), then things might be saner.

    The Win32 SDK says (emphasis added):

    The value returned by ftell() may not reflect the physical byte offset for streams opened in text mode, because text mode causes carriage return–linefeed translation.

    This makes me think that ftell() might be the thing doing the misguided counting (in an attempt to match the number of bytes seen by the calling code instead of the number of bytes actually in the file). Using fsetpos() instead of fseek() might mess up this scheme. But looking at the code for fseek(), I don't see it doing that either.

    So perhaps this misguided code was added in a later version of Perl.

    So I'm mostly just speculating; I haven't taken the time to plumb direct access to fgetpos() and fseek() in order to write test programs. But there is certainly reason to suspect problems in this code.

    Update: Oh, I realized that I have the source code to fgetpos() and fsetpos() as well. They just call ftell() and fseek() (though this isn't documented). I also looked at the Perl 5.8.8 source code, and it looks mostly the same. All of these calls boil down to SetFilePointer(), which can be accessed directly via Win32API::File, if someone wants to do more exploring.

    Update: Ah! It could be a bug in Perl trying to seek within its file buffers without doing any file I/O (in which case the seek/tell code I was looking at never even gets involved). The Win32 source code even has some notes on how big a mistake this can be. In particular:

    NOTE: We used to bend over backwards to try and preserve the current buffer and maintain disk block alignment. This ended up making our code big and slow and complicated, and slowed us down quite a bit. Some of the things pertinent to the old implementation:

    (6) CR/LF accounting - When trying to seek within a buffer that is in text mode, we had to go account for CR/LF expansion. This required us to look at every character up to the new offset and see if it was '\n' or not. In addition, we had to check the FCRLF flag to see if the new buffer started with '\n'.

    So I'm not surprised that Perl didn't get this right. (:
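    One way to sidestep the problem, offered as a sketch rather than a fix for the underlying bug: copy the data into a scalar and open an in-memory filehandle on it, so tell and seek work on the string's own offsets and no CR/LF translation of the physical file is involved. A literal string stands in for the DATA contents so the example is self-contained; with a real __DATA__ section you would fill $text with do { local $/; <DATA> } instead. The loop mirrors doRead() from the original post; @reread is a name added for illustration.

```perl
use strict;
use warnings;

# Stand-in for the contents of DATA (assumption: small enough to hold in memory).
my $text = "First line\nSecond line\nNEWTABLE - third line\nFourth line\n";

# In-memory filehandle: offsets are positions in $text, not in a file on disk.
open my $mem, '<', \$text or die "Cannot open in-memory handle: $!";

my @reread;
while (!eof $mem) {
    my $lineStart = tell $mem;
    my $line = <$mem>;
    chomp $line;
    next unless $line =~ /^NEWTABLE/;

    # Go back to the start of the matching line and read it a second time.
    seek $mem, $lineStart, 0 or die "seek failed: $!";
    my $again = <$mem>;
    chomp $again;
    print "Found at $lineStart: >$line<, reread as >$again<\n";
    push @reread, $again;
}
close $mem;
# prints: Found at 23: >NEWTABLE - third line<, reread as >NEWTABLE - third line<
```

    The cost is holding the whole data section in memory, which is usually fine for the small tables people keep after __DATA__.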

    - tye        

Re: An odd thing happened on the way to answering a SoPW - seek DATA oddity
by jettero (Monsignor) on Dec 21, 2006 at 04:12 UTC
    I got the expected result...
    Using Perl 5.008008

    Result using DATA:

    Found at 652: >NEWTABLE - third line<
    Reread as >NEWTABLE - third line<

    What platform? I find it unlikely that it's my 5.8.8 that makes the difference. I'm on Linux 2.6.17-gentoo-r8, i686, AMD Athlon. I fail to see how the platform could matter for *DATA, but here we are. Hrm, better yet... what version of SelfLoader? I'm on $VERSION = "1.0904". That comes with the Perl distribution, so I fail to see how that could be it either, unless they changed something between 5.8.7 and 5.8.8...

    -Paul

Re: An odd thing happened on the way to answering a SoPW - seek DATA oddity
by jdporter (Paladin) on Dec 21, 2006 at 04:19 UTC

    I'm not sure how, but it looks like the end-of-line sequences may have something to do with it, in conjunction with the special nature of the DATA filehandle.
    When I tried it with $/ = \6 (and other necessary adjustments), I observed the expected behavior in both cases.
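    For readers unfamiliar with the trick: setting $/ to a reference to an integer switches readline into fixed-size record mode, so where a "line" ends no longer depends on any end-of-line sequence. The sketch below only shows that behaviour on an in-memory string; the "other necessary adjustments" made for the experiment above are not given in the post, and the data here is made up.

```perl
use strict;
use warnings;

my $data = "abcdefghijklmnop";    # 16 characters, no line endings at all
open my $fh, '<', \$data or die "open: $!";

my @records;
{
    local $/ = \6;                # readline now returns up to 6 characters per call
    while (my $rec = <$fh>) {
        push @records, $rec;      # the final record is shorter if data runs out
    }
}
close $fh;
print join('|', @records), "\n";  # abcdef|ghijkl|mnop
```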

    We're building the house of the future together.
Re: An odd thing happened on the way to answering a SoPW - seek DATA oddity
by sgt (Deacon) on Dec 21, 2006 at 08:33 UTC

    Annoying bug actually. I often reread DATA (from the start) a few times in tests. I mostly use Perl on HP-UX 11.x and cygwin, so maybe this explains why I did not suffer this before.
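    The usual idiom for rereading DATA from the start is to record tell DATA before the first read and seek back to that value, since offset 0 is the top of the script file rather than the start of the data section (which is why the offsets in GrandFather's output are in the 600s). The sketch below imitates this with an ordinary temporary file containing a __DATA__ marker that is skipped by hand; the file contents and variable names are assumptions for illustration. On the affected Windows builds this tell/seek round trip is exactly what misbehaves, so it is not offered as a workaround.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for a script file: some "code" followed by a data section.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "code line\n__DATA__\nalpha\nbeta\n";
close $tmp or die "close: $!";

open my $fh, '<', $path or die "open $path: $!";
# Skip past the marker, which is where perl leaves DATA positioned.
while (my $l = <$fh>) { last if $l eq "__DATA__\n" }

# Remember where the data starts; seeking to 0 would land at the top of the file.
my $data_start = tell $fh;

my @passes;
for my $pass (1 .. 2) {
    seek $fh, $data_start, 0 or die "seek: $!";
    my @lines = <$fh>;            # read the whole data section
    chomp @lines;
    push @passes, join(',', @lines);
}
close $fh;
print "$_\n" for @passes;         # alpha,beta (printed twice)
```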

    On cygwin 1.5.22-1 with perl 5.8.7 I get the expected result:

    stephan@armen (/home/stephan) % perl -w datah_seek_oddity.sh
    Using Perl 5.008007

    Result using DATA:

    Found at 634: >NEWTABLE - third line<
    Reread as >NEWTABLE - third line<

    Result using file handle:

    Found at 634: >NEWTABLE - third line<
    Reread as >NEWTABLE - third line<
    hth --stephan