in reply to An odd thing happend on the way to answering a SoPW - seek DATA oddity

This looks to me like someone is foolishly trying to count bytes from the point of view of the program instead of counting bytes actually in the file.

Looking at the Perl 5.8.0 source code (win32.c), I don't see it doing that. But I do see that tell uses ftell() while seek uses fsetpos(). That may be a bug: the C library only promises that an ftell() value can be handed back to fseek(), and that an fgetpos() value can be handed back to fsetpos(). Mixing the two is certainly not documented to work. If fgetpos() were used instead of ftell() [or fseek() instead of fsetpos()], then things might be saner.
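A minimal C sketch of the pairing rule, assuming a scratch file you are free to create and delete. The helper name `positions_round_trip` is hypothetical. It records a position in a text-mode stream with each matched pair (ftell()/fseek() and fgetpos()/fsetpos()) and checks that each one returns the stream to the same line. On a platform without text-mode translation this passes trivially; the interesting case is a Windows text-mode stream.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if both matched position pairs restore
   the stream to the same logical spot in a text-mode file, else 0. */
int positions_round_trip(const char *path)
{
    char line[32], expected[32], again[32];
    long cookie;   /* in text mode, only meaningful as an fseek(SEEK_SET) argument */
    fpos_t pos;    /* only meaningful as an fsetpos() argument */
    int ok = 1;

    FILE *out = fopen(path, "w");            /* text mode: "\n" may be written as CR/LF */
    if (out == NULL)
        return 0;
    fputs("line one\nline two\nline three\n", out);
    fclose(out);

    FILE *in = fopen(path, "r");             /* text mode read */
    if (in == NULL)
        return 0;

    if (fgets(line, sizeof line, in) == NULL /* consume "line one\n" */
        || (cookie = ftell(in)) == -1L       /* matched pair #1: ftell ... */
        || fgetpos(in, &pos) != 0            /* matched pair #2: fgetpos ... */
        || fgets(expected, sizeof expected, in) == NULL) {
        fclose(in);
        remove(path);
        return 0;
    }

    fseek(in, cookie, SEEK_SET);             /* ... goes back to fseek */
    if (fgets(again, sizeof again, in) == NULL || strcmp(expected, again) != 0)
        ok = 0;

    fsetpos(in, &pos);                       /* ... goes back to fsetpos */
    if (fgets(again, sizeof again, in) == NULL || strcmp(expected, again) != 0)
        ok = 0;

    fclose(in);
    remove(path);
    return ok;
}
```

Calling `positions_round_trip("scratch.txt")` should return 1. Nothing here tests the cross-pairing (an ftell() value fed to fsetpos()) because the C standard gives no meaning to that.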

The Win32 SDK says (emphasis added):

The value returned by ftell() may not reflect the physical byte offset for streams opened in text mode, because text mode causes carriage return–linefeed translation.

Which makes me think that ftell() might be the thing doing the misguided counting (in an attempt to match the number of bytes seen by the calling code instead of the number of bytes actually in the file). Using fsetpos() instead of fseek() might mess up this scheme. But looking at the code for fseek(), I don't see it doing that either.
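To make the two kinds of counting concrete, here is a small C sketch (the name `logical_offset` is hypothetical) that takes the raw bytes of a file and a physical offset into them, and returns how many characters a text-mode reader would have seen by that point. It assumes the only translation is CR/LF to LF; other text-mode conversions, such as Ctrl-Z handling, are ignored.

```c
#include <stddef.h>

/* Hypothetical helper: number of characters a text-mode reader sees in
   raw[0 .. physical), assuming only CR/LF -> LF translation. */
long logical_offset(const char *raw, size_t raw_len, size_t physical)
{
    long seen = 0;
    for (size_t i = 0; i < physical && i < raw_len; i++) {
        /* A CR immediately followed by LF is dropped by the translation;
           a lone CR is passed through unchanged. */
        if (raw[i] == '\r' && i + 1 < raw_len && raw[i + 1] == '\n')
            continue;
        seen++;
    }
    return seen;
}
```

For the raw bytes `"ab\r\ncd\r\n"`, physical offset 4 corresponds to logical offset 3 ("ab\n"), and the end of the file (8 bytes) corresponds to 6 characters. A tell() that reported 3 there would be counting from the program's point of view rather than the file's.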

So perhaps this misguided code was added in a later version of Perl.

So I'm mostly just speculating; I haven't taken the time to plumb direct access to fgetpos() and fseek() in order to write test programs. But there is certainly reason to suspect problems in this code.

Update: Oh, I realized that I have the source code to fgetpos() and fsetpos() as well. They just call ftell() and fseek() (though this isn't documented). I looked at the Perl 5.8.8 source code and it looks mostly the same. All of these calls boil down to SetFilePointer(), which can be accessed directly via Win32API::File, if someone wants to do more exploring.

Update: Ah! It could be a bug in Perl trying to seek within file buffers without having to do file I/O (in which case the seek/tell code that I was looking at doesn't even get involved). The Win32 source code even has some notes on how big a mistake this can be. In particular:

NOTE: We used to bend over backwards to try and preserve the current buffer and maintain disk block alignment. This ended up making our code big and slow and complicated, and slowed us down quite a bit. Some of the things pertinent to the old implementation:

(6) CR/LF accounting - When trying to seek within a buffer that is in text mode, we had to go account for CR/LF expansion. This required us to look at every character up to the new offset and see if it was '\n' or not. In addition, we had to check the FCRLF flag to see if the new buffer started with '\n'.
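A rough C sketch of the accounting that note describes, under a simplifying assumption: `buf` holds text already translated from CR/LF to LF, and every '\n' in it came from a CR/LF pair lying entirely inside this buffer. The note also mentions checking the FCRLF flag for a buffer that starts with '\n'; that edge case is ignored here. The name `physical_distance` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: how many on-disk bytes correspond to the first
   `logical` characters of a translated text-mode buffer. Every '\n'
   before that point must be found, because each stood for two bytes. */
size_t physical_distance(const char *buf, size_t logical)
{
    size_t bytes = logical;
    for (size_t i = 0; i < logical; i++) {
        if (buf[i] == '\n')
            bytes++;   /* account for the CR that translation removed */
    }
    return bytes;
}
```

For the translated buffer `"ab\ncd\n"`, moving 3 characters in means moving 4 bytes on disk, and moving 6 characters means 8 bytes. The linear scan is exactly the cost the note complains about; an implementation that skips it gets the CR/LF bytes wrong.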

So I'm not surprised that Perl didn't get this right. (:

- tye        
