in reply to Unexpected result using tell/seek within the __DATA__ file

You must have a mixture of Windows line endings (CR LF) and Unix line endings (LF). On Windows, unless you tell Perl otherwise, CR LF is automatically converted to LF on read and vice versa on write.

Re^2: Unexpected result using tell/seek within the __DATA__ file
by Gulliver (Monk) on Mar 11, 2011 at 18:16 UTC

    Do you mean that the text editor could be doing this? I just opened the file in a hex editor and it shows 0x0D 0x0A for every line.

    Why would Perl convert differently after the __DATA__ for tell() but not for seek()?

      Perl reads the source code using readline and without binmode, so the handle is in text mode. This means that tell has to guess at the difference between where perl stopped reading data into the buffer and where your script stopped reading data out of the buffer.

      So perl gets the accurate seek position of the end of the buffered data and then subtracts the number of bytes in the buffer. For every "\n" in the buffer, there were two bytes ("\r\n") in the file so the tell result is "off" by that many bytes.

      - tye        
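The arithmetic described above can be modelled without touching a real filehandle. The following is a minimal sketch, assuming a made-up three-line file and assuming the whole file is already in the buffer; the names `$raw`, `$translated`, and `$naive_tell` are hypothetical and are not Perl internals. It only illustrates the subtraction as explained, and does not attempt to reproduce the specific offsets reported in this thread.

```perl
use strict;
use warnings;

# Hypothetical file: three lines, each 8 characters plus CR LF (10 bytes on disk).
my $raw = join '', map { "record $_\r\n" } 1 .. 3;

# What a text-mode read hands to the script: CR LF collapsed to LF.
(my $translated = $raw) =~ s/\r\n/\n/g;

# Assumption: the whole file is already buffered, so the real file
# position sits at the end of the raw data.
my $file_pos = length $raw;

# The script has consumed only the first line from the buffer.
my $consumed = length "record 1\n";
my $unread   = substr $translated, $consumed;

# Modelled tell: real position minus the translated bytes still unread.
my $naive_tell = $file_pos - length $unread;

# Actual offset of the second line in the file on disk.
my $true_tell = length "record 1\r\n";

# Count the newlines still unread; each one stands for two bytes on disk.
my $unread_newlines = () = $unread =~ /\n/g;

printf "naive=%d true=%d error=%d unread newlines=%d\n",
    $naive_tell, $true_tell, $naive_tell - $true_tell, $unread_newlines;
```

In this model the error equals the number of newlines still sitting unread in the buffer, which is the "off by that many bytes" effect described in the post.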

        I have used tell/seek lots of times to get back to the beginning of DATA and have never had a problem. This is the first time I have seen this and it only seems to be off when I seek within the DATA section. Then it is off by 1 per newline.

        If what you are saying is true, shouldn't it be off regardless of the start of DATA?

        Thanks! I understand now what you mean. The first "tell DATA" is just giving the value left there by Perl which is correct. Subsequent calls to "tell DATA" are calculated and are off in Windows. I'll check tonight that this works on Linux.

      I just opened the file in a hex editor and it shows 0x0D 0x0A for every line.

      Yet you said that adding a blank line only changed the offset by one. These two statements are contradictory.

      The positions seek takes are the same positions you see in your hex editor.

        You took that out of context. There is no contradiction. You were the one who implied the program file had mixed newline types. I only got out the hex editor to rule out the possibility. All the newlines in the program file have 2 bytes as I stated already.

        The offsets I'm referring to in the original post are the offsets shown in the program output. 468 480 493 505 are the offset positions from the array that was created from "tell DATA". By taking the differences between successive numbers I could see that adding a newline in the DATA section changed the difference from 12 to 13: 493-480=13; 480-468=12.

        Adding a new line in the Main Code caused all offsets displayed in the program output to be shifted by 2.

        Update: I just got the hex editor out again, all the dust has been brushed off now. The positions of the next character after each of the __DATA__ lines are as follows:

        from Hex edit       From program output in OP
        0x1D6 => 470        # 470 (the same)
        0x1E4 => 484        # 482 different, just like I said.
        0x1F4 => 500        # 495
        0x202 => 514        # 507

        It confirms what I've been saying all along.
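The figures in the hex-editor comparison above can be checked with a few subtractions. This is only a sketch that replays the numbers quoted in the post; the array names are made up for illustration, and nothing here reads the original script.

```perl
use strict;
use warnings;

# Pairs copied from the comparison: [hex editor position, "tell DATA" output].
my @rows = (
    [0x1D6, 470],
    [0x1E4, 482],
    [0x1F4, 495],
    [0x202, 507],
);

# How far each reported offset lags behind the position on disk.
my @drift = map { $_->[0] - $_->[1] } @rows;

# Distance between successive lines, first on disk, then as reported by tell.
my @file_step = map { $rows[$_][0] - $rows[$_ - 1][0] } 1 .. $#rows;
my @tell_step = map { $rows[$_][1] - $rows[$_ - 1][1] } 1 .. $#rows;

print "drift:      @drift\n";
print "file steps: @file_step\n";
print "tell steps: @tell_step\n";
```

The first offset agrees with the file, and the gap between the two columns widens with each later line.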