Gulliver has asked for the wisdom of the Perl Monks concerning the following question:

This is confusing. What I'm seeing is that after the __DATA__ token the tell function counts '\n' as a single character, but before __DATA__ each newline counts as two. If seek() followed the same rule everything would be OK, but it doesn't seem to. When I seek and print within DATA it prints partial lines, as can be seen in the output below.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @DATA_FH_ARR = (tell DATA);
    my $i=1;
    while (<DATA>){
        if (/__DATA__/) {
            $DATA_FH_ARR[$i++]=tell DATA;
        }
        print;
    }

    print "\n\@DAT_FH_ARR: @DATA_FH_ARR\n";

    my $line;
    seek DATA, $DATA_FH_ARR[2], 0;
    $line= <DATA>;
    print '1:', $line;
    $line= <DATA>;
    print '2:', $line;

    seek DATA, $DATA_FH_ARR[3], 0;
    $line= <DATA>;
    print '1:', $line;
    $line= <DATA>;
    print '2:', $line;

    __DATA__
    ab
    __DATA__
    ab

    __DATA__
    ab
    __DATA__
    lotsa junk
    nothing

In the array of DATA file positions, successive entries were 12 apart. When I added a newline before the third __DATA__ statement, that difference changed to 13. But when I added a newline in the code section above the first __DATA__, the offsets shifted by 2. I was expecting the same change in both places.

Update: The offsets I'm referring to in the previous paragraph are the offsets shown in the program output. 468 480 493 505 are the positions stored in the array built from "tell DATA". By taking the differences between successive numbers I could see that adding a newline changed the difference from 12 to 13: 493-480=13; 480-468=12.

Here is the output that shows the offsets of 12, 13, 12:

    ab
    __DATA__
    ab

    __DATA__
    ab
    __DATA__
    lotsa junk
    nothing

    @DAT_FH_ARR: 468 480 493 505
    1:A__
    2:ab
    1:ATA__
    2:lotsa junk

Here is the output after I added a newline in the main code. All offsets shifted by 2.

    ab
    __DATA__
    ab

    __DATA__
    ab
    __DATA__
    lotsa junk
    nothing

    @DAT_FH_ARR: 470 482 495 507
    1:A__
    2:ab
    1:ATA__
    2:lotsa junk

This is on Win XP with Strawberry Perl 5.12.0.

Re: Unexpected result using tell/seek within the __DATA__ file
by ikegami (Patriarch) on Mar 11, 2011 at 18:02 UTC
    You must have a mixture of Windows line endings (CR LF) and Unix line endings (LF). On Windows, unless you tell Perl otherwise, CRLF is automatically converted to LF on read, and LF to CRLF on write.
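The translation described above can be seen on any platform by asking for the layers explicitly. The following is a minimal sketch, not the poster's script: it writes a few CRLF-terminated lines to a temporary file (the sample content and the helper name `first_line` are made up for illustration), then reads the first line back once through the `:crlf` layer and once with `:raw`.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write sample data with explicit CRLF line endings.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                        # raw bytes, so "\r\n" stays two bytes
print {$out} "ab\r\n__DATA__\r\nab\r\n";
close $out or die "close: $!";

# Hypothetical helper: read the first line through the given PerlIO layer.
sub first_line {
    my ($layer) = @_;
    open my $in, "<$layer", $path or die "open: $!";
    my $line = <$in>;
    close $in;
    return $line;
}

my $translated = first_line(':crlf');   # CRLF collapsed to "\n" on read
my $raw        = first_line(':raw');    # bytes exactly as stored on disk

printf "crlf layer: %d chars, raw: %d bytes\n",
    length $translated, length $raw;
```

With the `:crlf` layer the first line is `"ab\n"` (3 characters); read raw, it is `"ab\r\n"` (4 bytes). Any position arithmetic that mixes the two views drifts by one per line.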

      Do you mean that the text editor could be doing this? I just opened the file in a hex editor and it shows 0x0D 0x0A for every line.

      Why would Perl convert differently after the __DATA__ for tell() but not for seek()?

        Perl reads the source code using readline and not using binmode. This means that tell needs to guess at what the difference is between where perl stopped reading stuff into the buffer and where your script stopped reading stuff out of the buffer.

        So perl gets the accurate seek position of the end of the buffered data and then subtracts the number of bytes in the buffer. For every "\n" in the buffer, there were two bytes ("\r\n") in the file so the tell result is "off" by that many bytes.

        - tye

        "I just opened the file in a hex editor and it shows 0x0D 0x0A for every line."

        Yet you said that adding a blank line only changed the offset by one. These two statements are contradictory.

        The positions seek uses are the same positions you see in your hex editor.
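The partial lines in the question can be reproduced with a small model. This is only a sketch under two assumptions taken from the thread: that tell advanced by one character per line ending inside the data section, and that seek counts the bytes actually stored (CR LF on every line). The data below is a stand-in for the bytes after the first __DATA__ line, the offsets are relative to that point rather than to the start of the script, and `line_at` is a hypothetical helper.

```perl
use strict;
use warnings;

# Stand-in for the data section, stored with CRLF line endings.
my @lines = ('ab', '__DATA__', 'ab', '', '__DATA__', 'ab', '__DATA__',
             'lotsa junk', 'nothing');
my $raw = join '', map { "$_\r\n" } @lines;

# Offsets as tell reported them: each line ending counted as one character.
my @reported = (0);
my $pos = 0;
for my $line (@lines) {
    $pos += length($line) + 1;
    push @reported, $pos if $line eq '__DATA__';
}

# What a byte-based seek to an offset finds: the rest of that stored line.
sub line_at {
    my ($offset) = @_;
    my ($rest) = substr($raw, $offset) =~ /\A([^\r\n]*)/;
    return $rest;
}

print "reported offsets: @reported\n";
print "seek to $reported[2]: ", line_at($reported[2]), "\n";
print "seek to $reported[3]: ", line_at($reported[3]), "\n";
```

The reported offsets come out as 0, 12, 25, 37, which have the same 12, 13, 12 spacing as 468, 480, 493, 505 in the question, and the two seeks land on `A__` and `ATA__`, the same partial lines the original program printed.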

Re: Unexpected result using tell/seek within the __DATA__ file
by Gulliver (Monk) on Mar 17, 2011 at 22:29 UTC

    I reported this via perlbug. It has been resolved by changing the default build options for Perl on Windows platforms to read source files in text mode. This will apparently cause problems with ByteLoader, but that is no longer used anyway.