If your file is in text mode, a newline is two bytes on Windows, but one byte on Linux; it's one character in either case.

No. "\r\n" is two characters. And length("\r\n") is indeed 2, even in Perl, even on Windows. And bytes::length("\n") is indeed 1, even in Perl, even on Windows.
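A quick way to check those two claims yourself. This is only a sketch; the variable names are mine, and `use bytes ()` is there just to load `bytes::length` without switching on byte semantics.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without enabling byte semantics

my $crlf = "\r\n";    # carriage return followed by line feed
my $lf   = "\n";      # Perl's logical newline

# Two characters, no matter which platform runs this.
printf "length(CRLF)      = %d\n", length $crlf;

# One byte, no matter which platform runs this.
printf "bytes::length(LF) = %d\n", bytes::length($lf);
```

Neither number depends on whether you run it on Windows or on Linux, since no file I/O is involved.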

What binmode (usually) does is prevent read operations on Windows from converting the two-character string "\r\n" (from the file) into the single character "\n" when storing the results in your Perl string.
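Here is a small sketch of that conversion. So that it behaves the same on any platform, it asks for the `:crlf` PerlIO layer explicitly (standing in for the Windows text-mode default) and for `:raw` (standing in for binmode). The helper `slurp_with_layer` and the temporary file are just for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write six raw bytes: two lines with CRLF endings.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "a\r\nb\r\n";
close $out or die "close failed: $!";

# Read the whole file through the given PerlIO layer.
sub slurp_with_layer {
    my ($layer) = @_;
    open my $in, "<$layer", $path or die "open failed: $!";
    local $/;    # slurp mode: read everything in one go
    my $data = <$in>;
    close $in;
    return $data;
}

my $raw  = slurp_with_layer(':raw');     # no translation, as with binmode
my $text = slurp_with_layer(':crlf');    # "\r\n" in the file becomes "\n"

printf "raw:  %d characters\n", length $raw;
printf "text: %d characters\n", length $text;
```

The raw read sees all six characters; the translated read sees four, because each "\r\n" pair from the file was stored as a single "\n" in the Perl string.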

Unix actually does very similar things; it is just that these changes are done at the device boundary rather than at the file system boundary. Sending "\n" to a device like a TTY in Unix usually causes the two characters "\r\n" to be sent to the device instead (just like writing the single character "\n" to an ordinary file in Windows).

But you are correct in suggesting that mixing seek and length can run into such problems. You should indeed use binmode and bytes::length() when figuring out where to seek. But the reasons for that have nothing to do with "\n" being a single character of two bytes on any platform that I am aware of.

Of course, doing that isn't sufficient to make such a use of seek actually fully portable. The only fully portable way to use seek with a non-zero offset is to feed it a value you previously got from tell. For example, using seek with non-zero offsets on VMS can be quite surprising, depending on the type of file involved (VMS's file system layer is called RMS, for Record Management Services, and most files are not streams of bytes but streams of records, where byte offsets are hard to interpret). But you can get away with seeking by arbitrary byte offsets when dealing with ordinary Unix and Windows file systems.
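A minimal sketch of the portable pattern: remember what tell reports at the start of each line, and later hand one of those values back to seek. The file contents and the `@offset` array are made up for the example.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Create a small three-line file to read back.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "first\n", "second\n", "third\n";
close $out or die "close failed: $!";

open my $in, '<', $path or die "open failed: $!";

# Remember where each line starts, as reported by tell.
my @offset;
while (1) {
    my $pos = tell $in;
    defined(my $line = <$in>) or last;
    push @offset, $pos;
}

# Jump back to the second line using a value that came from tell,
# not one computed from string lengths.
seek($in, $offset[1], SEEK_SET) or die "seek failed: $!";
my $again = <$in>;
chomp $again;
print "line 2 is '$again'\n";
close $in;
```

Nothing here computes an offset by adding up lengths, so it does not matter how the platform stores line endings or whether the handle is in text mode.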

- tye        


In reply to Re^2: different length of a line from linux and windows textfile? (seek) by tye
in thread different length of a line from linux and windows textfile? by Microcebus
