Rather than calling binmode on text files, you should instead learn that "file size" only equals "number of bytes you get when the file is read into memory" when the file is a simple stream of bytes. Although that is very common on Unix, it is fairly uncommon outside of Unix.
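To make the mismatch concrete, here is a minimal sketch that reproduces it on any platform by asking for the `:crlf` layer explicitly (the layer that text mode uses by default on Windows). The file contents and variable names are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write ten raw bytes: two lines with CRLF line endings.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "one\r\ntwo\r\n";
close $tmp or die "close failed: $!";

# What the file system reports.
my $size = -s $path;

# Read through the :crlf layer, which turns "\r\n" into "\n".
open my $text, '<:crlf', $path or die "open failed: $!";
my $text_data = do { local $/; <$text> };
close $text;

# Read the same file with no translation at all.
open my $raw, '<:raw', $path or die "open failed: $!";
my $raw_data = do { local $/; <$raw> };
close $raw;

printf "-s: %d, via :crlf: %d, via :raw: %d\n",
    $size, length($text_data), length($raw_data);
```

The file and the code are correct in both cases; only the `:raw` read happens to agree with `-s`, because that is the only read that treats the file as a simple stream of bytes.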

For another example where this isn't true, consider Unix directories. They are files but they don't simply contain a stream of bytes and the "size" won't match the number of bytes you get back when you "read" them (some Unix systems will let you read a directory as a stream of bytes, but that isn't what you are supposed to do with them).

A great many types of systems don't routinely store files as simple streams of bytes (and even some that support that won't report file size to match your expectations).

It is quite common to have files recorded as a series of records. And record separators can have a length of 0 (for fixed-length records, for example) or a longer length (such as preceding each record with the length of the record) or even a variable length (such as when records are indexed). Now, Unix takes a minimalist approach (which I think turned out to be a really good idea) and implements any of the above schemes on top of the file system's idea that all files are simply a stream of bytes. So when you read an ordinary file on Unix, you just get that same stream of bytes.
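As an illustration of a record scheme layered on top of a plain byte stream, here is a minimal sketch of length-prefixed records built with `pack` and `unpack`. The 16-bit length prefix and the sample records are assumptions chosen for the example, not anything a particular system uses.

```perl
use strict;
use warnings;

# Illustrative records; the second one contains a newline on purpose.
my @records = ("first", "second\nwith a newline", "third");

# Prefix each record with its length as a 16-bit big-endian integer.
my $stream = join '', map { pack 'n/a*', $_ } @records;

# Recover the records using only the length prefixes.
my @decoded = unpack '(n/a*)*', $stream;

printf "%d records, %d bytes in the stream\n",
    scalar(@decoded), length($stream);
```

Note that the stream is longer than the combined length of the records: the boundaries have to live somewhere, and here they live inside the data.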

But these other systems track record boundaries "outside" of the data of the file (which allows you to put a "\n" inside your record, which probably doesn't seem like a big deal to you since you've spent your entire computing lifespan thinking about files as streams of bytes). This file metadata may or may not be included in the "size" that -s gives back to you. Whether it is or not is really a matter of convenience/efficiency.

Even on Unix, non-ordinary files don't store simple streams of bytes.

On Unix, an ordinary file isn't actually stored as a stream of bytes either. It is probably stored as a bunch of sectors thrown willy-nilly about the disk. But the Unix file system presents these to the program/programmer as a stream of bytes. Even when a Unix file has a chunk missing from the middle that is not recorded to disk (a "hole"), Unix zero-fills the gap when the file is read and reports the "file size" as the number of bytes you'd have after that has been done, so your comparison still succeeds in this case.
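The zero-filling can be seen with a small sketch like the one below, which seeks past the end of a new file before writing a single byte. Whether the gap is really left unallocated depends on the file system, and the block count from `stat` is not available everywhere, so treat that number as informational only. The offset and names are arbitrary.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

my $hole = 1_048_575;   # arbitrary gap size for the example

# Seek past the end of an empty file, then write a single byte.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
seek $fh, $hole, SEEK_SET or die "seek failed: $!";
print {$fh} "x";
close $fh or die "close failed: $!";

my $size   = -s $path;           # logical size, gap included
my $blocks = (stat $path)[12];   # blocks allocated, where supported

# Read the whole file back; the gap comes back as NUL bytes.
open my $in, '<:raw', $path or die "open failed: $!";
my $data = do { local $/; <$in> };
close $in;

my $zeros = ($data =~ tr/\0//);
printf "-s: %d, bytes read: %d, NUL bytes: %d, blocks: %s\n",
    $size, length($data), $zeros, $blocks;
```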

So please, just stop comparing "number of bytes read" to what -s says. It isn't portable. Even if you use binmode, you'll run into (somewhat rare) cases where this doesn't work. Even when you have an ordinary file on Unix, there are race conditions to consider.
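If the goal of the comparison was to notice a failed or short read, one alternative is to read until end of file and check the return value of `read` itself. This is only a sketch; `slurp_file` is a hypothetical helper name, and the chunk size is arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a file to EOF, trusting read() rather than -s.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $data = '';
    while (1) {
        # Append the next chunk at the current end of $data.
        my $got = read($fh, $data, 65_536, length $data);
        die "Read error on $path: $!" unless defined $got;   # undef means error
        last if $got == 0;                                    # 0 means EOF
    }
    close $fh or die "close failed on $path: $!";
    return $data;
}

# Usage: write a small text file, then read it back without binmode.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "alpha\nbeta\n";
close $out or die "close failed: $!";

my $contents = slurp_file($path);
printf "read %d characters\n", length($contents);
```

Nothing here depends on what the file system thinks the size is, so it behaves the same whether or not any translation happens on the way in.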

binmode on text files is usually a bad idea. Comparing -s to number of bytes read is always a bad idea in my book.

                - tye

In reply to Re: Remember to binmode text files (wrong test/conclusion) by tye
in thread Remember to binmode text files by diotalevi
