in reply to Re: How can I tell if a string contains binary data or plain-old text?
in thread How can I tell if a string contains binary data or plain-old text?

Slightly better than excluding characters over 127 is excluding characters from 1 to 31 inclusive, since those aren't used in any single-byte, 8-bit encoding. They also aren't used as the first byte in the variable-length encodings, although that requires parsing the symbols to figure out which bytes are first bytes.
Of course, a few control characters will occur legitimately in text strings (e.g. EOF), but the percentage will be tiny compared to the ~12.5% (about one byte value in eight) you would expect in most binaries, where byte values are spread far more evenly.
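
As a rough illustration of the percentage test described above, here is a minimal sketch in Python. The function names and the 5% cutoff are my own assumptions, not anything from the thread; the cutoff just sits somewhere between "tiny" and the ~12.5% expected of binaries, and it will misfire on very short strings.

```python
def control_fraction(data: bytes) -> float:
    """Return the fraction of bytes whose value is in 1..31 inclusive."""
    if not data:
        return 0.0
    hits = sum(1 for b in data if 1 <= b <= 31)
    return hits / len(data)


def looks_binary(data: bytes, threshold: float = 0.05) -> bool:
    """Guess 'binary' when control bytes exceed the (assumed) threshold."""
    return control_fraction(data) > threshold


if __name__ == "__main__":
    print(looks_binary(b"The quick brown fox jumps over the lazy dog.\n"))
    print(looks_binary(bytes(range(256))))
```

This is only a heuristic: a line of ordinary text contributes one or two control bytes (the line ending), so the fraction stays low for text with reasonably long lines.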

Re: How can I tell if a string contains binary data or plain-old text?
by Abigail-II (Bishop) on Oct 31, 2003 at 10:34 UTC
    There's no EOF character in the ASCII set. There might be some filesystems that require files to use a particular character to signal the end of a file (for instance, the SUB (aka ^Z) character has been used), but most modern filesystems record the size of the file as metadata (on Unix, in structures called inodes) and don't need a certain character to be present.

    However, some characters in the range 00-1F are found in text files: carriage returns (^M), line feeds (^J), tabs (^I), bells (^G), form feeds (^L) and backspaces (^H). Theoretically, one could find vertical tabs (^K) in text files as well, but I've never knowingly encountered such a thing in a text file.
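
    One way to fold that list into a check is to count only the control characters that are not on it. The following Python sketch is an assumption-laden illustration, not a definitive test: the names are hypothetical, and the allowed set is exactly the six characters named above (vertical tab is left out).

```python
# Control characters that commonly appear in text files:
# BEL (^G), BS (^H), TAB (^I), LF (^J), FF (^L), CR (^M).
TEXT_CONTROLS = frozenset(b"\a\b\t\n\f\r")


def suspicious_fraction(data: bytes) -> float:
    """Fraction of bytes in 0x00-0x1F that are not usual text controls."""
    if not data:
        return 0.0
    bad = sum(1 for b in data if b <= 0x1F and b not in TEXT_CONTROLS)
    return bad / len(data)


if __name__ == "__main__":
    print(suspicious_fraction(b"col1\tcol2\r\n"))
    print(suspicious_fraction(b"\x00\x01ab"))
```

    With the usual text controls excluded, plain text should score at or near zero, so even a small fraction becomes a meaningful hint of binary data.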

    Abigail