You can't do it yourself easily, though there are tricks. If you know that the data is either Unicode text or a JPEG, you can look for the JPEG header and rule JPEG out if the header isn't found. Or, if you're limiting the text to standard ASCII, you can be fairly certain it's text if every byte's value is 127 or less. But that breaks down if your text is 8-bit MIME or Unicode, or if you're looking at a UUEncoded file, which is a non-text entity encoded into 7-bit, text-only characters so that it can travel easily over SMTP.

Containers blur the line further. A zipped or tarred file might look like binary data on the surface, but could contain a text file within. A UUEncoded file will look like text on the outside but may contain binary data within. And a JPEG looks like binary data on the outside, yet represents an image within.
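To make the JPEG-header and 7-bit ASCII tricks concrete, here is a minimal sketch in Python. The function names `looks_like_jpeg` and `is_7bit_ascii` are made up for this illustration, and the three-byte signature check is only the usual quick test for a JPEG start-of-image marker, not a full validation of the file.

```python
import binascii

# JPEG data starts with the SOI marker FF D8, followed by another
# marker, and every marker begins with FF.
JPEG_SIGNATURE = b"\xff\xd8\xff"


def looks_like_jpeg(data: bytes) -> bool:
    """Hypothetical helper: True if the data begins with a JPEG signature."""
    return data.startswith(JPEG_SIGNATURE)


def is_7bit_ascii(data: bytes) -> bool:
    """Hypothetical helper: True if every byte value is 127 or less."""
    return all(byte <= 127 for byte in data)


# A few bytes shaped like the start of a JPEG file.
jpeg_start = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"

# The same bytes, uuencoded into one line of printable characters.
uuencoded = binascii.b2a_uu(jpeg_start)

# Ordinary text that happens to need more than 7 bits per character.
utf8_text = "caf\u00e9".encode("utf-8")

for label, sample in [("jpeg", jpeg_start),
                      ("uuencoded", uuencoded),
                      ("utf-8 text", utf8_text)]:
    print(label, looks_like_jpeg(sample), is_7bit_ascii(sample))
```

The uuencoded sample passes the ASCII test even though it carries binary data, and the UTF-8 sample fails it even though it is plain text, which is exactly where the simple rule falls apart.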

The problem is that the more variants of "plain old text" you consider to be plain old text, the more difficult it becomes to distinguish it from non-text.

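That being the case, the best you can do is make an educated guess based on various criteria, such as a known file signature or the proportion of non-printable bytes.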
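One way such a guess might look, sketched in Python: treat any NUL byte as a sign of binary data, then count "odd" bytes and call the data text if they stay under a threshold. This is loosely in the spirit of the heuristic behind Perl's -T and -B file tests, but the name `guess_is_text`, the 30% threshold, and the set of allowed control characters are assumptions chosen for this example, not anything prescribed.

```python
# Control characters that commonly appear in text: tab, LF, FF, CR.
TEXT_CONTROL_BYTES = {0x09, 0x0A, 0x0C, 0x0D}


def guess_is_text(data: bytes, max_odd_ratio: float = 0.30) -> bool:
    """Hypothetical heuristic: guess whether data is text. It can be wrong."""
    if not data:
        return True  # nothing to object to in an empty string
    if 0 in data:
        return False  # a NUL byte is a strong hint of binary data

    # If the bytes decode as UTF-8, high-bit bytes are probably
    # legitimate characters, so do not count them as odd.
    try:
        data.decode("utf-8")
        high_bit_is_odd = False
    except UnicodeDecodeError:
        high_bit_is_odd = True

    odd = 0
    for byte in data:
        if byte < 32 and byte not in TEXT_CONTROL_BYTES:
            odd += 1  # unusual control character
        elif byte == 127:
            odd += 1  # DEL
        elif byte > 127 and high_bit_is_odd:
            odd += 1  # high-bit byte in data that is not valid UTF-8

    return odd / len(data) <= max_odd_ratio
```

Note that a uuencoded file would still be reported as text by a check like this. Whether that is the right answer depends on which variants of "text" you have decided to accept.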


Dave


"If I had my life to live over again, I'd be a plumber." -- Albert Einstein

In reply to Re: How can I tell if a string contains binary data or plain-old text? by davido
in thread How can I tell if a string contains binary data or plain-old text? by Anonymous Monk
