Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
I'd like to test a block of bytes for whether it appears to be "real" text or not. By that I mean, would a human judge it to be text, or "binary"? Conceptually, similar to what the -T/-B operators do, but without the heuristic part: it's either 100% Text, for sure, or else we call it Binary.
I'm pretty sure I could cobble together a regex that does what I want, but I thought there would be a character class (or at most two) which would Do The Right Thing. Unfortunately, there doesn't seem to be one. IsAscii is too broad, as it covers a lot of "control" characters that we don't normally think of as being in text, particularly \000. IsPrint is too narrow, as it doesn't even cover <tab>.
Thanks in advance...
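For illustration, here is a minimal sketch of the kind of all-or-nothing test described above. It assumes that "text" means printable ASCII (0x20 to 0x7E) plus tab, line feed, and carriage return; that definition and the helper name `looks_like_text` are assumptions made for this example, not something settled in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: true only if EVERY byte is printable ASCII
# (0x20-0x7E) or one of tab, LF, CR. No heuristics, unlike -T/-B.
# Note: an empty block counts as text under this definition.
sub looks_like_text {
    my ($bytes) = @_;
    return $bytes =~ /\A[\t\n\r\x20-\x7E]*\z/ ? 1 : 0;
}

print looks_like_text("hello\tworld\n") ? "text\n" : "binary\n";  # prints "text"
print looks_like_text("abc\0def")       ? "text\n" : "binary\n";  # prints "binary"
```

The explicit byte ranges sidestep the question of what IsPrint or IsAscii cover. Any byte outside the whitelist, including \000 and everything above 0x7E, makes the whole block "binary", so UTF-8 or Latin-1 text would be rejected unless the whitelist is widened.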
Replies are listed 'Best First'.

Re: How to match "text"?
by BrowserUk (Patriarch) on Jun 14, 2013 at 12:40 UTC

Re: How to match "text"? (bytes)
by Anonymous Monk on Jun 14, 2013 at 12:32 UTC

Re: How to match "text"?
by Anonymous Monk on Jun 14, 2013 at 13:11 UTC
by BrowserUk (Patriarch) on Jun 14, 2013 at 13:12 UTC
by Anonymous Monk on Jun 14, 2013 at 13:43 UTC