I have a variable that sometimes contains a valid UTF-8 string, which looks correct on stdout, and sometimes contains a string of bytes, which looks like gibberish on stdout.
The thing is, there is only one type of string in Perl.
How can I automatically determine whether the string is bytes, and only then run the decode command?
You can't, in the general case. This may or may not be an XY problem.
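To see why no reliable test exists, here is a minimal sketch (the sample octets are my own illustration, not from the original question). The same two octets are a valid UTF-8 encoding of one character and a valid Latin-1 encoding of two, so nothing in the data itself says which was meant:

```perl
use strict;
use warnings;
use Encode ();

# Two octets: valid UTF-8 for U+00E9, and equally valid Latin-1
# for the two characters U+00C3 and U+00A9.
my $bytes = "\xC3\xA9";

# Strict UTF-8 decode; LEAVE_SRC keeps $bytes untouched.
my $as_utf8 = Encode::decode( 'UTF-8', $bytes,
    Encode::FB_CROAK | Encode::LEAVE_SRC );

# Latin-1 accepts every octet, so this decode can never fail.
my $as_latin1 = Encode::decode( 'iso-8859-1', $bytes );

printf "as UTF-8:   %d character(s)\n", length $as_utf8;
printf "as Latin-1: %d character(s)\n", length $as_latin1;
```

Both decodes succeed and give different results, which is why any detection can only be a heuristic.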

So, what are you actually trying to do?

I have a subroutine I wrote for outputting stuff to stdout. I do not use print directly, because my subroutine handles everything automatically, so that I can use one program in CGI, on terminal STDOUT, and in a GUI without rewriting. I need a way in that subroutine to detect whether the variable it received is UTF-8 or a byte string.
Why do you need to detect that? Is that because "conversion to utf8 can break some filenames"? Perhaps that's not really a problem: just let decode blow up and catch the error (with eval). Something like this:
use strict;
use warnings;
use Encode;

my $enc_flags = Encode::FB_CROAK | Encode::LEAVE_SRC;

binmode STDOUT, ':encoding(utf-8)';

while ( my $line = <> ) {
    chomp $line;
    my $decoded = eval { Encode::decode( 'utf-8', $line, $enc_flags ); }
        || bad_string( $line );
    print $decoded, "\n";
}

sub bad_string {
    # "upgrade" the string
    Encode::decode( 'latin-1', shift );
}
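For an output subroutine that may receive either kind of string, the same fallback could be wrapped in a helper. This is only a sketch: the name `to_text` is hypothetical, and the extra wide-character check is my assumption about how to let already-decoded strings pass through. It remains a heuristic, since a decoded string whose characters all fit in 0x00-0xFF is indistinguishable from octets.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return a character string for output.
sub to_text {
    my ($s) = @_;
    return $s unless defined $s;

    # A character above 0xFF cannot be an octet, so the string
    # must already be decoded; pass it through unchanged.
    return $s if $s =~ /[^\x00-\xFF]/;

    # Try strict UTF-8 first; eval traps the croak on malformed input.
    my $decoded = eval {
        Encode::decode( 'UTF-8', $s,
            Encode::FB_CROAK | Encode::LEAVE_SRC );
    };
    return $decoded if defined $decoded;

    # Fall back to Latin-1, which accepts every octet.
    return Encode::decode( 'iso-8859-1', $s );
}

binmode STDOUT, ':encoding(UTF-8)';
print to_text("\xC3\xA9"), "\n";    # octets that are valid UTF-8
print to_text("\xE9"), "\n";        # lone octet, invalid as UTF-8
print to_text("\x{263A}"), "\n";    # already a character string
```

Using `defined` instead of `||` keeps an empty string or "0" from taking the fallback path, although for those two values the result would be the same either way.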

In reply to Re: utf8 char or binary string detection by Anonymous Monk
in thread utf8 char or binary string detection by igoryonya
