> I have a variable that sometimes contains a valid UTF-8 string, which displays correctly on stdout, and sometimes contains a string of bytes, which looks like gibberish on stdout.

The thing is, there is only one type of string in Perl.
> How can I automatically determine whether the string is bytes, and only then run the decode command?

You can't (in the general case). Which may or may not be an XY problem.
So, what are you actually trying to do?
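To illustrate why detection is impossible in the general case, here is a minimal sketch. The byte values are an assumption chosen for the example: the same two bytes decode without error both as UTF-8 and as Latin-1, so nothing in the scalar itself says which reading was intended.

```perl
use strict;
use warnings;
use Encode ();

# Two bytes that are valid UTF-8 (together they encode U+00E9)
# and are also two perfectly valid Latin-1 characters.
my $bytes = "\xC3\xA9";

# LEAVE_SRC keeps $bytes untouched so it can be decoded twice.
my $as_utf8   = Encode::decode( 'UTF-8',      $bytes, Encode::LEAVE_SRC );
my $as_latin1 = Encode::decode( 'iso-8859-1', $bytes, Encode::LEAVE_SRC );

printf "bytes: %d, as UTF-8: %d char(s), as Latin-1: %d char(s)\n",
    length $bytes, length $as_utf8, length $as_latin1;
# prints: bytes: 2, as UTF-8: 1 char(s), as Latin-1: 2 char(s)
```

Both decodes succeed, which is why the best you can do is pick a preferred encoding and fall back when it fails.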
> I have a subroutine that I wrote for outputting stuff to stdout. I do not use print directly, because my subroutine handles everything automatically, so that I can use one program in CGI, terminal STDOUT and GUI without rewriting. I need a way in that subroutine to detect whether the variable it received is UTF-8 or a byte string.

Why do you need to detect that? Is that because "conversion to utf8 can break some filenames"? Perhaps that's not really a problem: just let decode blow up and catch the error (with eval). Something like this:
use strict;
use warnings;
use Encode;

my $enc_flags = Encode::FB_CROAK | Encode::LEAVE_SRC;

binmode STDOUT, ':encoding(utf-8)';

while ( my $line = <> ) {
    chomp $line;
    my $decoded = eval { Encode::decode( 'utf-8', $line, $enc_flags ); }
        || bad_string( $line );
    print $decoded, "\n";
}

sub bad_string {
    # "upgrade" the string
    Encode::decode( 'latin-1', shift );
}
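The same idea can be folded into an output subroutine like the one described in the question. This is only a sketch under stated assumptions: the names decode_or_fallback and emit are hypothetical, the input is assumed to always be raw bytes (never an already-decoded string), and Latin-1 is assumed to be an acceptable fallback.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: turn raw bytes into a character string.
# Tries strict UTF-8 first, falls back to Latin-1 if that croaks.
sub decode_or_fallback {
    my ($bytes) = @_;
    my $decoded = eval {
        Encode::decode( 'UTF-8', $bytes,
            Encode::FB_CROAK | Encode::LEAVE_SRC );
    };
    # Test definedness rather than truth, so that "" and "0"
    # are not mistaken for a failed decode.
    return $decoded if defined $decoded;
    return Encode::decode( 'iso-8859-1', $bytes );
}

# Hypothetical output routine: decode once here, and let the
# filehandle's encoding layer handle the encoding on the way out.
sub emit {
    my ( $fh, $bytes ) = @_;
    print {$fh} decode_or_fallback($bytes), "\n";
}

binmode STDOUT, ':encoding(UTF-8)';
emit( \*STDOUT, "caf\xC3\xA9" );    # valid UTF-8 bytes
emit( \*STDOUT, "caf\xE9" );        # invalid as UTF-8, read as Latin-1
```

Both calls should print the same word, because the second input is not valid UTF-8 and gets reinterpreted as Latin-1. Passing an already-decoded string containing characters above 255 would make both decode calls croak, so the "bytes in, characters out" contract has to be kept by the callers.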
In reply to Re: utf8 char or binary string detection by Anonymous Monk, in thread utf8 char or binary string detection by igoryonya