Is there any way we can check if a string needs decoding in UTF-8?
Put more clearly, you are asking if there's a way to tell whether a string has already been decoded, or whether it's still encoded using UTF-8.
There's no way to tell for sure.
You could use something like the following:
    sub is_valid_utf8 {
        return $_[0] =~ /
            ^
            (?: [\x00-\x7F]
            |   [\xC0-\xDF] [\x80-\xBF]
            |   [\xE0-\xEF] [\x80-\xBF]{2}
            |   [\xF0-\xF7] [\x80-\xBF]{3}
            )*+
            \z
        /x
    }

    utf8::decode($string)
        if is_valid_utf8($string);
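This simplifies to the following, since utf8::decode leaves the string unchanged (and returns false) when it isn't valid UTF-8:
    utf8::decode($string);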
This won't always work. Certain already-decoded strings are also valid UTF-8 when their characters are treated as bytes, and those would get decoded a second time. That said, such strings would likely be nonsense. I think the conditions I listed here would apply, so the above is actually quite reliable.
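To make the ambiguity concrete, here is a minimal sketch. The helper names decode_if_utf8 and code_points are made up for illustration and are not from the post; the only assumption is that utf8::decode returns false and leaves the string alone when it isn't well-formed UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post): try to decode a copy of the
# string and report whether utf8::decode accepted it as UTF-8.
sub decode_if_utf8 {
    my ($string) = @_;                            # work on a copy
    my $decoded = utf8::decode($string) ? 1 : 0;  # decodes in place on success
    return ($string, $decoded);
}

# Hypothetical helper: render a string as a list of code points.
sub code_points {
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $_[0];
}

# "\xC3\xA9" is either the UTF-8 encoding of U+00E9, or the already-decoded
# two-character text U+00C3 U+00A9. Perl cannot tell which one was meant,
# so it gets decoded either way.
my ($s1, $ok1) = decode_if_utf8("\xC3\xA9");

# "\xE9" is not well-formed UTF-8, so it can only be already-decoded text
# and is left untouched.
my ($s2, $ok2) = decode_if_utf8("\xE9");

print "decoded=$ok1: ", code_points($s1), "\n";
print "decoded=$ok2: ", code_points($s2), "\n";
```

The first call is the false-positive case: if the input really was the decoded text U+00C3 U+00A9, it has now been mangled into U+00E9. Text like that is rarely meaningful, though, which is why the check is usually good enough in practice.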
In reply to Re: How to avoid decoding string to utf-8.
by ikegami
in thread How to avoid decoding string to utf-8.
by Anonymous Monk