However, the string I get back is always flagged as "native/raw bytes".
Yes, I can confirm this (code below), but only when the original string contains nothing but ASCII characters; if it contains any non-ASCII character, the substring keeps the UTF-8 flag. For an ASCII-only string, I don't see how the missing UTF-8 flag would cause problems?
In any case, changing the UTF-8/native flag creates problems in my code later on the line.
Perhaps this is the issue we should look at - could you show an SSCCE of how a plain ASCII string without the UTF-8 flag is causing problems for you?
use warnings;
use 5.026;
use utf8;
use open qw/:std :utf8/;
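while (<DATA>) {
    chomp;
    # utf8::is_utf8 returns an empty string when false, so "||0" prints 0 instead
    say "'$_': ", utf8::is_utf8($_)||0;
    my $y = substr $_, 2, 2;   # two characters starting at offset 2
    say "'$y': ", utf8::is_utf8($y)||0;
}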
__DATA__
abcd
€bcd
ab€d
Output:
'abcd': 1
'cd': 0
'€bcd': 1
'cd': 1
'ab€d': 1
'€d': 1
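To illustrate why a missing flag on an ASCII-only string should normally be harmless, here is a minimal sketch. It is not taken from the code in question; the variable names are mine. It assumes only the built-in utf8::upgrade, utf8::downgrade and utf8::is_utf8 functions, and it forces the flag on for one copy of "cd" and off for another before comparing them at the character level.

```perl
use warnings;
use strict;

# Illustrative variable names; both strings hold the same two ASCII characters.
my $flagged = "cd";
utf8::upgrade($flagged);     # switch to the internal UTF-8 representation (flag on)

my $plain = "cd";
utf8::downgrade($plain);     # make sure the flag is off

print "flagged: ", (utf8::is_utf8($flagged) ? 1 : 0), "\n";
print "plain:   ", (utf8::is_utf8($plain)   ? 1 : 0), "\n";

# String comparison and length work on characters, not on the internal flag.
print "equal:   ", ($flagged eq $plain ? 1 : 0), "\n";
print "lengths: ", length($flagged), " ", length($plain), "\n";
```

If code does behave differently for these two strings, it is probably inspecting the flag directly or handing the string to something that looks at the internal bytes, which is the kind of detail an SSCCE would reveal.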
In reply to Re: substr on UTF-8 strings (updated) by haukex, in thread substr on UTF-8 strings by rdiez