in reply to Unexpected utf8 in hash keys

It's definitely worthwhile to know and understand the trickiness demonstrated so clearly by ikegami's various tests, and I would agree that some of his results point to "actionable" inconsistencies that should probably be treated as bugs. BUT... when you say:

It cost us a lot of blood and sweat to debug why some perfectly ASCII strings would suddenly get the flag.

Does this mean you were using the utf8 flag to determine whether a string contains non-ASCII characters? That is not what the flag is for, and you shouldn't use it that way. To test whether a string contains non-ASCII characters, use a regex:

if ( /[^[:ascii:]]/ ) { ... }
# which is equivalent to:
if ( /[^\x00-\x7f]/ ) { ... }

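
A minimal sketch of why the flag is the wrong test, assuming a reasonably modern perl. `utf8::upgrade` and `utf8::is_utf8` are core functions that need no `use` line; `has_non_ascii` is only a hypothetical helper name wrapping the regex above, bound to an argument instead of `$_`. The point is that the flag can be on for a pure ASCII string and off for a string that really does contain a non-ASCII character, so only the regex answers the question.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string contains any character above 0x7F.
# This is the regex from the post, applied to an argument instead of $_.
sub has_non_ascii {
    my ($str) = @_;
    return $str =~ /[^\x00-\x7f]/ ? 1 : 0;
}

my $plain   = "hello";        # pure ASCII, utf8 flag off
my $flagged = "hello";
utf8::upgrade($flagged);      # same characters, but the utf8 flag is now on
my $latin1  = "caf\x{e9}";    # e-acute (U+00E9), stored without the utf8 flag

for my $pair ( [ plain => $plain ], [ flagged => $flagged ], [ latin1 => $latin1 ] ) {
    my ( $name, $str ) = @$pair;
    printf "%-8s utf8 flag=%d non-ascii=%d\n",
        $name, ( utf8::is_utf8($str) ? 1 : 0 ), has_non_ascii($str);
}
```

Run as is, it should report the flag as 1 only for `flagged` and non-ASCII as 1 only for `latin1`: the two columns are independent.
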
The purpose of the utf8 flag, as I understand it, is to answer the question: if there happen to be non-ASCII bytes in this string, are they to be interpreted as UTF-8-encoded characters, or as one character per byte? The treatment of an all-ASCII string should be the same regardless of whether the utf8 flag is set.
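
To illustrate that the flag only describes the internal representation, here is a small sketch, again relying only on the core `utf8::` functions. Upgrading a copy turns the flag on, yet the copy still compares equal, has the same length, and finds the same hash entry, both for an ASCII key and for a Latin-1 one. It does not touch the inconsistencies in ikegami's tests; it only shows that these everyday operations ignore the flag.

```perl
use strict;
use warnings;

# One pure-ASCII key and one key containing e-acute (U+00E9).
my %h = ( 'abc' => 1, "caf\x{e9}" => 2 );

for my $key ( 'abc', "caf\x{e9}" ) {
    my $upgraded = $key;
    utf8::upgrade($upgraded);    # flag on; characters unchanged

    # The flag differs, but Perl treats both as the same string,
    # including when they are used to look up a hash entry.
    printf "flag %d/%d  eq=%s  length %d/%d  same entry=%s\n",
        ( utf8::is_utf8($key)      ? 1 : 0 ),
        ( utf8::is_utf8($upgraded) ? 1 : 0 ),
        ( $key eq $upgraded ? 'yes' : 'no' ),
        length($key), length($upgraded),
        ( $h{$upgraded} == $h{$key} ? 'yes' : 'no' );
}
```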

Re^2: Unexpected utf8 in hash keys
by kappa (Chaplain) on Feb 20, 2008 at 15:06 UTC
    Yes, it was the wrong way to do that: an attempt based on the wrong guess that perl would not set the utf8 flag on ASCII strings.
    --kap