in reply to Re^4: good way to implement utf8::is_utf8 for perl 5.8.0
in thread good way to implement utf8::is_utf8 for perl 5.8.0
I'm using it to debug a problem.
I recommend Devel::Peek's Dump.
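As a rough sketch of what that looks like in practice, the snippet below dumps the same one-character string before and after `utf8::upgrade`. The variable name is just for illustration. `Dump` writes to STDERR, and the things to look at are whether `UTF8` appears in the `FLAGS` line and what the `PV` buffer contains.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

my $s = "\x{C9}";     # one character, code point 0xC9

Dump($s);             # check FLAGS and PV: single-byte storage, no UTF8 flag

utf8::upgrade($s);    # same string, but now stored internally as UTF-8

Dump($s);             # FLAGS should now include UTF8, and PV holds two bytes
```

The string compares equal to the original in both states; only the internal storage differs.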
The corruption is consistent with bytes that have the high bit set getting converted to their UTF-8 encoding.
Way too many XS modules access a string's buffer with no regard to the setting of the UTF8 flag. Sounds like it's happening here.
| string | possible internal representations | default typemap to char* |
|---|---|---|
| "\x{C9}" | C9; UTF8=0 | C9 |
| "\x{C9}" | C3,89; UTF8=1 | C3,89 |
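The table can be reproduced from pure Perl, roughly. This sketch assumes that switching off the UTF8 flag on a copy with `Encode::_utf8_off` is a fair stand-in for what an XS function sees when it takes a `char*` and ignores the flag. `raw_buffer_hex` is a hypothetical helper, not part of any module.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: hex dump of the internal buffer, ignoring the UTF8
# flag, which approximates what a plain char* typemap hands to C code.
sub raw_buffer_hex {
    my ($copy) = @_;              # the copy keeps the internal representation
    Encode::_utf8_off($copy);     # drop the flag, leave the buffer untouched
    return uc(unpack('H*', $copy));
}

my $native   = "\x{C9}";          # stored as the single byte C9
my $upgraded = "\x{C9}";
utf8::upgrade($upgraded);         # stored as C3,89 with UTF8=1

my $same    = $native eq $upgraded ? 1 : 0;   # Perl sees identical strings
my $hex_nat = raw_buffer_hex($native);        # "C9"
my $hex_upg = raw_buffer_hex($upgraded);      # "C389"
```

Perl itself treats the two strings as equal, which is why the problem only shows up once the buffer crosses into code that ignores the flag.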
I suspect using utf8::downgrade is the solution, but I want to verify that.
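It is indeed the solution. utf8::downgrade converts a string's internal representation from UTF-8 to one byte per character (clearing the UTF8 flag) if it isn't stored that way already. It can only do so when every character in the string is 0xFF or below. The XS module should do that for you, but you can do it on the module's behalf before calling it.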
| string | possible internal representations | downgraded | default typemap to char* |
|---|---|---|---|
| "\x{C9}" | C9; UTF8=0 | C9; UTF8=0 | C9 |
| "\x{C9}" | C3,89; UTF8=1 | C9; UTF8=0 | C9 |
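A minimal sketch of doing the downgrade yourself before handing a string to such a module. As before, `raw_buffer_hex` is a hypothetical helper that approximates a flag-ignoring `char*` access; the real call into your XS module would go where it is used. Passing a true second argument to `utf8::downgrade` makes it return false instead of dying when the string contains characters above 0xFF.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: hex dump of the internal buffer, ignoring the UTF8 flag.
sub raw_buffer_hex {
    my ($copy) = @_;
    Encode::_utf8_off($copy);
    return uc(unpack('H*', $copy));
}

my $s = "\x{C9}";
utf8::upgrade($s);                    # simulate a string that arrived upgraded

my $before = raw_buffer_hex($s);      # "C389": what the XS code would see now

# Downgrade in place; with a true second argument a failure is reported
# through the return value rather than an exception.
utf8::downgrade($s, 1)
    or die "string has characters above 0xFF and cannot be downgraded";

my $after = raw_buffer_hex($s);       # "C9": safe to pass to the XS function
```

If the downgrade fails, the data is not byte data in the first place and needs an explicit encoding step instead.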