I'm using it to debug a problem.
I recommend Devel::Peek's Dump.
The corruption is consistent with bytes that have the high bit set being converted to their UTF-8 encoding.
Way too many XS modules access a string's buffer without checking the UTF8 flag. It sounds like that's what's happening here.
| string | possible internal representations | default typemap to char* |
|---|---|---|
| "\x{C9}" | C9; UTF8=0 | C9 |
| "\x{C9}" | C3,89; UTF8=1 | C3,89 |
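To see both representations for yourself, something like the following should work. It's only a sketch: the variable names are mine, and it uses utf8::upgrade to force the second representation, which assumes a perl where the utf8:: functions are available (5.8.1 or later, as far as I know). Dump writes to STDERR; look at the FLAGS line (UTF8 present or not) and the PV line.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

my $bytes    = "\x{C9}";     # normally stored as the single byte C9, UTF8 flag off
my $upgraded = "\x{C9}";
utf8::upgrade($upgraded);    # same string, now stored as C3 89 with the UTF8 flag on

# Perl itself considers the two scalars equal; only the internal storage differs.
print STDERR "Equal as far as Perl is concerned\n" if $bytes eq $upgraded;

Dump($bytes);                # inspect FLAGS and PV in the output
Dump($upgraded);
```

An XS function that grabs the buffer without checking the flag gets different bytes for those two scalars, even though they hold the same string.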
I suspect using utf8::downgrade is the solution, but I want to verify that.
It is indeed the solution. utf8::downgrade converts the internal encoding of a string from utf8 to bytes if it isn't already. The XS module should do that for you, but you can do it for the module.
| string | possible internal representations | downgraded | default typemap to char* |
|---|---|---|---|
| "\x{C9}" | C9; UTF8=0 | C9; UTF8=0 | C9 |
| "\x{C9}" | C3,89; UTF8=1 | C9; UTF8=0 | C9 |
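Doing it for the module could look roughly like this. as_bytes is a hypothetical helper of my own, not something the module provides; it downgrades a copy so the caller's scalar is left alone, and it dies if the string contains characters above 0xFF, since those can't be stored as bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of the string stored as bytes (UTF8=0).
sub as_bytes {
    my ($str) = @_;                 # work on a copy of the argument
    utf8::downgrade($str, 1)        # true second argument: return false instead of dying
        or die "String has characters above 0xFF and can't be passed as bytes\n";
    return $str;
}

my $s = "\x{C9}";
utf8::upgrade($s);                  # force the C3,89; UTF8=1 representation

my $safe = as_bytes($s);            # pass $safe to the XS function instead of $s

print "original:   ", (utf8::is_utf8($s)    ? "UTF8=1" : "UTF8=0"), "\n";
print "downgraded: ", (utf8::is_utf8($safe) ? "UTF8=1" : "UTF8=0"), "\n";
print "still the same string\n" if $s eq $safe;
```

If you don't mind the original scalar being changed, calling utf8::downgrade($s) directly before the XS call does the same job without the copy.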
In reply to Re^5: good way to implement utf8::is_utf8 for perl 5.8.0
by ikegami
in thread good way to implement utf8::is_utf8 for perl 5.8.0
by perl5ever