Thanks for the code. is_utf8 does return 1 in this case.

Good question on the encoding my console uses. I'll see if I can find out.

Thanks,

SheridanCat

Re^3: Character Conversion Conundrum
by Aristotle (Chancellor) on Dec 23, 2004 at 19:17 UTC

    Now that is weird. There's a 0xF3 in there, but the UTF-8 flag is on? 0xF3 0x6E is not a valid UTF-8 sequence. 0xF3 indicates the start of a four-byte character (the four highest bits are set, then a zero bit terminates the length prefix, leaving 3 bits of payload), so the next three bytes would all have to be continuation bytes of the form 10xxxxxx. But 0x6E has its highest bit clear, which means it is a standalone ASCII character and not part of a multi-byte sequence. That's invalid.
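
    To make the bit-level argument concrete, here is a minimal sketch that classifies a byte by its high bits according to the UTF-8 layout, then cross-checks the pair with utf8::decode. The helper name classify_byte is made up for illustration; utf8::decode is the core function, which returns false when the octets are not well-formed UTF-8.

    ```perl
    use strict;
    use warnings;

    # Classify one byte by its leading bits (classify_byte is a hypothetical helper).
    sub classify_byte {
        my ($byte) = @_;
        return 'ASCII (single byte)'     if ($byte & 0x80) == 0x00;   # 0xxxxxxx
        return 'continuation byte'       if ($byte & 0xC0) == 0x80;   # 10xxxxxx
        return 'lead of 2-byte sequence' if ($byte & 0xE0) == 0xC0;   # 110xxxxx
        return 'lead of 3-byte sequence' if ($byte & 0xF0) == 0xE0;   # 1110xxxx
        return 'lead of 4-byte sequence' if ($byte & 0xF8) == 0xF0;   # 11110xxx
        return 'never valid in UTF-8';                                # 11111xxx
    }

    printf "0x%02X: %s\n", $_, classify_byte($_) for 0xF3, 0x6E;

    # Cross-check: utf8::decode works in place, so run it on a scratch copy.
    my $octets = "\xF3\x6E";
    my $well_formed = utf8::decode($octets) ? 1 : 0;
    print "F3 6E well-formed UTF-8? ", ($well_formed ? "yes" : "no"), "\n";
    ```

    A lead byte followed by a plain ASCII byte fails the check, which matches the reasoning above. This only inspects raw octets; it says nothing about who set the flag.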

    So the input never actually gets converted to UTF-8, but someone is still flipping the UTF-8 flag on it. And Perl does not complain when printing the string. Weird. Something seems rather amiss there. Whether that is the cause of the less-than character you're seeing on the console is anyone's guess. Assuming these are somewhat older versions of Perl and XML::Simple, maybe you ought to check whether newer ones act consistently.
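
    If it helps with narrowing this down, a rough diagnostic along these lines reports the flag, whether the internal buffer is consistent with it, and the raw octets. Note the assumptions: describe_string is a hypothetical helper, $suspect is a stand-in for whatever string comes out of the parser, and Encode::_utf8_on is used here only to reproduce the odd state for the demo (it is an internal function, not something to use in real code).

    ```perl
    use strict;
    use warnings;
    use Encode ();

    # Report the UTF-8 flag, internal consistency, and raw internal octets.
    # (describe_string is a hypothetical helper, not part of any module.)
    sub describe_string {
        my ($str) = @_;
        my $flagged = utf8::is_utf8($str) ? 1 : 0;   # is the UTF-8 flag set?
        my $valid   = utf8::valid($str)   ? 1 : 0;   # does the buffer agree with the flag?
        my $copy    = $str;
        Encode::_utf8_off($copy);                    # view the internal buffer as plain octets
        my $hex = join ' ', map { sprintf '%02X', $_ } unpack 'C*', $copy;
        return "flag=$flagged valid=$valid octets=$hex";
    }

    # Stand-in for the suspicious value: octets F3 6E with the flag forced on.
    my $suspect = "\xF3n";
    Encode::_utf8_on($suspect);
    my $report = describe_string($suspect);
    print "$report\n";
    ```

    Running the same helper on the real value from the parsed XML would show whether it is in the same flagged-but-malformed state.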

    I don't really have any suggestions, I'm afraid, I'm kind of at a loss.

    Makeshifts last the longest.

      Thanks for the help nonetheless. I haven't figured out the problem yet. Just to make this complete, I'm using ActivePerl 5.8.3 and XML::Simple 2.09, so I seem to be up to date.

      Very strange. I'll keep plugging away.

      Thanks

      SheridanCat