Re^5: UTF8/Unicode Confusion

Perl is not tossing away half the bytes. Perl stores a string's characters either as one byte per character (so the character 0x00A5 is represented as "\245", aka "\xa5") or in utf8 form, with 1-13 bytes per character (so 0x00A5 is represented in two bytes, "\302\245"). Which kind of storage is in use is indicated by the UTF8 flag, which you will see on after the utf8::upgrade and off prior to it.
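A minimal sketch of the above, assuming a plain "\xa5" string literal stands in for whatever Locale::Currency::Format returns (the variable names are made up for illustration). It checks the flag and the character count before and after utf8::upgrade, then encodes a copy to count the bytes of the utf8 form:

```perl
use strict;
use warnings;

# One character, U+00A5, initially stored as a single byte.
my $yen = "\xa5";

my $flag_before = utf8::is_utf8($yen) ? 1 : 0;   # UTF8 flag off
my $len_before  = length $yen;                   # length in characters

# Switch the internal storage to utf8 form; the string's value is unchanged.
utf8::upgrade($yen);

my $flag_after  = utf8::is_utf8($yen) ? 1 : 0;   # UTF8 flag on
my $len_after   = length $yen;                   # still counted in characters
my $same_string = ($yen eq "\xa5") ? 1 : 0;      # compares equal to the original

# Encode a copy to see how many bytes the utf8 form occupies.
my $octets = $yen;
utf8::encode($octets);
my $byte_count = length $octets;

printf "flag: %d -> %d, chars: %d -> %d, utf8 bytes: %d\n",
    $flag_before, $flag_after, $len_before, $len_after, $byte_count;
```

The point is that the upgrade changes only how the string is stored, not what it contains: it is still one character and still eq "\xa5".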

If you have an output filehandle that you want to receive only the utf8 encoding, use binmode as suggested above or perl's -C switch (see perlrun).
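A sketch of the binmode approach, under a couple of assumptions: an in-memory filehandle is used here only so the bytes written can be inspected, and the layer is spelled ':encoding(UTF-8)' (the earlier suggestion may have used ':utf8' instead). In real code the handle would be STDOUT or a file.

```perl
use strict;
use warnings;

my $yen = "\xa5";    # one character, U+00A5

# In-memory filehandle so we can look at exactly what was written.
my $captured = '';
open(my $out, '>', \$captured) or die "open failed: $!";

# Ask the handle to encode every character it receives as UTF-8.
binmode($out, ':encoding(UTF-8)');

print {$out} $yen;
close($out) or die "close failed: $!";

# Show the bytes that reached the "file", in octal.
print join(' ', map { sprintf '\\%03o', ord } split //, $captured), "\n";
```

With the layer in place, it does not matter whether $yen had its UTF8 flag on or off when it was printed; the handle receives the same two bytes either way.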

Re^6: UTF8/Unicode Confusion by jk2addict (Chaplain) on Mar 21, 2005 at 14:54 UTC
Well, that was my point. I have no control over where the data comes from (Locale::Currency::Format), nor over where it is going or how it is output (AxKit). With those two facts in hand, I fall back to one of my original questions: is the utf8::upgrade solution an acceptable one?
Re^7: UTF8/Unicode Confusion by ysth (Canon) on Mar 21, 2005 at 19:54 UTC
Well, I have no idea what AxKit is, but if you are feeding the data to it, it should tell you what encoding it wants. utf8::encode() would be one way to force utf8 encoding, yes, but if you are sending the data via a filehandle, applying a utf8 layer to the filehandle would be better. However, if AxKit is a Perl module whose functions you are calling and passing data to, it should take your \xa5 whether it is utf8-encoded or not.
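To make the difference between the two calls concrete, here is a small sketch (variable names are illustrative only). utf8::upgrade changes the internal storage but leaves the string as one character; utf8::encode replaces the string with its UTF-8 octets, so it becomes two characters with the UTF8 flag off:

```perl
use strict;
use warnings;

# utf8::upgrade: same string, different internal storage.
my $upgraded = "\xa5";
utf8::upgrade($upgraded);

# utf8::encode: the string itself becomes the UTF-8 octets of the original.
my $encoded = "\xa5";
utf8::encode($encoded);

printf "upgraded: %d char(s), flag %s\n",
    length($upgraded), utf8::is_utf8($upgraded) ? 'on' : 'off';
printf "encoded:  %d char(s), flag %s\n",
    length($encoded), utf8::is_utf8($encoded) ? 'on' : 'off';
```

So if the consumer expects raw UTF-8 bytes, the encoded form is the one to hand over; if it is Perl code that treats the value as a character string, the upgraded and non-upgraded forms should behave the same.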