in reply to UTF8/Unicode Confusion

I don't know the specifics of Locale::Currency::Format, but some general comments: Unicode support is broken in 5.6.x and fixed in 5.8.x, and in 5.8.x you almost never need 'use utf8'.

Anyway, could you add the following two lines to your code and post the output it produces:

my $symbol = currency_symbol($code, $options);
use Devel::Peek;
Dump $symbol;

Dave.

Re^2: UTF8/Unicode Confusion
by jk2addict (Chaplain) on Mar 20, 2005 at 17:24 UTC

    Assuming I did the right thing...this is without any 'use utf8' or 'utf8::upgrade' magic.

    -------------- 5.6.1 --------------
    SV = PV(0x14045dc) at 0x1409e8c
      REFCNT = 1
      FLAGS = (PADBUSY,PADMY,POK,pPOK,UTF8)
      PV = 0x142d9fc "\302\245"\0
      CUR = 2
      LEN = 3
    -------------- 5.8.4 --------------
    SV = PV(0x44c3d64) at 0x10590f4
      REFCNT = 1
      FLAGS = (PADBUSY,PADMY,POK,pPOK)
      PV = 0x450ab24 "\245"\0
      CUR = 1
      LEN = 2

    This is after the utf8::upgrade call:

    ----------------------- 5.8.4 w/utf8::upgrade -----------------------
    SV = PV(0x44f91dc) at 0x104d644
      REFCNT = 1
      FLAGS = (PADBUSY,PADMY,POK,pPOK,UTF8)
      PV = 0x4518aa4 "\302\245"\0 [UTF8 "\x{a5}"]
      CUR = 2
      LEN = 3

      Well, the Dump outputs show that the function is correctly returning the unicode character 0xa5; it's just that the internal encoding happens not to be utf8. Using utf8::upgrade gets round whatever problem you're having because it converts the internal representation.

      The problem must lie in how you're using the returned value. If for example you're just printing it to STDOUT, and if whatever's listening on STDOUT expects utf8 encoding (eg the terminal), then you need to let Perl know that any output on that file handle should be utf8 encoded, eg

      $ perl -e 'print chr 0xa5'|od -x
      0000000 00a5
      $ perl -e 'binmode(STDOUT, ":utf8"); print chr 0xa5'|od -x
      0000000 a5c2
      $

      See perluniintro (in 5.8.x) for more information.
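To make the "same character, different internal encoding" point concrete, here is a minimal sketch for 5.8.x. It is not taken from the code under discussion: the variable names are made up for illustration, and chr 0xa5 stands in for whatever currency_symbol returns. It compares a plain copy of the character with a utf8::upgrade'd copy, then encodes one explicitly with the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Two copies of the same one-character string, U+00A5 (yen sign).
my $native   = chr 0xa5;    # held internally as the single byte \245
my $upgraded = chr 0xa5;
utf8::upgrade($upgraded);   # held internally as \302\245, UTF8 flag on

# Only the internal representation differs; the string value does not.
print "same string\n"   if $native eq $upgraded;
print "both length 1\n" if length($native) == 1 && length($upgraded) == 1;
print "UTF8 flag: ", (utf8::is_utf8($native)   ? "on" : "off"), " vs ",
                     (utf8::is_utf8($upgraded) ? "on" : "off"), "\n";

# Encoding explicitly for output gives the two UTF-8 bytes c2 a5,
# even though the source string was stored as a single byte.
my $bytes = encode('UTF-8', $native);
print unpack('H*', $bytes), "\n";
```

So nothing has been thrown away in the 5.8.4 dump; the \302 only appears once the string is stored or written as UTF-8, which is what binmode(STDOUT, ":utf8") arranges for output.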

      Dave.

        Well, the Dump outputs show that the function is correctly returning the unicode character 0xa5; it's just that the internal encoding happens not to be utf8

        It does? What am I missing about the second dump, the one from 5.8.4?

        -------------- 5.8.4 --------------
        SV = PV(0x44c3d64) at 0x10590f4
          REFCNT = 1
          FLAGS = (PADBUSY,PADMY,POK,pPOK)
          PV = 0x450ab24 "\245"\0
          CUR = 1
          LEN = 2

        That looks like perl is tossing away half of the bytes long before it sends anything to any output. I don't think it's a problem with how the output is interpreted, just the fact that the output is half as wide as it should be (5.8.4 tossed away the \302).

Re^2: UTF8/Unicode Confusion
by jk2addict (Chaplain) on Mar 20, 2005 at 17:12 UTC

    Using 5.8.4, I assume, without the utf8::upgrade line? Or on both 5.6 and 5.8?