jk2addict has asked for the wisdom of the Perl Monks concerning the following question:

I'm tinkering around with the currency formatting options in AxKit::XSP::Currency and Handel, and I'm banging my head against the wall trying to figure out the 'proper' way to deal with the 'utf' currency symbols returned by Locale::Currency::Format.

Long story longer, the L::C::F::currency_symbol method can return the UTF version of the currency symbol. I have a method in the AxKit XSP taglibs for the modules above that simply returns the value returned from that method.

    sub symbol {
        my ($code, $options) = @_;

        $code    ||= 'USD';
        $options ||= 'SYM_UFT';
        eval '$options = ' . $options;

        return currency_symbol($code, $options);
    };

With everything declared as UTF-8, it returned ? instead of the symbol. So, after some tinkering, here's what I've learned.

First, in 5.6.1, everything works fine without the need to 'use utf8'. The currency symbol is displayed just fine.

Next, in 5.8.4, the symbol is always displayed as ?. Adding 'use utf8' to the module made no difference. This code, however, fixes the problem:

    sub symbol {
        my ($code, $options) = @_;

        $code    ||= 'USD';
        $options ||= 'SYM_UFT';
        eval '$options = ' . $options;

        my $symbol = currency_symbol($code, $options);
        utf8::upgrade($symbol);

        return $symbol;
    };

I know a good part of this is blurred by what AxKit does and how it handles utf, but does this seem right to anyone? Are other people having to jump through hoops just to get 5.8.x to act like 5.6.x used to when it comes to just outputting utf data?

9 times out of 10, stuff just works between 5.6 and 5.8 for me. Is this one of those times where they're not the same and I don't understand what in the heck I'm doing? :-)

Re: UTF8/Unicode Confusion
by dave_the_m (Monsignor) on Mar 20, 2005 at 17:05 UTC
    I don't know the specifics of Locale::Currency::Format, but some general comments: in general, Unicode is broken in 5.6.x and fixed in 5.8.x; and in 5.8.x you almost never need 'use utf8'.

    Anyway, could you add the following two lines to your code and post the output it produces:

    my $symbol = currency_symbol($code, $options);
    use Devel::Peek; Dump $symbol;

    Dave.

      Assuming I did the right thing...this is without any 'use utf8' or 'utf8::upgrade' magic.

      -------------- 5.6.1 --------------
      SV = PV(0x14045dc) at 0x1409e8c
        REFCNT = 1
        FLAGS = (PADBUSY,PADMY,POK,pPOK,UTF8)
        PV = 0x142d9fc "\302\245"\0
        CUR = 2
        LEN = 3

      -------------- 5.8.4 --------------
      SV = PV(0x44c3d64) at 0x10590f4
        REFCNT = 1
        FLAGS = (PADBUSY,PADMY,POK,pPOK)
        PV = 0x450ab24 "\245"\0
        CUR = 1
        LEN = 2

      This is after the utf8::upgrade call:

      ----------------------- 5.8.4 w/utf8::upgrade -----------------------
      SV = PV(0x44f91dc) at 0x104d644
        REFCNT = 1
        FLAGS = (PADBUSY,PADMY,POK,pPOK,UTF8)
        PV = 0x4518aa4 "\302\245"\0 [UTF8 "\x{a5}"]
        CUR = 2
        LEN = 3

        Well, the Dump outputs show that the function is correctly returning the unicode character 0xa5; it's just that the internal encoding happens not to be utf8. Using utf8::upgrade gets round whatever problem you're having because it converts the internal representation.
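
        The claim about internal representation can be checked with a small script. This is only a sketch for 5.8.x or later: it uses chr(0xA5) as a stand-in for whatever currency_symbol returns, and the variable names are made up for illustration.

        ```perl
        use strict;
        use warnings;

        # Stand-in for the value returned by currency_symbol (assumption):
        # the single character U+00A5, held internally as one byte.
        my $native = chr(0xA5);

        # Copy it and switch only the copy to the UTF-8 internal encoding.
        my $upgraded = $native;
        utf8::upgrade($upgraded);

        # Both scalars still hold the same one-character string.
        print "equal:  ", ($native eq $upgraded ? "yes" : "no"), "\n";
        print "length: ", length($native), " and ", length($upgraded), "\n";

        # Only the internal flag (the UTF8 entry in FLAGS from Dump) differs.
        print "flag:   ", (utf8::is_utf8($native)   ? "on" : "off"), " and ",
                          (utf8::is_utf8($upgraded) ? "on" : "off"), "\n";
        ```

        If the two strings compare equal here, then code that only works with the upgraded copy is probably looking at the raw internal bytes rather than at the characters.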

        The problem must lie in how you're using the returned value. If for example you're just printing it to STDOUT, and if whatever's listening on STDOUT expects utf8 encoding (eg the terminal), then you need to let Perl know that any output on that file handle should be utf8 encoded, eg

        $ perl -e 'print chr 0xa5'|od -x
        0000000 00a5
        $ perl -e 'binmode(STDOUT, ":utf8"); print chr 0xa5'|od -x
        0000000 a5c2
        $

        see perluniintro (in 5.8.x) for more information.
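
        The same idea can be sketched inside a script rather than on the command line. This assumes 5.8.x or later with the core Encode module; chr(0xA5) again stands in for the currency symbol, and hex_bytes is a hypothetical helper written just for this example. It encodes explicitly at the output boundary instead of relying on the internal representation.

        ```perl
        use strict;
        use warnings;
        use Encode qw(encode);

        # Hypothetical helper: show each byte of a string as two hex digits.
        sub hex_bytes {
            my ($bytes) = @_;
            return join ' ', map { sprintf '%02x', ord } split //, $bytes;
        }

        my $yen      = chr(0xA5);     # one character, one byte internally
        my $upgraded = $yen;
        utf8::upgrade($upgraded);     # same character, UTF-8 internally

        # Explicit encoding gives the same octets for both representations.
        my $octets_plain    = encode('UTF-8', $yen);
        my $octets_upgraded = encode('UTF-8', $upgraded);
        print hex_bytes($octets_plain), "\n";       # c2 a5
        print hex_bytes($octets_upgraded), "\n";    # c2 a5

        # An encoding layer on a filehandle does the same job on output;
        # an in-memory handle is used here so the result can be inspected.
        my $buffer = '';
        open(my $fh, '>:encoding(UTF-8)', \$buffer)
            or die "cannot open in-memory handle: $!";
        print {$fh} $yen;
        close($fh) or die "close failed: $!";
        print hex_bytes($buffer), "\n";             # c2 a5
        ```

        Whether AxKit gives you a place to set a layer or expects already-encoded output is something I can't say; the sketch only shows that the encoding step, not utf8::upgrade, is what determines the bytes that reach the other end.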

        Dave.

      Using 5.8.4, I assume, without the utf8::upgrade line? Or on both 5.6 and 5.8?

Re: UTF8/Unicode Confusion
by jk2addict (Chaplain) on Mar 20, 2005 at 23:17 UTC

    Here's my guess: Byte-and-Character-Semantics

    "However, as an interim compatibility measure, Perl aims to provide a safe migration path from byte semantics to character semantics for programs. For operations where Perl can unambiguously decide that the input data are characters, Perl switches to character semantics. For operations where this determination cannot be made without additional information from the user, Perl decides in favor of compatibility and chooses to use byte semantics."

    So it can't guess well in 5.8 with \x{}, so I have to give it the hint.