JykkeDaMan has asked for the wisdom of the Perl Monks concerning the following question:

Hi all.

I'm not familiar with character encodings, so I need some help. I have a string like this:

"_label:%C3%84%C3%A4kk%C3%B6si%C3%A4+t%C3%A4ss%C3%A4 _rangeS:1.1.1.0 _rangeE:1.1.1.2"

The string includes some Nordic characters like 'Ä', 'ä' and 'ö' (they probably won't show correctly here either :). I think it is UTF-8 encoded, and I need to print it to the browser as HTML.

The string should look like this:

"_label:Ääkkösiä tässä _rangeS:1.1.1.0 _rangeE:1.1.1.2"

I'm working with Perl 5.8.4 on Solaris SPARC. The string comes from an XML search.

Thanks,
Jykke Da'Man.

Re: UTF8 and printing to HTML
by graff (Chancellor) on Feb 08, 2005 at 23:47 UTC
    You need to convert the "%HH" strings to their binary byte values, then "decode" (actually, just flag) the string as containing utf8 data -- something like this:
        use Encode;
        # ...
        # assuming the source string is in $_:
        tr/+/ /;                             # form encoding writes spaces as "+"
        s/%([0-9A-F]{2})/chr(hex($1))/egi;   # convert "%HH" to binary bytes
        $_ = decode( 'utf8', $_ );           # flag the string as utf8 data
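    For reference, here is a minimal self-contained sketch of the same steps, run on the sample string from the question, that also prints the result as HTML. It assumes a CGI-style script writing to STDOUT and a plain ASCII input string as in the question; the helper name url_decode_utf8 is hypothetical. It maps "+" to a space (form encoding uses "+" for spaces, and the expected output has a space there), escapes HTML metacharacters, declares the charset in the Content-Type header, and encodes the output through a UTF-8 PerlIO layer. It uses only the core Encode module and PerlIO layers, which should be available in Perl 5.8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode a URL/form-encoded string whose bytes are UTF-8.
# Assumes $s is plain ASCII text (only "%HH" escapes and "+" for spaces).
sub url_decode_utf8 {
    my ($s) = @_;
    $s =~ tr/+/ /;                              # form encoding writes spaces as "+"
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;  # each "%HH" becomes one raw byte
    return decode('utf8', $s);                  # UTF-8 bytes -> Perl character string
}

my $raw  = '_label:%C3%84%C3%A4kk%C3%B6si%C3%A4+t%C3%A4ss%C3%A4 _rangeS:1.1.1.0 _rangeE:1.1.1.2';
my $text = url_decode_utf8($raw);

# Escape HTML metacharacters as numeric entities before embedding in the page.
(my $html = $text) =~ s/([&<>"])/'&#' . ord($1) . ';'/ge;

# Tell the browser the page is UTF-8, then encode output characters as UTF-8.
print "Content-Type: text/html; charset=UTF-8\n\n";
binmode(STDOUT, ':encoding(UTF-8)');
print "<p>$html</p>\n";
```

    Without the binmode call, Perl would print these characters (all below U+0100) as Latin-1 bytes, which would not match the declared charset; without the charset, the browser would have to guess the encoding.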
    (BTW, you seem to be right: the intent of the original text is to convey utf8 characters. In this case, each byte of each "wide" utf8 character has been written as a two-digit hex value prefixed by "%", which is the usual URL ("percent") encoding; in that scheme a space may also be written as "+".)
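    To make the "%HH" scheme concrete, this small sketch goes the other way: it encodes 'Ä', 'ä' and 'ö' to UTF-8 and writes each byte as "%" plus two hex digits, which reproduces the sequences seen in the question's string. Each of these characters takes two bytes in UTF-8, so it becomes two "%HH" groups. The hash name %pct_of is just for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

my %pct_of;    # character -> its percent-encoded UTF-8 bytes
for my $char ("\x{C4}", "\x{E4}", "\x{F6}") {    # Ä, ä, ö
    my $bytes = encode('utf8', $char);           # two bytes each, e.g. 0xC3 0x84
    my $pct   = join '', map { sprintf '%%%02X', ord } split //, $bytes;
    $pct_of{$char} = $pct;
    printf "U+%04X -> %s\n", ord($char), $pct;
}
# prints:
# U+00C4 -> %C3%84
# U+00E4 -> %C3%A4
# U+00F6 -> %C3%B6
```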