convert scalar to wide hexadecimal value? how?

merlinX has asked for the wisdom of the Perl Monks concerning the following question:

I want to convert `<U+xxxx>` literals into the corresponding wide characters. How do I do that? It's probably trivial, but I don't see it right now...

use Encode;
use encoding 'utf8';

my $test = "<U+010C>";
$test=~s/\<U\+(.*)\>/\\x\{$1\}/g;
print "$test\n"; # prints \x{010C}

my $probe1=encode("utf8","\x{010C}"); # works
my $probe2=encode("utf8","$test"); # does not work?
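A minimal sketch of what each `encode` call above actually receives. It assumes the core `Encode` module, omits `use encoding 'utf8'` (not needed to show the effect), and uses the strict `'UTF-8'` encoding name instead of `'utf8'`, which gives the same bytes for this input. The substitution produces the eight-character text `\x{010C}` rather than the single character U+010C, because backslash escapes are only interpreted inside string literals.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $test = "<U+010C>";

# Same idea as the question's substitution (hex-digit class instead of .*).
# The result is the 8-character text \x{010C}, not the character U+010C.
(my $escaped = $test) =~ s/<U\+([0-9A-Fa-f]+)>/\\x\{$1\}/;
my $char = "\x{010C}";    # here the escape IS inside a literal: one character

print length($escaped), "\n";    # 8
print length($char), "\n";       # 1

# encode() therefore sees ASCII text in one case and U+010C in the other.
my $bytes_escaped = encode('UTF-8', $escaped);
my $bytes_char    = encode('UTF-8', $char);
printf "%vX\n", $bytes_escaped;  # 5C.78.7B.30.31.30.43.7D
printf "%vX\n", $bytes_char;     # C4.8C
```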


Re: convert scalar to wide hexadecimal value? how? by JavaFan (Canon) on Jan 14, 2009 at 11:16 UTC
`\x{010C}` is only equal to Č if it appears in a string literal. Otherwise, you may want to use `chr`. Two ways of converting `"<U+010C>"`: `$text =~ s/<U\+([0-9A-Fa-f]+)>/qq!"\\x{$1}"!/ee;` or `$text =~ s/<U\+([0-9A-Fa-f]+)>/chr hex $1/e;`
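A minimal sketch checking that the two substitutions in this reply agree on `"<U+010C>"`. With `/ee`, the replacement is evaluated once to build the Perl source `"\x{010C}"` (quotes included), and that source is then evaluated again as a string literal. The `/e` form computes the character directly with `chr hex`.

```perl
use strict;
use warnings;

my $text = "<U+010C>";

# /ee: first eval yields the code "\x{010C}", second eval turns it into U+010C.
(my $via_ee = $text) =~ s/<U\+([0-9A-Fa-f]+)>/qq!"\\x{$1}"!/ee;

# /e: chr(hex("010C")) builds the character without evaluating generated code.
(my $via_chr = $text) =~ s/<U\+([0-9A-Fa-f]+)>/chr hex $1/e;

printf "U+%04X U+%04X\n", ord $via_ee, ord $via_chr;    # U+010C U+010C
```

The `chr hex` form is arguably the simpler choice, since it never builds and evaluates Perl code from the input.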
Re^2: convert scalar to wide hexadecimal value? how? by merlinX (Novice) on Jan 14, 2009 at 12:11 UTC
Indeed, thanks... regex is not my strong point anyway, hence a little additional question: suppose I have multiple `<U+xxxx>` codes in the string, for instance `"TEST <U+010C> <U+0158> <U+0147>"`... what does the regexp look like then?
Re^3: convert scalar to wide hexadecimal value? how? by merlinX (Novice) on Jan 14, 2009 at 12:43 UTC
Just found it... I guess I have to add the `g` (global) modifier, right? `$text =~ s/<U\+([0-9A-Fa-f]+)>/qq!"\\x{$1}"!/gee;`
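A minimal sketch of the `/g` version on the string from the follow-up question, here using the `chr hex` form. It also shows, as an aside, what the greedy `(.*)` from the original post captures when several codes are present: everything from the first `<U+` to the last `>`.

```perl
use strict;
use warnings;

my $text = "TEST <U+010C> <U+0158> <U+0147>";

# /g applies the substitution to every match, not just the first one.
(my $decoded = $text) =~ s/<U\+([0-9A-Fa-f]+)>/chr hex $1/ge;

# A UTF-8 output layer avoids "Wide character in print" warnings.
binmode STDOUT, ':encoding(UTF-8)';
print "$decoded\n";    # TEST Č Ř Ň

# The greedy (.*) spans across all three codes.
my ($greedy) = $text =~ /<U\+(.*)>/;
print "$greedy\n";     # 010C> <U+0158> <U+0147
```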
Re^4: convert scalar to wide hexadecimal value? how? by JavaFan (Canon) on Jan 14, 2009 at 12:57 UTC
Re: convert scalar to wide hexadecimal value? how? by moritz (Cardinal) on Jan 14, 2009 at 12:07 UTC
Try `s/<U\+([a-fA-F\d]+)>/chr hex $1/eg`. (In Perl regexes `<` has no special meaning, so you can omit the backslash before it. If you meant `\<` as a word boundary, as some other regex dialects use it, Perl spells that `\b`.)
Re^2: convert scalar to wide hexadecimal value? how? by JavaFan (Canon) on Jan 14, 2009 at 13:07 UTC
Your pattern is almost the same as my second suggestion, except that you're using `\d` instead of `0-9`. As a result, your solution will also try to change `"<U+٣٢>"`, but the result isn't what you'd hope for. Unfortunately, `\d` matches hundreds of characters that functions and operators dealing with (hex) numbers cannot handle.
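A minimal sketch of the `\d` pitfall, assuming Perl 5.14 or later for the `/a` modifier. The input uses ARABIC-INDIC DIGIT THREE and TWO (U+0663, U+0662), written as escapes so the source stays ASCII. `\d` accepts them, but `hex` only understands ASCII hex digits, so the capture cannot become 0x32. An explicit `[0-9A-Fa-f]` class, or `/a`, rejects the input up front.

```perl
use strict;
use warnings;

# "<U+٣٢>" from the post: ARABIC-INDIC DIGIT THREE (U+0663) and TWO (U+0662).
my $text = "<U+\x{0663}\x{0662}>";

# Without /a, \d matches any Unicode decimal digit, so this pattern matches.
my $loose = $text =~ /<U\+([a-fA-F\d]+)>/ ? $1 : undef;

# hex() only understands ASCII hex digits, so it will not produce 0x32 here
# (typically it dies with "Wide character in hex"); eval {} keeps the demo going.
my $value = eval { hex $loose };
print(defined($value) ? "hex returned $value\n" : "hex failed: $@");

# An explicit ASCII class, or the /a modifier, refuses to match instead.
my $ascii_class = $text =~ /<U\+([0-9A-Fa-f]+)>/  ? 1 : 0;
my $ascii_mod   = $text =~ /<U\+([a-fA-F\d]+)>/a ? 1 : 0;

printf "loose match: %s, ASCII class: %d, /a: %d\n",
    (defined($loose) ? 'yes' : 'no'), $ascii_class, $ascii_mod;
# loose match: yes, ASCII class: 0, /a: 0
```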