MultiByte Char handling in perl

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to handle multibyte characters (Unicode as UTF-8, or other encodings). The problem shows up when I run functions like these:

my changeEncoding 
{
  my $self = shift;
  my $stringToEncode = shift;
  my $EncodedString = Encode::utf8_encode{'$stringToEncode'};

   print "   $EncodedString  "; <-- absolutely fine here .
  return $EncodedString;
}

sub getEncodedString
{
 my $encodedString = changeEncoding(MultiByteChar);

 print $encodedString ; <--This is garbage here.
}

Why am I seeing garbage in getEncodedString() when changeEncoding() shows the correct value? I think this has something to do with the way Perl stores the variable when a function returns.

Replies are listed 'Best First'.
Re: MultiByte Char handling in perl by Eliya (Vicar) on Mar 07, 2012 at 17:09 UTC
It's hard to believe the one statement prints "absolutely fine", while the other prints garbage. As it is, your code doesn't even compile. There are several typos, there is no Encode::utf8_encode(), you're calling your OO method as a regular function, you haven't specified what exactly `MultiByteChar` is, etc.

The following code works fine, in the sense that both prints output the same sequence of octets for `$[Ee]ncodedString`:

    #!/usr/bin/perl -w
    use strict;
    use Encode;

    sub changeEncoding {
        my $stringToEncode = shift;
        my $EncodedString = Encode::encode("utf8", $stringToEncode);

        print " $EncodedString ";
        return $EncodedString;
    }

    sub getEncodedString {
        my $encodedString = changeEncoding("\x{2345}");

        print $encodedString;
    }

    getEncodedString();

(though `changeEncoding` is a misnomer, as it doesn't really change the encoding; rather, it simply encodes a string from Perl's internal (decoded) Unicode form)
Re^2: MultiByte Char handling in perl by Anonymous Monk on Mar 07, 2012 at 17:32 UTC
Thanks for the reply. Please don't worry about the syntax and other petty things there; the code was only meant to explain the problem. Simply put, I want to understand whether a value's encoding can change when a function in Perl returns it. What I understand is that Perl must have built-in charset support. How do we change that? I remember doing this in C by setting LC_CTYPE.
Re^3: MultiByte Char handling in perl by Eliya (Vicar) on Mar 07, 2012 at 17:41 UTC
"What I understand is that Perl must have built-in charset support. How do we change that?"

Perl has built-in support for Unicode (which in itself is not an encoding). You don't want to change that. Rather, to use this functionality, you'd decode your data on input (into Perl's internal form), and encode it on output. With both steps, you can specify an encoding. How Perl represents and handles its internal form is not your business. See perluniintro to start with.
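As a minimal sketch of the decode-on-input, encode-on-output pattern described above: the input octets and the `hex_octets` helper are made up for illustration, and only `decode` and `encode` from the core Encode module are used.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper for this sketch: show each octet of a string as hex.
sub hex_octets { join ' ', map { sprintf '%02X', ord } split //, shift }

# Hypothetical input: the three UTF-8 octets for U+2345,
# as they might arrive from a file or a socket.
my $octets_in = "\xE2\x8D\x85";

# Decode on input: three octets become one character.
my $chars = decode('UTF-8', $octets_in);
printf "octets in: %d, characters: %d, code point: U+%04X\n",
    length($octets_in), length($chars), ord($chars);

# Encode on output, choosing the encoding at the boundary.
my $utf8_out  = encode('UTF-8',    $chars);
my $utf16_out = encode('UTF-16BE', $chars);
print "as UTF-8:    ", hex_octets($utf8_out),  "\n";   # E2 8D 85
print "as UTF-16BE: ", hex_octets($utf16_out), "\n";   # 23 45
```

Passing `$chars` through any number of subs in between leaves it unchanged; only the explicit `decode` and `encode` calls convert between octets and characters.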
Re^3: MultiByte Char handling in perl by Anonymous Monk on Mar 07, 2012 at 17:56 UTC
Here is an example of what's happening.

    print "charEncoding string val = " . $enc->($val) . "\n\n";
    # prints: charEncoding string val = 歡迎來到雅虎!

    my $encoded = $enc->($val);
    print "\n charEncodingo $encoded == $val ";
    # prints: charEncodingo æ¡è¿ä¾å°éè&#142;! == 歡迎來到雅虎!

Can someone shed some light on this? Please don't look at syntax errors.
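One plausible reading of that output, offered as an assumption because the thread never shows how `$enc` and `$val` are defined: if `$val` is a decoded character string and `$enc` returns UTF-8 octets, then interpolating both into one string makes Perl treat each octet as a separate Latin-1 character, and those characters are encoded a second time on output. The sketch below reproduces that effect with a single character; `hex_octets` is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper for this sketch: show each octet of a string as hex.
sub hex_octets { join ' ', map { sprintf '%02X', ord } split //, shift }

my $val     = "\x{96C5}";              # one character (assumed to be decoded)
my $encoded = encode('UTF-8', $val);   # three octets: E9 9B 85

# Mixing octets and characters: the octets are reinterpreted as the
# characters U+00E9, U+009B and U+0085.
my $mixed = "$encoded == $val";

# Octets written when the encoded value is printed on its own.
print "alone: ", hex_octets($encoded), "\n";
# Octets written when the mixed string is output as UTF-8:
# the first three characters are now double-encoded.
print "mixed: ", hex_octets(encode('UTF-8', $mixed)), "\n";

# Keeping everything as characters and encoding once avoids the problem.
my $clean = encode('UTF-8', "$val == $val");
print "clean: ", hex_octets($clean), "\n";
```

Under this reading nothing changes when the function returns; the damage happens when the returned octets are joined with a character string.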
Re: MultiByte Char handling in perl by Anonymous Monk on Mar 07, 2012 at 15:48 UTC
Use Encode::utf8_encode{ $stringToEncode }; It is a common newcomer mistake to quote arguments to functions. Single quotes do not interpolate, while double quotes do.
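A short sketch of the quoting difference mentioned here; the variable's value is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $stringToEncode = "abc";   # hypothetical value

my $single = '$stringToEncode';   # literal text, no interpolation
my $double = "$stringToEncode";   # interpolated copy of the value
my $bare   = $stringToEncode;     # the usual way to pass a variable

print "single: $single\n";   # single: $stringToEncode
print "double: $double\n";   # double: abc
print "bare:   $bare\n";     # bare:   abc
```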
Re^2: MultiByte Char handling in perl by Anonymous Monk on Mar 07, 2012 at 16:49 UTC
Don't worry about that. That was written in a notepad. I am trying to figure out what Perl's native character support is and how to change it.