Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a problem where I need to handle multibyte characters (Unicode + UTF-8 or other encodings). The problem I am facing shows up when I run my function like this:

    my changeEncoding {
        my $self = shift;
        my $stringToEncode = shift;
        my $EncodedString = Encode::utf8_encode{'$stringToEncode'};
        print " $EncodedString ";    <-- absolutely fine here.
        return $EncodedString;
    }

    sub getEncodedString {
        my $encodedString = changeEncoding(MultiByteChar);
        print $encodedString;    <-- This is garbage here.
    }

Why am I seeing garbage in getEncodedString() whereas changeEncoding() shows the correct value? I think this has something to do with the way Perl stores the variable during function return.

Replies are listed 'Best First'.
Re: MultiByte Char handling in perl
by Eliya (Vicar) on Mar 07, 2012 at 17:09 UTC

    It's hard to believe the one statement prints "absolutely fine", while the other prints garbage.

    As it is, your code doesn't even compile. There are several typos, there is no Encode::utf8_encode(), you're calling your OO method as a regular function, you haven't specified what exactly MultiByteChar is, etc.

    The following code works fine, in the sense that both prints output the same sequence of octets for $[Ee]ncodedString:

    #!/usr/bin/perl -w
    use strict;
    use Encode;

    sub changeEncoding {
        my $stringToEncode = shift;
        my $EncodedString = Encode::encode("utf8", $stringToEncode);
        print " $EncodedString ";
        return $EncodedString;
    }

    sub getEncodedString {
        my $encodedString = changeEncoding("\x{2345}");
        print $encodedString;
    }

    getEncodedString();

    (Though changeEncoding is a misnomer, as it doesn't really change the encoding; it simply encodes a string from Perl's internal (decoded) Unicode form.)
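To make the character-versus-octet distinction concrete, here is a minimal sketch using the same test character, U+2345. The variable names are illustrative and not from the thread; it assumes only the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character, U+2345, as a decoded (character) string.
my $chars = "\x{2345}";

# The same character encoded to UTF-8 octets.
my $octets = encode('UTF-8', $chars);

my $char_len  = length $chars;     # counts characters
my $octet_len = length $octets;    # counts bytes

# Hex dump of the octets, so nothing depends on the terminal's encoding.
my $hex = join ' ', map { sprintf('%02X', ord($_)) } split //, $octets;

print "characters: $char_len, octets: $octet_len ($hex)\n";
# prints: characters: 1, octets: 3 (E2 8D 85)
```

Printing a hex dump rather than the raw string avoids confusing an encoding problem in the data with a display problem in the terminal.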

      Thanks for the reply. Please don't worry about the syntax and petty things there. That was just to explain. Simply put, I want to understand: when a function in Perl returns a value, is there a possibility that its encoding gets changed? What I understand is that Perl must have built-in charset support. How do we change that? I remember doing this in C by setting LC_CTYPE.
        What I understand is that Perl must have built-in charset support. How do we change that?

        Perl has built-in support for Unicode (which in itself is not an encoding).  You don't want to change that.  Rather, to use this functionality, you'd decode your data on input (into Perl's internal form), and encode it on output. With both steps, you can specify an encoding.  How Perl represents and handles its internal form is not your business.

        See perluniintro to start with.
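A minimal sketch of the decode-on-input, encode-on-output flow described above, assuming the data is UTF-8. The octets are hard-coded here to stand in for something read from a file or socket, and the variable names are hypothetical. The last step shows one way garbage can arise, namely encoding data that is already encoded; it is not a diagnosis of any particular program.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Pretend these octets arrived from a file or socket: U+6B61 in UTF-8.
my $input_octets = "\xE6\xAD\xA1";

# Decode on input: three octets become one Perl character.
my $text = decode('UTF-8', $input_octets);

# Encode on output: the character becomes octets again.
my $output_octets = encode('UTF-8', $text);

# Encoding already-encoded data treats each octet as a separate
# character and encodes it again, which doubles the length here.
my $double_encoded = encode('UTF-8', $output_octets);

printf "decoded length: %d\n",        length $text;
printf "encoded length: %d\n",        length $output_octets;
printf "double-encoded length: %d\n", length $double_encoded;
# prints 1, 3 and 6 respectively
```

For filehandles, an I/O layer such as `binmode(STDOUT, ':encoding(UTF-8)')` does the output encoding step for you, in which case the string passed to print should be the decoded one.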

        Here is an example of what's happening.

        print "charEncoding string val = " . $enc->($val) . "\n\n";
        # prints: charEncoding string val = 歡迎來到雅虎!

        my $encoded = $enc->($val);
        print "\n charEncodingo $encoded == $val ";
        # prints: charEncodingo æ­¡è¿&#142;ä¾&#134;å&#136;°é&#155;&#133;è&#153;&#142;! == 歡迎來到雅虎!

        Can someone shed some light here? Please don't look at syntax errors.
Re: MultiByte Char handling in perl
by Anonymous Monk on Mar 07, 2012 at 15:48 UTC

    Use

    Encode::encode_utf8( $stringToEncode );

    It is a common newcomer mistake to quote arguments to functions.

    Single quotes do not interpolate, while double quotes do.
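A short illustration of the quoting difference, reusing the variable name from the question; the value 'abc' is just a placeholder.

```perl
use strict;
use warnings;

my $stringToEncode = 'abc';

my $single = '$stringToEncode';    # literal text, no interpolation
my $double = "$stringToEncode";    # interpolated: the variable's value

print "single: $single\n";    # prints: single: $stringToEncode
print "double: $double\n";    # prints: double: abc
```

When passing a variable to a function, no quotes are needed at all: pass `$stringToEncode` itself.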

      Don't worry about that. That was written on a notepad. I am trying to figure out what the native character support in Perl is and how to change it.