Szani has asked for the wisdom of the Perl Monks concerning the following question:

I would like to ask about encoding in XML::Parser.
Here is some simple code:
my $p = XML::Parser->new( Style => 'Tree', ProtocolEncoding => 'windows-1250' );
my $XmlDok = $p->parse( '<RAP>ó</RAP>' );
my $xso = XML::SimpleObject->new( $XmlDok );
my $buff = $xso->child('RAP')->value;
How can I convert $buff back to windows-1250?
On Linux I just convert it using:
my $Map = new Unicode::Map("CP1250");
$buff = $Map->from_unicode( Unicode::Transform::unicode_to_utf16be( $buff ) );
but the same conversion doesn't work under Windows!
I'm a bit confused.
Is there a simpler way to convert the output of XML::Parser back to windows-1250 that works well under both Linux and Windows?

Edited by planetscape - added code tags

Replies are listed 'Best First'.
Re: XML::Parser & encoding
by grantm (Parson) on Nov 30, 2005 at 23:57 UTC
    1. There are very, very few legitimate reasons to use the ProtocolEncoding option with XML::Parser. If you think you need to do it because the XML document does not declare an encoding then the right solution is to fix the XML document - because if it doesn't use UTF-8 or UTF-16 and it doesn't declare an encoding then it's not XML.
    2. You should almost never use a text string* in Perl that's not either plain 7-bit ASCII or UTF-8. Perl's built-in functions such as index(), length(), reverse() etc. 'understand' ASCII and UTF-8; they don't understand other encodings.
    3. Given that you have successfully converted a non-UTF document to UTF-8 on input, the only other place you would typically need to do an encoding conversion is on output. For example (from the Perl XML FAQ):
      open my $fh, '>:encoding(windows-1250)', $path or die "open($path): $!";
      print $fh $utf_string;
    * My definition of 'text string' here excludes strings of binary bytes - which are of course perfectly acceptable in Perl but not usually encountered in XML :-)
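    To illustrate point 2, here is a minimal sketch (an added illustration, not code from the reply above) that uses only the core Encode module. It assumes byte 0xB3 in cp1250 is the letter "l with stroke" (U+0142), and the variable names are made up for the example. It compares what uc() does to the raw cp1250 byte with what it does to the decoded character string.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption for this sketch: 0xB3 is "l with stroke" (U+0142) in cp1250.
my $raw  = "\xB3";
my $text = decode( 'cp1250', $raw );

# On the undecoded byte, uc() cannot know this is a cp1250 letter,
# so the byte comes back unchanged.
my $raw_upper = uc $raw;

# On the decoded character string, uc() maps U+0142 to U+0141.
my $text_upper = uc $text;

printf "raw byte: %02X -> %02X\n", ord($raw), ord($raw_upper);
printf "decoded:  U+%04X -> U+%04X\n", ord($text), ord($text_upper);
```

    The same reasoning applies to regular expressions and sorting: decode once on input, work on characters, and encode once on output.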
Re: XML::Parser & encoding
by graff (Chancellor) on Dec 01, 2005 at 00:41 UTC
    If you're using Perl 5.8.something, you probably want to look up Encode, which lets you do things like:
    use Encode;
    # ... $buff holds a utf8 string:
    my $cp1250buff = encode( 'cp1250', $buff );
    But as grantm points out in his reply, a cp1250-encoded string is only useful as something to write as output to a file that needs to be written with this particular encoding. And in that case, you don't even need the Encode module -- just use binmode on the output file handle, as explained in the PerlIO man page:
    open( OUTPUT, ">output.txt" ) or die $!;
    binmode OUTPUT, ":encoding(cp1250)";
    # ... or just use the 3-arg version of open:
    # open( OUTPUT, ">:encoding(cp1250)", "output.txt" )
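    As a rough, self-contained sketch of the round trip discussed in this thread (an added illustration, assuming Perl 5.8 or later with the core Encode module; the variable names are hypothetical), the following decodes a cp1250 byte into a character string and then encodes it both as cp1250 and as UTF-8, so the difference between the two byte sequences is visible.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "\xF3" is the single cp1250 byte for the o-acute from the question.
my $cp1250_in = "\xF3";

# Decode on input: bytes in, Perl character string out.
my $text = decode( 'cp1250', $cp1250_in );

# Encode on output: the same character as cp1250 bytes and as UTF-8 bytes.
my $cp1250_out = encode( 'cp1250', $text );
my $utf8_out   = encode( 'UTF-8',  $text );

printf "characters: %d\n", length $text;
printf "cp1250 bytes: %s\n", unpack( 'H*', $cp1250_out );
printf "UTF-8 bytes:  %s\n", unpack( 'H*', $utf8_out );
```

    Because Encode behaves the same on every platform Perl runs on, this approach should not depend on whether the script runs under Linux or Windows.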
Re: XML::Parser & encoding
by Tanktalus (Canon) on Nov 30, 2005 at 20:35 UTC

    I wonder if you're looking for utf8::decode?