PerlMonks  

That UTF pain...

by yosefm (Friar)
on Aug 11, 2003 at 16:04 UTC (id://282912)


in reply to XML::Parser

I ran into this problem too, before I knew about it :-(

I'd like to point out that if you work with only one language (in both the XML input and your output), it's easy to get around this with Text::Iconv. XML::Parser hands your handlers UTF-8 text, so I wrote a simple conversion sub and called it on every piece of data I output from the XML, turning it into the 8-bit charset I actually need. I guess this could be done inside a handler too, but I haven't tried it yet.

Here's my sub:

```perl
use Text::Iconv;

sub unUTF8 {
    my $conv = Text::Iconv->new("UTF-8", "iso-8859-8");    # That's Hebrew.
    return $conv->convert(shift);
}
```

Re: That UTF pain...
by Aristotle (Chancellor) on Aug 11, 2003 at 21:11 UTC
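    If you need this frequently, it should probably be written like this instead, so the converter is created only once. The bare block makes $conv a private variable that persists between calls, and ||= builds the Text::Iconv object on the first call only:

```perl
{
    my $conv;    # visible only to unUTF8, kept between calls

    sub unUTF8 {
        $conv ||= Text::Iconv->new("UTF-8", "iso-8859-8");    # first call only
        return $conv->convert(shift);
    }
}
```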
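    A variation on the same create-once idea, assuming perl 5.8 or later and the core Encode module in place of Text::Iconv: Encode::find_encoding returns an encoding object that can be cached in the closure just like $conv. This sketch assumes the argument is already a decoded Perl character string (the Text::Iconv version works on UTF-8 bytes instead), and reuses the name unUTF8 only for comparison.

```perl
use strict;
use warnings;
use Encode ();

{
    # Looked up on the first call, then reused by every later call.
    my $enc;

    # Hypothetical Encode-based variant of unUTF8: takes a Perl character
    # string and returns it as iso-8859-8 (Hebrew) bytes.
    sub unUTF8 {
        $enc ||= Encode::find_encoding('iso-8859-8')
            or die "iso-8859-8 is not supported by this Encode\n";
        return $enc->encode(shift);
    }
}

my $shalom = "\x{05E9}\x{05DC}\x{05D5}\x{05DD}";    # four Hebrew letters
my $bytes  = unUTF8($shalom);
printf "%vX\n", $bytes;    # F9.EC.E5.ED
```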

    Makeshifts last the longest.

Re: That UTF pain...
by mirod (Canon) on Aug 11, 2003 at 16:34 UTC

    In perl 5.8.* you can also use the Encode module, which provides encode() and decode() functions for converting between character encodings. You can also have a look at "Converting character encodings" for other ways of doing this (the regexp method described there might not work with recent versions of perl and/or XML::Parser).
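    A minimal sketch of the Encode route, assuming perl 5.8+ and input that arrives as raw UTF-8 bytes (for example, read from a file opened without an encoding layer); if you already have a decoded character string, skip the decode step. Passing Encode::FB_CROAK makes both calls die on bad input instead of silently substituting characters. The helper name to_hebrew_bytes is made up for this example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: UTF-8 bytes in, iso-8859-8 bytes out.
sub to_hebrew_bytes {
    my ($utf8_bytes) = @_;

    # Bytes -> Perl characters; dies on malformed UTF-8.
    my $chars = decode('UTF-8', $utf8_bytes, Encode::FB_CROAK);

    # Characters -> iso-8859-8 bytes; dies on characters the charset lacks.
    return encode('iso-8859-8', $chars, Encode::FB_CROAK);
}

my $input = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D";    # "shalom" as UTF-8 bytes
printf "%vX\n", to_hebrew_bytes($input);            # F9.EC.E5.ED

# A character with no Hebrew-charset equivalent is reported, not replaced.
my $ok = eval { to_hebrew_bytes("\xE2\x98\xBA"); 1 };    # U+263A as UTF-8
print $ok ? "converted\n" : "refused: $@";
```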

    XML::Twig also lets you work in the document's original encoding, via the keep_encoding option.

    Finally, if there is any way for you to work in UTF-8, it is probably the best option. Note that most Web browsers, databases and mail clients now support it, as do most editors and terminals, not to mention perl 5.8.*
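    If you do stay in UTF-8, one common perl 5.8+ pattern is to let PerlIO layers decode on input and encode on output, so the program itself only handles character strings. A minimal sketch, using an in-memory filehandle in place of a real file (the variable names are illustrative):

```perl
use strict;
use warnings;

my $word = "\x{05E9}\x{05DC}\x{05D5}\x{05DD}";    # four Hebrew characters

# Write characters through a UTF-8 encoding layer; the "file" here is the
# scalar $buffer, which ends up holding the encoded bytes.
my $buffer = '';
open my $out, '>:encoding(UTF-8)', \$buffer or die "open: $!";
print {$out} $word;
close $out or die "close: $!";

printf "%d characters, %d bytes\n", length($word), length($buffer);    # 4 characters, 8 bytes

# Reading back through the same layer decodes the bytes into characters.
open my $in, '<:encoding(UTF-8)', \$buffer or die "open: $!";
my $back = <$in>;
close $in;
print $back eq $word ? "round trip ok\n" : "mismatch\n";
```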