PerlMonks  

That UTF pain...

by yosefm (Friar)
on Aug 11, 2003 at 16:04 UTC


in reply to XML::Parser

I encountered this problem (before I knew about it :-( ) too.

I'd like to point out that if you only work with one language (both in the XML input and in your output), it's very easy to work around this with Text::Iconv. I wrote a simple conversion sub and call it whenever I output data from the XML. I guess this could be done in a handler too, but I haven't tried that yet.

Here's my sub:

use Text::Iconv;

sub unUTF8 {
    # Convert from UTF-8 to iso-8859-8 (that's Hebrew).
    my $conv = Text::Iconv->new("UTF-8", "iso-8859-8");
    return $conv->convert(shift);
}

Replies are listed 'Best First'.
Re: That UTF pain...
by Aristotle (Chancellor) on Aug 11, 2003 at 21:11 UTC
    If you need this frequently, it should probably be
    {
        my $conv;    # created on the first call, then reused
        sub unUTF8 {
            $conv ||= Text::Iconv->new("UTF-8", "iso-8859-8");
            return $conv->convert(shift);
        }
    }
    instead.

    Makeshifts last the longest.

Re: That UTF pain...
by mirod (Canon) on Aug 11, 2003 at 16:34 UTC

    In perl 5.8.* you can also use Encode, which provides encoding/decoding functions. You can also have a look at Converting character encodings for additional ways of doing this (the regexp method might not work with recent versions of perl and/or XML::Parser).
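
    As a rough illustration of the Encode route, here is a minimal sketch of the same UTF-8 to iso-8859-8 conversion. The sub name utf8_to_hebrew is made up for this example, and it assumes the input arrives as raw UTF-8 bytes. If your parser already hands you Perl character strings (which may be the case under perl 5.8), the decode step would have to be dropped.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from the original post): takes UTF-8 encoded
# bytes and returns the same text as ISO-8859-8 bytes.
# Assumption: $octets holds raw bytes, not an already-decoded string.
sub utf8_to_hebrew {
    my ($octets) = @_;
    my $chars = decode("UTF-8", $octets);    # bytes -> Perl characters
    return encode("iso-8859-8", $chars);     # characters -> ISO-8859-8 bytes
}

# Usage: convert a four-letter Hebrew word given as UTF-8 bytes.
my $hebrew = utf8_to_hebrew("\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D");
```

    Characters that have no place in iso-8859-8 are replaced with a substitution character by default, so this only makes sense for the single-language case described above.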

    XML::Twig also lets you work in the original encoding for the document, by using the keep_encoding option.

    Finally, if there is any way for you to work in UTF-8, it is probably a good idea. Note that most Web browsers, databases and mail agents now support it, as do most editors and terminals, not to mention perl 5.8.*.
