Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
I have Perl 5.8 installed on my system and am having a problem parsing a Japanese XML file. The parser throws the following error:
Couldn't open encmap shift-jis.enc: No such file or directory at /opt/perl/lib/site_perl/5.8.0/PA-RISC1.1-thread-multi/XML/Parser.pm line 185

The XML file declares its encoding as <?xml version="1.0" encoding="Shift-JIS" ?>, but another Japanese XML file with the same encoding declaration was parsed correctly by the same Perl installation.

Is there a known issue with Japanese text and the XML parser? What else can I check to find out why these two XML files behave differently with the parser? Please enlighten me.

janitored by ybiC: Reformat for legibility

Re: Question on XML parser
by mirod (Canon) on Feb 11, 2004 at 18:19 UTC

    Did you install XML::Encoding? That's the module that gives you additional encodings for XML::Parser. You might also want to read the blurb about Japanese encodings in /opt/src/XML-Encoding-1.01/maps/Japanese_Encodings.msg (or a similar location). From what I get from that message, you might have to look at the encoding files and choose the one that really works for you, then either use compile_encoding to generate the proper .enc file, or just rename the appropriate x-sjis-*.enc file to shift-jis.enc (the name the error message says the parser is looking for).
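
    To see which Shift-JIS maps are actually installed, something like the following might help. It is only a sketch: find_encoding_maps is a made-up helper name, and it simply lists *.enc files in whatever directories you hand it. XML::Parser documents @XML::Parser::Expat::Encoding_Path as the list of directories it searches for encoding maps, so that is the natural thing to pass in.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of XML::Parser): list the .enc files in the
# given directories whose names match $pattern.
sub find_encoding_maps {
    my ($pattern, @dirs) = @_;
    my @found;
    for my $dir (@dirs) {
        opendir(my $dh, $dir) or next;    # skip missing or unreadable directories
        for my $file (sort readdir $dh) {
            next unless $file =~ /\.enc\z/i && $file =~ $pattern;
            push @found, File::Spec->catfile($dir, $file);
        }
        closedir $dh;
    }
    return @found;
}

# Usage (assumes XML::Parser is installed):
#   require XML::Parser::Expat;
#   print "$_\n"
#       for find_encoding_maps(qr/sjis|shift/i, @XML::Parser::Expat::Encoding_Path);
```

    If the list shows x-sjis-* maps but no shift-jis.enc, that matches the error above: the parser looks for a map named after the lowercased encoding declaration.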

    If you go this route it would be nice if you could post what you did, and maybe contact grantm so he could add it to the Perl-XML FAQ.

    An alternative solution would be to convert your documents to UTF-8 using iconv or Encode (updating the encoding declaration to match, otherwise the parser will still look for the Shift-JIS map). XML::Parser gives you all strings in UTF-8 anyway, so you might as well do it pre-emptively. You can then convert your output back to Shift-JIS using the same technique.
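
    A minimal sketch of the Encode route, assuming the whole document fits in memory. The sub name sjis_xml_to_utf8 is made up for illustration; 'shiftjis' is Encode's name for the encoding, and 'cp932' is the Windows variant if your files turn out to need it.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: take the raw bytes of a Shift-JIS XML document and
# return UTF-8 bytes, with the encoding declaration updated to match.
sub sjis_xml_to_utf8 {
    my ($bytes, $source_encoding) = @_;
    $source_encoding ||= 'shiftjis';    # assumption: plain Shift-JIS by default

    # Bytes -> Perl character string.
    my $text = decode($source_encoding, $bytes);

    # Point the XML declaration (if there is one) at the new encoding.
    $text =~ s/(<\?xml[^>]*?encoding\s*=\s*)(["'])[^"']*\2/$1${2}UTF-8$2/;

    # Character string -> UTF-8 bytes.
    return encode('UTF-8', $text);
}

# Usage: read the file in binary mode, convert, then parse the result.
#   open(my $in, '<', $file) or die "Cannot open $file: $!";
#   binmode $in;
#   my $raw = do { local $/; <$in> };
#   close $in;
#   my $utf8 = sjis_xml_to_utf8($raw);
#   # XML::Parser->new(...)->parse($utf8);
```

    Going the other way for output is the same idea in reverse: encode('shiftjis', $string) on the strings you get back from the parser.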

    The last option I can think of: change the encoding declared in the files to one of the Shift-JIS variants supported by XML::Encoding (the x-sjis-* names). That would give the files a more accurate encoding value, but might be a problem if you process them with other tools.
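
    For that last option, a rough sketch of reading and rewriting the declared encoding without touching the rest of the file. Both subs are made-up names, and x-sjis-cp932 in the usage is just an example of a variant name; check which maps your installation really has before picking one.

```perl
use strict;
use warnings;

# Hypothetical helpers that work on the raw bytes of the document. The XML
# declaration only uses characters that Shift-JIS encodes the same way as
# ASCII, so no decoding is needed to inspect or edit it.

# Return the encoding name from the XML declaration, or undef if there is none.
sub declared_encoding {
    my ($bytes) = @_;
    return $bytes =~ /\A<\?xml[^>]*?encoding\s*=\s*["']([^"']+)["']/ ? $1 : undef;
}

# Return a copy of the document with the declared encoding replaced by $name.
sub set_declared_encoding {
    my ($bytes, $name) = @_;
    $bytes =~ s/\A(<\?xml[^>]*?encoding\s*=\s*)(["'])[^"']*\2/$1$2$name$2/;
    return $bytes;
}

# Usage:
#   my $fixed = set_declared_encoding($raw, 'x-sjis-cp932');
```

    Running declared_encoding on both the file that parses and the file that fails is also a quick way to check whether they really declare the same encoding name.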