zeimusu has asked for the wisdom of the Perl Monks concerning the following question:

Are there any users of XML::Parser out there that have tried using Shift_JIS encoding?

There is a message in the XML/Parser/Encoding/ directory explaining that there isn't a Shift_JIS encoding file, or rather that there are four files for four incompatible encodings that might each be called "Shift_JIS". That's fine and dandy, but my XML starts with <?xml version="1.0" encoding="Shift_JIS"?>, and I can't really change that now.

What am I to do? I tried copying one of the x-sjis....enc files to "shift_jis.enc" but I still get an "unknown encoding" error.

I'm using Perl 5.8.5 with Cygwin, and the latest versions of XML::Parser and XML::Encoding from CPAN.

Begging your favour.

James.

Re: XML::Parser Shift_JIS encoding
by graff (Chancellor) on Sep 13, 2004 at 02:12 UTC
    Looking at a few different versions of 5.8 (5.8.1 on Mac OS X, 5.8.5 on FreeBSD), I see a "shiftjis" encoding in Encode, which ought to work.

    Even if XML::Parser doesn't think it exists, you could at least use Encode::decode to convert the source data into UTF-8 before getting XML::Parser involved (assuming XML::Parser handles UTF-8 data correctly).
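    A minimal sketch of that suggestion, using only the core Encode module. It assumes the whole document fits in memory and that the XML declaration sits at the very start of the data. The helper name sjis_xml_to_utf8 is hypothetical. Besides transcoding, it rewrites the declaration to say UTF-8; otherwise XML::Parser would still ask for a Shift_JIS encoding map.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK);

# Hypothetical helper: turn a Shift_JIS XML document (raw bytes) into
# UTF-8 bytes whose XML declaration no longer mentions Shift_JIS.
sub sjis_xml_to_utf8 {
    my ($bytes) = @_;

    # Decode raw bytes into a Perl character string using Encode's
    # "shiftjis" encoding; FB_CROAK makes malformed input fatal.
    my $text = decode('shiftjis', $bytes, FB_CROAK);

    # Rewrite only the encoding pseudo-attribute of the leading <?xml ...?>.
    $text =~ s/\A(<\?xml[^>]*?\bencoding\s*=\s*)(["'])[^"']*\2/${1}${2}UTF-8${2}/;

    # Re-encode as UTF-8 bytes, which XML::Parser reads without a map file.
    return encode('UTF-8', $text);
}

# Usage (needs XML::Parser installed):
#   use XML::Parser;
#   XML::Parser->new->parse(sjis_xml_to_utf8($raw_bytes));
```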

      OK, I got this figured out.

      The encoding files identify themselves in their header, and the name of the encoding in the header had better match the file name.
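      To see which name a given .enc file declares, a small reader like the one below may help. It is a sketch under an assumption not stated in the thread: that the header is a 4-byte magic number followed by a 40-byte, NUL-padded encoding name (the Encmap_Header layout as I understand it from XML::Parser's Expat sources). The function name enc_declared_name is hypothetical.

```perl
use strict;
use warnings;

# ASSUMED header layout: 4-byte magic number, then a 40-byte
# encoding name padded with NUL bytes.
use constant NAME_OFFSET => 4;
use constant NAME_LENGTH => 40;

# Hypothetical helper: return the encoding name declared in an .enc header.
sub enc_declared_name {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $read = read($fh, my $header, NAME_OFFSET + NAME_LENGTH);
    close $fh;
    die "$path is too short to be an encoding file\n"
        unless defined $read && $read == NAME_OFFSET + NAME_LENGTH;

    my $name = substr($header, NAME_OFFSET, NAME_LENGTH);
    $name =~ s/\0.*\z//s;    # strip the NUL padding
    return $name;
}

# Usage: print enc_declared_name('shift_jis.enc'), "\n";
```

      If this prints x-sjis-cp932 for a file saved as shift_jis.enc, that would explain the "unknown encoding" error.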

      So, with a binary editor I changed the encoding name in the file from x-sjis-cp932 to shift_jis. You have to be careful not to change the file length, so pad the encoding name with NUL characters (in vim, a NUL can be entered with Ctrl-K N U).
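      The same fix could be scripted instead of done by hand. This is a sketch that relies on an assumed header layout (4-byte magic number, then a 40-byte NUL-padded name) rather than anything documented in this thread; copy_enc_with_name is a hypothetical name. Only the name field is overwritten, and pack's "a" template pads with NULs, so the file length stays the same, as with the manual edit.

```perl
use strict;
use warnings;

# ASSUMED header layout: 4-byte magic number, then a 40-byte
# NUL-padded encoding name. Nothing else in the file is touched.
use constant NAME_OFFSET => 4;
use constant NAME_LENGTH => 40;

# Hypothetical helper: copy $src to $dst, replacing the declared name.
sub copy_enc_with_name {
    my ($src, $dst, $new_name) = @_;
    die "Name '$new_name' is longer than " . NAME_LENGTH . " bytes\n"
        if length($new_name) > NAME_LENGTH;

    open my $in, '<:raw', $src or die "Cannot open $src: $!";
    my $data = do { local $/; <$in> };
    close $in;
    die "$src is too short to be an encoding file\n"
        unless defined $data && length($data) >= NAME_OFFSET + NAME_LENGTH;

    # pack 'a40' pads with NULs, so the replacement is exactly 40 bytes.
    substr($data, NAME_OFFSET, NAME_LENGTH) = pack('a' . NAME_LENGTH, $new_name);

    open my $out, '>:raw', $dst or die "Cannot write $dst: $!";
    print {$out} $data;
    close $out or die "Cannot close $dst: $!";
    return;
}

# Usage, run inside the XML/Parser/Encodings directory:
#   copy_enc_with_name('x-sjis-cp932.enc', 'shift_jis.enc', 'shift_jis');
```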

      With that file saved as shift_jis.enc, my sample script runs fine. It had nothing to do with Cygwin; hearty apologies for bad-mouthing their fine DLL.

      A safer way to do this would be to remake the encoding files with the XML::Encoding package, but this works as a quick fix.

      James

Re: XML::Parser Shift_JIS encoding
by mirod (Canon) on Sep 10, 2004 at 15:55 UTC
    I tried copying one of the x-sjis....enc files to "shift_jis.enc" but I still get an "unknown encoding" error

    Did you by any chance move the file in the source directory and not in the install one?
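    One way to check this is to look for the file in each directory XML::Parser actually searches, rather than in the build tree. The sketch below takes the directory list as an argument; find_enc_files is a hypothetical name. The lowercasing follows the convention, mentioned later in the thread, that .enc file names are lowercase.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every "<encoding>.enc" present in @dirs.
# The encoding name is lowercased to match the .enc file naming convention.
sub find_enc_files {
    my ($encoding, @dirs) = @_;
    my $file = lc($encoding) . '.enc';
    return grep { -f $_ } map { File::Spec->catfile($_, $file) } @dirs;
}

# Usage (loading XML::Parser sets up its search path):
#   use XML::Parser;
#   print "$_\n" for find_enc_files('Shift_JIS', @XML::Parser::Expat::Encoding_Path);
```

    An empty result would mean the file was installed somewhere the parser never looks.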

      I don't think I've moved files in the source directory. In fact, I'm starting to think this might be a Cygwin problem. Here is my test file:

          use strict;
          use warnings;
          use XML::Parser;

          print join("\n", @XML::Parser::Expat::Encoding_Path), "\n";
          my $xml = XML::Parser->new;
          $xml->parse('<?xml version="1.0" encoding="Shift_JIS"?><foo>bar</foo>');

      And my output:

          /usr/lib/perl5/site_perl/5.8.5/cygwin-thread-multi-64int/XML/Parser/Encodings
          /usr/lib/perl5/vendor_perl/5.8.5/cygwin-thread-multi-64int/XML/Parser/Encodings
          unknown encoding at line 1, column 30, byte 30 at /usr/lib/perl5/site_perl/5.8.5/cygwin-thread-multi-64int/XML/Parser.pm line 187

      I've created shift_jis.enc files in both the vendor_perl and site_perl directories by copying x-sjis-cp932.enc to shift_jis.enc (downcasing the encoding name, as per the documentation).

      Do you get the same error? Thanks.