in reply to Re: XML::Parser and &entity;
in thread XML::Parser and &entity;

Just to get this 100% clear in my not very XMLized head.

A valid XML file with no includes/inline entity definitions may contain:

&amp; &lt; &gt; &quot; &apos;

and nothing else?

The good news is that I do in fact control the source data file, so I can do further munging. It looks like the best option is to convert the file to UTF-8, including the entities that are not defined. Then, since the characters are in fact all valid Latin-1, I can use my favourite pack/unpack trick to convert the UTF-8 back to Latin-1 for display:

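sub utf8toNative {
    # Repack the string's code points as single bytes (Latin-1).
    my $c = pack("C*", unpack("U*", $_[0]));
    # If the lengths match, return the input unchanged (see note below).
    return (length($c) == length($_[0])) ? $_[0] : $c;
}
(You have to return the string unchanged if the lengths are the same, as the new string may be incorrect in such cases.)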
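
For comparison, here is a minimal sketch of the same UTF-8 to Latin-1 conversion using the core Encode module (bundled with Perl since 5.8) rather than pack/unpack. The helper name utf8_to_latin1 is made up for this example, and it assumes the input is a string of UTF-8 encoded bytes. With the default settings, any character that has no Latin-1 equivalent comes out as a substitution character instead of raising an error.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes UTF-8 encoded bytes, returns Latin-1 bytes.
sub utf8_to_latin1 {
    my ($utf8_bytes) = @_;
    my $chars = decode('UTF-8', $utf8_bytes);   # bytes -> Perl characters
    return encode('ISO-8859-1', $chars);        # characters -> Latin-1 bytes
}

# "caf" followed by e-acute, written out as UTF-8 bytes (0xC3 0xA9).
my $latin1 = utf8_to_latin1("caf\xC3\xA9");
print length($latin1), "\n";    # 4 bytes: c, a, f, 0xE9
```

Unlike the length comparison in the pack/unpack version, this does not try to guess whether the input was already Latin-1, so it only suits data that is known to be UTF-8.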

Dingus


Enter any 47-digit prime number to continue.

Re: Re: Re: XML::Parser and &entity;
by mirod (Canon) on Nov 26, 2002 at 19:12 UTC

    utf-8-ing everything will indeed save you some headaches. That will be playing along with the "XML Way", instead of fighting it. Just to be complete though: you can use another encoding if you specify it in the XML declaration (<?xml version="1.0" encoding="ISO-8859-1"?>). XML::Parser based modules will nevertheless convert the input to utf-8 before passing it to your code.
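
A rough sketch of what that looks like in practice, assuming XML::Parser and Encode are installed: the document below declares ISO-8859-1, the Char handler still receives the text as a Perl character string (UTF-8 internally), and the code encodes it back to Latin-1 for display. The sample document and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use XML::Parser;
use Encode qw(encode);

# Sample input: Latin-1 bytes (0xE9 is e-acute) with a matching declaration.
my $xml = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n<doc>caf\xE9</doc>};

my $text = '';
my $parser = XML::Parser->new(
    Handlers => {
        # The Char handler receives the Expat object and a chunk of text.
        Char => sub {
            my ($expat, $string) = @_;
            $text .= $string;
        },
    },
);
$parser->parse($xml);

# $text now holds characters, not Latin-1 bytes; encode for a Latin-1 display.
my $latin1 = encode('ISO-8859-1', $text);
print length($text), " characters, ", length($latin1), " bytes\n";
```

So whichever encoding the file declares, the conversion back to Latin-1 happens in your own code, after the parser has handed you the text.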