in reply to An ampersand is not well-formed XML data?

Perhaps change ampersands to their HTML entity equivalent (&amp;) in my character event handler?

yup, that's how you do it, but don't forget about <, >, and "
here is one way to handle the problem for all data:

    # global lookup hash
    my %ESCAPES = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
    );

    # the subroutine
    sub xml_encode {
        my ($str) = @_;
        $str =~ s/([&<>"])/$ESCAPES{$1}/ge;
        return $str;
    }

    # and invoke it like
    $data = xml_encode($data);

But this is just one way.

Jeff

R-R-R--R-R-R--R-R-R--R-R-R--R-R-R--
L-L--L-L--L-L--L-L--L-L--L-L--L-L--

Re: (jeffa) Re: An ampersand is not well-formed XML data?
by merlyn (Sage) on Apr 30, 2001 at 21:06 UTC
    Ampersand needs to be encoded everywhere. Quote needs to be encoded within a quoted attribute value. Less-than and greater-than need to be encoded outside a quoted attribute value. It's not an error to encode all four everywhere, but it's overkill.
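
A minimal sketch of that context-specific approach, with one routine for character data and one for double-quoted attribute values. The names escape_text and escape_attr are made up for illustration. Note that this sketch also escapes < inside attribute values, because the XML specification does not allow a literal < there.

```perl
use strict;
use warnings;

# Hypothetical helpers: one escape table per context.
my %TEXT_ESCAPES = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
my %ATTR_ESCAPES = ('&' => '&amp;', '<' => '&lt;', '"' => '&quot;');

# Character data between tags: quotes can stay as they are.
sub escape_text {
    my ($str) = @_;
    $str =~ s/([&<>])/$TEXT_ESCAPES{$1}/g;
    return $str;
}

# Value destined for a double-quoted attribute.
sub escape_attr {
    my ($str) = @_;
    $str =~ s/([&<"])/$ATTR_ESCAPES{$1}/g;
    return $str;
}

my $title = escape_attr('Tom & "Jerry"');
my $body  = escape_text('if a < b && b > c, say "ok"');
print qq{<note title="$title">$body</note>\n};
```

The quotes in the body text are left alone, while the ones in the attribute value become &quot;.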

    -- Randal L. Schwartz, Perl hacker

      Actually greater-than does not need to be encoded at all. There is never any problem with it, as it only has a special meaning at the end of a tag, where regular character data cannot appear. <doc att=">">></doc> is a perfectly valid piece of XML.

      Michel V. Rodriguez, XML Hacker ;--)

Re: (jeffa) Re: An ampersand is not well-formed XML data?
by donfreenut (Sexton) on Apr 30, 2001 at 21:11 UTC

    Okay, right on. The problem now is where I should do the encoding. XML::Parser bombs out and dies as soon as it sees the ampersand, before it gets passed to the handler.

    I want to be able to either scan the XML from a file or get it from a socket. Am I going to have to read the data from one of those two places first, do the encoding, then have XML::Parser parse the results? That seems hard, because I'd have to decide before parsing what should be parsed (I don't want to go replacing the quotes around XML attributes with &quot; - the XML parser wouldn't be able to parse).

    Is there some easier way to do the encoding? Is there any way at all I can keep XML::Parser from crapping out before I get a chance to replace the ampersand?
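
One way the read-first, fix, then parse idea could look is a pre-pass that touches only ampersands that do not already begin an entity or character reference, so the quotes around attributes are never altered. This is a heuristic sketch, not a real fix: fix_bare_ampersands is a hypothetical name, and text such as "AT&T; etc" would be mistaken for an entity reference and left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: escape any '&' that does not already start an
# entity or character reference such as &amp; &#38; or &#x26;.
sub fix_bare_ampersands {
    my ($xml) = @_;
    $xml =~ s/&(?!(?:[A-Za-z_:][\w:.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;
    return $xml;
}

my $raw   = '<node title="Q&A">Tom & Jerry &lt;3 &#169; &amp; co</node>';
my $fixed = fix_bare_ampersands($raw);
print "$fixed\n";
```

The repaired string could then be handed to XML::Parser's parse method, which accepts a string as well as a filehandle.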

    Thanks...
    ---
    donfreenut
      Okay, right on. The problem now is where I should do the encoding. XML::Parser bombs out and dies as soon as it sees the ampersand, before it gets passed to the handler.
      It needs to get done before it ends up as so-called XML. It's not XML if the encoding hasn't been done. Go upstream and fix the problem there. If you are getting files in that format, scream at the provider. For them to call it XML is doing a disservice to the meaning of what XML's about.
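
As a rough illustration of fixing it upstream, the program that generates the XML can escape each value at the moment it is written out, so the output is well-formed before any parser sees it. The xml_element and xml_escape routines below are hypothetical and only stand in for whatever the real producer does.

```perl
use strict;
use warnings;

my %ESC = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

# Escape the four characters everywhere (overkill, but never wrong).
sub xml_escape {
    my ($str) = @_;
    $str =~ s/([&<>"])/$ESC{$1}/g;
    return $str;
}

# Hypothetical generator: build one element, escaping attribute
# values and text as they are written out.
sub xml_element {
    my ($name, $attrs, $text) = @_;
    my $attr_str = '';
    for my $key (sort keys %$attrs) {
        $attr_str .= sprintf ' %s="%s"', $key, xml_escape($attrs->{$key});
    }
    return sprintf '<%s%s>%s</%s>', $name, $attr_str, xml_escape($text), $name;
}

print xml_element('node', { title => 'Q&A', id => 42 }, 'Tom & Jerry <3'), "\n";
```

Because the escaping happens where the data is still separate from the markup, there is no need to guess afterwards which quotes or angle brackets belong to tags.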

      -- Randal L. Schwartz, Perl hacker


        I will go scream at nate, but I don't think he'll listen to me :)

        Anyway, are you saying that the Everything Engine should be encoding the special characters into HTML entities before spitting them out as XML?

        ---
        donfreenut