This is an XML FAQ: if you want to include unstructured text that can include anything, including < and & characters, then you can use CDATA sections:
<doc>
<p>regular text here, &lt; needs to be escaped as &amp;lt;</p>
<literal><![CDATA[here you can use < and & and whatever you want]]></literal>
<literal><![CDATA[this is how you include the CDATA end
mark ]]]]><![CDATA[> by splitting it into 2 different
CDATA sections]]></literal>
</doc>
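If you generate the XML yourself, the splitting trick for the end mark can be automated. Here is a minimal sketch in core Perl; cdata_wrap is a hypothetical helper name of mine, not a function from any module, and it assumes the text contains only characters that are legal in XML:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): wrap arbitrary text in
# CDATA section(s), so that "]]>" never appears whole inside one section.
sub cdata_wrap {
    my ($text) = @_;
    # "]]>" becomes "]]", end of section, start of a new section, then ">"
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return '<![CDATA[' . $text . ']]>';
}

print cdata_wrap('x ]]> y'), "\n";
# prints: <![CDATA[x ]]]]><![CDATA[> y]]>
```

A parser reads the result as two adjacent CDATA sections whose contents, put back together, give the original text.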
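Note that a CDATA section has no effect on the element structure. In fact it is just a convenience that saves you from escaping every single instance of < and & in the text. It only works in element content though: attribute values cannot contain CDATA sections, so < and & (and " or ') still have to be escaped there.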
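To make the "just a convenience" point concrete, here is a rough sketch of the escaping you would otherwise do by hand to get the same text as regular element content. xml_escape is a hypothetical helper of mine (in real code you would more likely let your XML module do the escaping):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: escape text for use as regular element content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # must come first, or the entities added below get escaped again
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;    # only strictly required in "]]>", but harmless elsewhere
    return $text;
}

my $raw = 'if (a < b && c) { ... }';
print '<literal>', xml_escape($raw), "</literal>\n";
# prints: <literal>if (a &lt; b &amp;&amp; c) { ... }</literal>
```

A parser hands back the same string for this element as it would for <literal><![CDATA[if (a < b && c) { ... }]]></literal>.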
BTW, you will probably want to generate HTML from a CDATA section (which would be your next question ;--), and I don't think browsers support CDATA sections. It is pretty easy: all you have to do is turn the CDATA sections into regular PCDATA and print them. All special characters will then be escaped:
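#!/usr/bin/perl -w
use strict;
use XML::Twig;

my $t= XML::Twig->new( );
$t->parse( \*DATA);                     # the document comes from the __DATA__ section below

# turn every CDATA section into a regular text (PCDATA) element
foreach my $cdata ( $t->descendants( '#CDATA'))
  { $cdata->set_pcdata( $cdata->cdata); # copy the raw text over
    $cdata->set_gi( '#PCDATA');         # change the element type, so the text is escaped on output
  }

$t->print;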
__DATA__
<doc>
<p>regular text here, &lt; needs to be escaped as &amp;lt;</p>
<literal><![CDATA[here you can use < and & and whatever you want]]></literal>
</doc>
updated 2005-05-04: a ]]> was missing from the last CDATA. Thanks to ambrus for pointing this out.