XML::Twig - literal nodes

John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

Re: XML::Twig - literal nodes by mirod (Canon) on Nov 08, 2001 at 12:07 UTC
This is an XML FAQ: if you want to include unstructured text that can contain anything, including < and & characters, you can use CDATA sections:

```xml
<doc>
  <p>regular text here, &lt; needs to be escaped as &amp;lt;</p>
  <literal><![CDATA[here you can use < and & and whatever you want]]></literal>
  <literal><![CDATA[this is how you include the CDATA end mark ]]]]><![CDATA[> by splitting it into 2 different CDATA sections]]></literal>
</doc>
```

Note that the CDATA section has no effect on the element structure. In fact it is just a convenience that saves you from having to escape every single instance of < and & (and " or ' in attributes).

BTW, you probably want to generate HTML from a CDATA section (which would be your next question ;--), even though I don't think browsers support them. It is pretty easy: all you have to do is turn them into regular PCDATA and print them; all special characters will then be escaped:

```perl
#!/bin/perl -w
use strict;
use XML::Twig;

my $t= XML::Twig->new( );
$t->parse( \*DATA);    # read the document from the __DATA__ section below

# turn every CDATA section into regular PCDATA, so that
# special characters are escaped when the twig is printed
foreach my $cdata ( $t->descendants( '#CDATA'))
  { $cdata->set_pcdata( $cdata->cdata);
    $cdata->set_gi( '#PCDATA');
  }
$t->print;

__DATA__
<doc>
  <p>regular text here, &lt; needs to be escaped as &amp;lt;</p>
  <literal><![CDATA[here you can use < and & and whatever you want]]></literal>
</doc>
```

Update (2005-05-04): a `]]>` was missing from the last CDATA. Thanks to ambrus for pointing this out.
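The end-mark trick shown in the XML example can be automated. Here is a minimal sketch, assuming you build the CDATA sections yourself from arbitrary text; `to_cdata` is a hypothetical helper name, not something XML::Twig provides:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Twig): wrap arbitrary text in a
# CDATA section. Any "]]>" in the text is split across two sections, so
# the end mark never appears inside a single section.
sub to_cdata {
    my ($text) = @_;
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return "<![CDATA[$text]]>";
}

# The sample text contains "]]>", which would otherwise end the section early.
print to_cdata('if (a[b[0]]> c && d < e) { return; }'), "\n";
```

A parser reading the result sees two adjacent CDATA sections whose contents concatenate back to the original text.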
Re: Re: XML::Twig - literal nodes by John M. Dlugosz (Monsignor) on Nov 08, 2001 at 21:12 UTC
I'm aware of CDATA, but you misunderstand. Here in PM, we don't have to put our <code> in CDATA sections; rather, special characters can appear directly in them. For example, `<code>Foo& r1= x; if (x<y) bar();</code>`. No typing of CDATA there... just filtering of the source.

More formally, when a specified start tag is discovered, check its attributes (because this mechanism is optional) and switch in a source filter or otherwise pre-process the input stream, stopping when the pattern `"</$name>"` is encountered.

--John
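The pre-processing idea described above could be sketched outside the parser roughly as follows. This is only an illustration under stated assumptions: `protect_literal` is a hypothetical name, the literal elements do not nest, their content never contains `</$name>`, attribute values contain no `>`, and the optional attribute check is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical pre-processor, not an XML::Twig feature: wrap the raw content
# of every <$name>...</$name> element in a CDATA section, splitting any "]]>"
# found in the content, so that the result is well-formed XML.
# Assumptions: literal elements do not nest, there are no empty-element tags
# such as <code />, and no attribute value contains ">".
sub protect_literal {
    my ($source, $name) = @_;
    $source =~ s{(<\Q$name\E(?:\s[^>]*)?>)(.*?)(</\Q$name\E>)}{
        my ($start, $body, $end) = ($1, $2, $3);
        $body =~ s/\]\]>/]]]]><![CDATA[>/g;
        "$start<![CDATA[$body]]>$end";
    }gse;
    return $source;
}

my $input = '<doc><code>Foo& r1= x; if (x<y) bar();</code></doc>';
print protect_literal($input, 'code'), "\n";
```

The returned string is ordinary XML, so it can then be handed to an XML parser, for instance through XML::Twig's `parse` method, which accepts a string.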
Re: Re: Re: XML::Twig - literal nodes by mirod (Canon) on Nov 08, 2001 at 22:10 UTC
Sorry, you can't do this in XML. XML::Twig reads XML files, and a file with random &'s and <'s is _not_ XML. Hence XML::Twig or any XML tool can't do a thing for you there. If you want to include random special characters then you _have_ to use one of the 2 appropriate schemes allowed in XML: either escape each instance of those characters or use a CDATA section.

What you are describing is an interesting format (it is actually an extension of the input format accepted by PerlMonks), but it is not XML, and no tool based on an XML parser can accept it.

Darn! You've reached the limit beyond which I can't extend XML::Twig. I can't believe it!
Re: Re: Re: Re: XML::Twig - literal nodes by John M. Dlugosz (Monsignor) on Nov 09, 2001 at 00:18 UTC