John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

How can I implement something like the code tags here: content that is automatically escaped, and that terminates with the ending tag?

If I can switch in a filter when seeing the start tag...? There are methods named input_filter and parse_start_tag which look tantalizing, but the documentation doesn't give enough detail on them. Maybe it would require knowing more about how the Expat parser gobbles in the file?

—John

Replies are listed 'Best First'.
Re: XML::Twig - literal nodes
by mirod (Canon) on Nov 08, 2001 at 12:07 UTC

    This is an XML FAQ: if you want to include unstructured text that can include anything, including < and & characters, then you can use CDATA sections:

    <doc>
      <p>regular text here, &lt; needs to be escaped as &amp;lt;</p>
      <literal><![CDATA[here you can use < and & and whatever you want]]></literal>
      <literal><![CDATA[this is how you include the CDATA end mark ]]]]><![CDATA[> by splitting it into 2 different CDATA sections]]></literal>
    </doc>

    Note that a CDATA section has no effect on the element structure. It is just a convenience that spares you from escaping every single instance of < and & (quotes only need escaping inside attribute values, where CDATA sections are not allowed anyway).
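
    As a rough illustration of the end-mark trick shown above, here is a minimal sketch of a helper that wraps arbitrary text in a CDATA section. The name cdata_wrap is made up for this example; it is not part of XML::Twig or any other module.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper (not an XML::Twig method): wrap arbitrary text in a
    # CDATA section, splitting every "]]>" across two sections so that the
    # end mark never appears inside a single one.
    sub cdata_wrap {
        my ($text) = @_;
        $text =~ s/\]\]>/]]]]><![CDATA[>/g;
        return "<![CDATA[$text]]>";
    }

    print cdata_wrap('if (a[b[0]]>c && d<e) { ... }'), "\n";
    ```

    A parser reading the result sees two adjacent CDATA sections whose contents concatenate back into the original text.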

    BTW, you will probably want to generate HTML from a CDATA section (which would be your next question ;--), and I don't think browsers support CDATA sections. It is pretty easy: turn each CDATA section into regular PCDATA and print it. All special characters will then be escaped:

    #!/bin/perl -w
    use strict;
    use XML::Twig;

    my $t= XML::Twig->new( );
    $t->parse( \*DATA);

    # turn every CDATA section into a regular PCDATA element
    foreach my $cdata ( $t->descendants( '#CDATA')) {
        $cdata->set_pcdata( $cdata->cdata);
        $cdata->set_gi( '#PCDATA');
    }

    # special characters in the former CDATA are escaped on output
    $t->print;

    __DATA__
    <doc>
      <p>regular text here, &lt; needs to be escaped as &amp;lt;</p>
      <literal><![CDATA[here you can use < and & and whatever you want]]></literal>
    </doc>

    updated 2005-05-04: a ]]> was missing from the last CDATA. Thanks to ambrus for pointing this out.

      I'm aware of CDATA, but you misunderstand. Here in PM, we don't have to put our <code> in CDATA sections; rather, special characters can appear directly in them. For example, <code>Foo& r1= x; if (x<y) bar();</code>. No typing of CDATA there... just filtering of the source.

      More formally, when a specified start tag is discovered, check its attributes (because this mechanism is optional) and switch in a source filter or otherwise pre-process the input stream, stopping when the pattern "</$name>" is encountered.

      —John

        Sorry, you can't do this in XML.

        XML::Twig reads XML files, and a file with random &'s and <'s is _not_ XML. Hence neither XML::Twig nor any other XML tool can do a thing for you there. If you want to include random special characters then you _have_ to use one of the 2 schemes allowed in XML: either escape each instance of those characters or use a CDATA section.

        What you are describing is an interesting format, it is an extension of the input format accepted by PerlMonks actually, but it is not XML. And no tool based on an XML parser can accept it.
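
        If such input has to be accepted anyway, one possible workaround is to convert it to real XML before any XML parser sees it. The following is only a sketch under stated assumptions: literal_to_cdata is a hypothetical name, not an XML::Twig feature, and it assumes the literal element has no attributes and is never nested.

        ```perl
        use strict;
        use warnings;

        # Hypothetical pre-processor (not part of XML::Twig): wrap the raw content
        # of every <$name>...</$name> element in a CDATA section.
        # Assumes the start tag has no attributes and elements are not nested.
        sub literal_to_cdata {
            my ($input, $name) = @_;
            $input =~ s{<\Q$name\E>(.*?)</\Q$name\E>}{
                my $raw = $1;
                $raw =~ s/\]\]>/]]]]><![CDATA[>/g;   # split any embedded CDATA end mark
                "<$name><![CDATA[$raw]]></$name>";
            }gse;
            return $input;
        }

        my $pm_style = '<doc><code>Foo& r1= x; if (x<y) bar();</code></doc>';
        print literal_to_cdata( $pm_style, 'code'), "\n";
        ```

        The resulting string is well-formed XML, so it can then be handed to XML::Twig->new->parse( $xml) as usual. The pre-processing step itself is plain text munging and knows nothing about XML.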

        Darn! You've reached the limit beyond which I can't extend XML::Twig. I can't believe it!