http://qs1969.pair.com?node_id=454607

graff has asked for the wisdom of the Perl Monks concerning the following question:

I need to pass XML data through a filter that will transform the character data but leave the tags as-is. (The transform involves converting certain ASCII characters to utf8 Arabic characters, so I have to make sure this doesn't apply to the tags).

XML::Parser, with the "Style" set to "Stream" makes this very easy and clear, but there's one problem: if the input contains empty tags like this:

<sometag attr1="v1" attr2="v2"/>
it'll come out looking like this:
<sometag attr1="v1" attr2="v2"></sometag>
Am I just being too picky? Is it too much to ask that empty tags be kept empty? Here's a brief snippet that demonstrates the behavior:
use XML::Parser;

$xml = qq{<tag attr="xyz"/>};
print "original: $xml\n";

$parser = new XML::Parser( Style => 'Stream' );
print "parsed: ";
$parser->parse( $xml );
print "\n";

sub StartTag { print }

Replies are listed 'Best First'.
Re: XML::Parser can't create empty tags?
by merlyn (Sage) on May 06, 2005 at 05:39 UTC
Re: XML::Parser can't create empty tags?
by mirod (Canon) on May 06, 2005 at 07:39 UTC

    If you really need to do this, and you shouldn't, you can probably modify the Stream style to get what you want. Or just write your own style: check whether the tag in the source ends with '/>', and if it does, output it as-is in the Start handler and output nothing in the End handler.
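A minimal sketch of that idea, using plain XML::Parser handlers rather than a modified Stream style. The helper name filter_xml and the @was_empty stack are made up for this illustration; recognized_string and xml_escape are methods of the XML::Parser::Expat object passed to every handler. Since no Default handler is installed, comments, processing instructions and declarations are silently dropped, so treat this as a starting point only.

```perl
use strict;
use warnings;
use XML::Parser;

# filter_xml is a hypothetical helper, not part of XML::Parser.
# It copies the tags through unchanged and applies $transform
# to the character data only.
sub filter_xml {
    my ($xml, $transform) = @_;
    my $out = '';
    my @was_empty;    # one flag per open element: written as <.../>?

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat) = @_;
                # the start tag exactly as it appeared in the source
                my $tag = $expat->recognized_string;
                push @was_empty, ( $tag =~ m{/>\z} ? 1 : 0 );
                $out .= $tag;
            },
            End => sub {
                my ($expat, $name) = @_;
                # an empty tag was already written whole in Start
                $out .= "</$name>" unless pop @was_empty;
            },
            Char => sub {
                my ($expat, $text) = @_;
                # $text arrives decoded, so re-escape & and < on output
                $out .= $expat->xml_escape( $transform->($text) );
            },
        },
    );
    $parser->parse($xml);
    return $out;
}

print filter_xml(
    '<doc><foo att="foo">foo baz</foo><foo att="foo"/></doc>',
    sub { my ($s) = @_; $s =~ s/foo/bar/g; return $s },
), "\n";
```

Attribute values pass through untouched here because the start tag is copied verbatim from the source.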

    And then there is the obXML::Twig version (it doesn't filter attribute values, doing it is left as an exercise to the reader):

    #!/usr/bin/perl -w

    use XML::Twig;

    XML::Twig->new( twig_handlers => { _all_ => \&replace_text, },
                    keep_spaces   => 1, # to keep the original indentation
                  )
             ->parse( \*DATA)
             ->flush;

    sub replace_text
      { my( $t, $elt)= @_;
        # need to go through all pcdata elements in case the doc
        # includes mixed content (last foo element in the example)
        foreach my $pcdata ($t->descendants( '#PCDATA'))
          { $pcdata->set_pcdata( my_filter( $pcdata->pcdata) ); }
        $t->flush;
      }

    sub my_filter
      { $_[0]=~ s{foo}{bar}g;
        return $_[0];
      }

    __DATA__
    <doc>
      <foo att="foo">foo foo baz</foo>
      <foo att="foo"/>
      <foo att="baz">foo foo baz</foo>
      <foo att="baz">foo <elt>bar foo</elt> foo baz</foo>
    </doc>
Re: XML::Parser can't create empty tags?
by gube (Parson) on May 06, 2005 at 06:09 UTC

    The expanded form is still valid XML. The parser does not keep the self-closing form; it emits the equivalent start-tag/end-tag pair.

    If you give it the input: <tag attr="xyz"/>

    you get the output: <tag attr="xyz"></tag>

    So what problem does this cause? The parser is just writing the same element in another valid form. It should not affect your XML, and it is correct as far as parsing goes.

    With the Stream style, the end tag is generated automatically from the start tag.

Re: XML::Parser can't create empty tags?
by dakkar (Hermit) on May 06, 2005 at 14:42 UTC
    Am I just being too picky? Is it too much to ask that empty tags be kept empty?

    Yes, you're too picky ;-). Seriously, the two XML fragments are completely identical from an XML point of view: the first form is just a shorthand notation.

    Furthermore, the canonical form for empty elements is the start-tag/end-tag pair that XML::Parser produces. Also, keep in mind that in SAX you always receive an 'end tag' event, regardless of how the empty element was actually written.
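The same thing can be seen with XML::Parser's own handler interface (not SAX proper, but it behaves the same way on this point). In the sketch below, events_for is a made-up helper that records the Start and End events for a fragment; both spellings of the empty element should yield the same event list.

```perl
use strict;
use warnings;
use XML::Parser;

# events_for is a hypothetical helper for this illustration only.
# It returns the Start/End events seen while parsing $xml.
sub events_for {
    my ($xml) = @_;
    my @events;

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub { my ($expat, $name) = @_; push @events, "start:$name" },
            End   => sub { my ($expat, $name) = @_; push @events, "end:$name" },
        },
    );
    $parser->parse($xml);
    return join ',', @events;
}

# Both lines print the same event list.
print events_for('<tag attr="xyz"/>'),      "\n";
print events_for('<tag attr="xyz"></tag>'), "\n";
```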

    -- 
            dakkar - Mobilis in mobile
    

    Most of my code is tested...

    Perl is strongly typed, it just has very few types (Dan)