gjeffrey has asked for the wisdom of the Perl Monks concerning the following question:

Learned Men of the Perl Realm:

I am writing a Perl program to remove certain servlet tags from a web.xml file. One earlier attempt seemed to work well, except that comments were not preserved.

The attempt below preserves comments but mangles the XML prolog. What can I do to overcome this?

------------------ Perl Code ----------------
#!/usr/bin/perl
use strict;
use XML::Twig;

my $twig= new XML::Twig(TwigRoots => { servlet => 1 },
                        TwigHandlers => { servlet => \&servletTag},
                        TwigPrintOutsideRoots => 1 );
$twig->set_pretty_print( "indented");
$twig->parsefile( "web.xml");
$twig->flush();

sub servletTag {
    my $LOCALE = "es_US";
    my( $twig, $servlet)= @_;
    my $jspFileTag= $servlet->first_child("jsp-file");
    if (defined($jspFileTag)) {
        my $path = $jspFileTag->text();
        my $LOCALE_MOD = "/$LOCALE/";
        if ( $path =~ /$LOCALE_MOD/ ) {
            $servlet->cut();
            return;
        }
    }
}

------------------ XML Input File ------------
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" "http://java.sun.com/dtd/web-app_2_3.dtd">
<!-- This is my comment -->
<web-app>
  <servlet>
    <servlet-name>Authenticate_en_US</servlet-name>
    <jsp-file>/vxml/en_US/type_0/entry/Authenticate.jsp</jsp-file>
    <load-on-startup>2</load-on-startup>
  </servlet>
  <servlet>
    <servlet-name>Authenticate_es_US</servlet-name>
    <jsp-file>/vxml/es_US/type_0/entry/Authenticate.jsp</jsp-file>
    <load-on-startup>2</load-on-startup>
  </servlet>
</web-app>

------------------ Program Output ------------
<?xml version="1.0" encoding="ISO-8859-1"?>><!-- This is my comment --><web-app>
</web-app>
<web-app>
  <servlet>
    <servlet-name>Authenticate_en_US</servlet-name>
    <jsp-file>/vxml/en_US/type_0/entry/Authenticate.jsp</jsp-file>
    <load-on-startup>2</load-on-startup>
  </servlet>

Replies are listed 'Best First'.
Re: XML::Twig Mangles XML prolog
by pg (Canon) on Jul 27, 2004 at 04:27 UTC

    I wanted to see whether I could help, so I ran your program, but I got a different result. Both the prolog and the comment come through intact.

    My guess is that this is a version issue. My XML::Twig is 3.14; however, the result might also be affected by the versions of expat, XML::Parser and Scalar::Util.

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" "http://java.sun.com/dtd/web-app_2_3.dtd">
    <!-- This is my comment -->
    <web-app>
    </web-app><web-app>
      <servlet>
        <servlet-name>Authenticate_en_US</servlet-name>
        <jsp-file>/vxml/en_US/type_0/entry/Authenticate.jsp</jsp-file>
        <load-on-startup>2</load-on-startup>
      </servlet>
    </web-app>
    3.14
      Thanks for your response. I am not in a position to update the Twig installation. The code works "kind-of-ok" when I remove "TwigPrintOutsideRoots => 1". The program still changes PUBLIC to SYSTEM in the doctype and adds standalone="no" to the XML declaration. However, I think I can work around these by accessing and modifying them in the object tree. Thanks again.
Re: XML::Twig Mangles XML prolog
by iburrell (Chaplain) on Jul 26, 2004 at 20:45 UTC
    The mangling I see is removing the DOCTYPE, or removing the PUBLIC identifier from the DOCTYPE. I think you can turn off loading of the DTD. This may make XML::Twig not change the prolog. You can also access the doctype with doctype() and set it with set_doctype().
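
As a rough illustration of the suggestion above, here is a minimal sketch that parses the whole document (no TwigRoots), deletes the matching servlet elements in a handler, and then re-asserts the DOCTYPE with set_doctype() before printing. The helper name strip_locale_servlets is hypothetical, and the argument order used for set_doctype (name, system id, public id) is how I understand the XML::Twig documentation; behaviour may differ on older installations such as the one in the question, so treat this as a starting point rather than a confirmed fix.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not from the thread): removes every <servlet> whose
# <jsp-file> path contains "/$locale/" and returns the document as a string.
sub strip_locale_servlets {
    my ($xml, $locale) = @_;

    my $twig = XML::Twig->new(
        load_DTD      => 0,             # do not load the external DTD
        pretty_print  => 'indented',
        twig_handlers => {
            servlet => sub {
                my ($t, $servlet) = @_;
                my $jsp = $servlet->first_child('jsp-file');
                # Drop the servlet only when its JSP path is in the locale.
                $servlet->delete
                    if defined $jsp && index($jsp->text, "/$locale/") >= 0;
                return 1;
            },
        },
    );
    $twig->parse($xml);

    # Re-assert the DOCTYPE from the question's web.xml so the PUBLIC
    # identifier is set explicitly: name, system id, public id.
    $twig->set_doctype(
        'web-app',
        'http://java.sun.com/dtd/web-app_2_3.dtd',
        '-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN',
    );

    # sprint on the twig returns the whole document, prolog included.
    return $twig->sprint;
}
```

To use it on the file from the question, read web.xml into a string, call strip_locale_servlets($xml, 'es_US'), and print the result. Building the whole tree costs more memory than the TwigRoots approach, which should not matter for a file the size of a typical web.xml.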