That's a very dangerous path you are taking here: trying to process XML with regexps. "On XML parsing" gives a number of reasons why you should not do it in general, but here is a little test:
<?xml version="1.0"?> <doc><elt>a regular elt with a > in it</elt> <pre> spaces are significant in this element as well as line returns</pre> <elt att="this is valid >" /> <!-- <elt>commented out</elt><elt>(I mean all of it)</elt> --> <elt2><sub>if a \n is inserted before the sub element then the document is still well-formed but not valid anymore, as the DTD is <![CDATA[<!ELEMENT elt2 (#PCDATA|sub)>]]></sub></elt2> <elt><![CDATA[<toto><tata>booh<tutu>]]></elt> <elt>text with an <sub>embedded</sub> element</elt> </doc>
gives the following output:
<?xml version="1.0"?> <doc> <elt>a regular elt with a > in it</elt> <pre> spaces are significant in this element as well as line returns</pre> <elt att="this is valid >" /> <!-- <elt>commented out</elt> <elt>(I mean all of it)</elt> --> <elt2> <sub>if a \n is inserted before the sub element then the document is still well-formed but not valid anymore, as the DTD is <![CDATA[<!ELEMENT elt2 (#PCDATA|sub)>]]> </sub> </elt2> <elt> <![CDATA[<toto> <tata>booh<tutu>]]> </elt> <elt>text with an <sub>embedded</sub> element</elt> </doc>
Your tool does OK in a lot of situations, except:
And I am not even talking about problems with documents in different encodings, which could trip your regexps...
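To make the kind of trouble concrete, here is a minimal sketch that compares a naive "everything between < and >" regexp with a real parser. It uses Python's standard library rather than Perl, purely for illustration; the cut-down `DOC` string and the variable names are my own assumptions, not the original tool or its regexps.

```python
import re
import xml.dom.minidom as minidom

# A cut-down version of the test document (illustrative assumption,
# not the full original).
DOC = (
    '<doc>'
    '<elt att="this is valid >" />'
    '<!-- <elt>commented out</elt> -->'
    '<elt><![CDATA[<toto><tata>booh<tutu>]]></elt>'
    '</doc>'
)

# Naive regexp: anything from '<' up to the next '>' is taken to be a tag.
regex_tags = re.findall(r'<[^>]*>', DOC)

# Real parser: only genuine element nodes are reported. The commented-out
# elements and the CDATA content are not elements at all.
dom = minidom.parseString(DOC)
parsed_elements = [e.tagName for e in dom.getElementsByTagName('*')]

print("regexp matches:", regex_tags)
print("parsed elements:", parsed_elements)
```

The regexp cuts the attribute value short at the `>` inside it and also reports "tags" from inside the comment and the CDATA section, while the parser sees only the `doc` element and its two real `elt` children.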
The only safe way to break an XML document without knowing its DTD is to put the breaks in the only place where they cannot be significant: within the tags!
That might not be pretty but it is readable:
<?xml version="1.0"?> < doc>< elt att="val">you can also break between the tag and the attribute and between attributes</elt></doc>
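The claim that breaks inside the tags cannot be significant is easy to check with a parser. The following is a small sketch, again in Python's standard library and not the Perl modules discussed in this thread; the two sample strings and the `summary` helper are hypothetical, made up for the illustration. It places line breaks only after the element name, between attributes, and before the closing `>`.

```python
import xml.dom.minidom as minidom

compact = '<doc><elt att="val" b="2">some text</elt></doc>'

# Same document, with line breaks placed only inside the tags:
# after the element name, between attributes, and before the closing '>'.
broken = (
    '<doc\n'
    '><elt\n'
    '  att="val"\n'
    '  b="2"\n'
    '>some text</elt\n'
    '></doc\n'
    '>'
)

def summary(xml_text):
    """Return (tag name, attributes, text) for the first <elt> element."""
    elt = minidom.parseString(xml_text).getElementsByTagName('elt')[0]
    return elt.tagName, dict(elt.attributes.items()), elt.firstChild.data

# Both versions should parse to the same element, attributes and text.
same = summary(compact) == summary(broken)
print(same)
```

Because the extra whitespace lives inside the markup and not in the character data, the parser hands back the same tree for both strings, whatever the DTD says about the content.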
By the way, there are a number of modules on CPAN that do pretty printing of XML documents, such as XML::Handler::YAWriter or XML::Filter::Reindent. I have not tested them, and from reading the docs I am not sure they are what you are looking for (they are probably too slow and quite complex). But at least they would read the XML properly.
In reply to Re: You have xml files where this formatting tool does not work?
by mirod
in thread You have xml files where this formatting tool does not work?
by LupoX