Picking an XML Module

ninja-joe has asked for the wisdom of the Perl Monks concerning the following question:

(jeffa) Re: Picking an XML Module by jeffa (Bishop) on Aug 03, 2003 at 15:49 UTC
This sounds like a good candidate for XML::LibXML and XML::LibXSLT.

UPDATE: First off, i really think you should change the structure of your XML to something like:

`def.xml`
`<book>
  <chapter id="1">
    <part id="1">
      <info>infoinfostuffhere</info>
      <def word="someword">some definition of the word</def>
      <extra>more info and notes</extra>
    </part>
    <part id="2">
      <info>infoinfostuffhere</info>
      <def word="someword">some definition of the word</def>
      <extra>more info and notes</extra>
    </part>
  </chapter>
</book>`

This will make parsing much easier ... having a bunch of `<section id="foo">` tags is just too general. Also, be sure to wrap everything you can. Now that we have some, IMHO, better XML to work with, we can define a stylesheet:

`def.xsl`
`<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/book">
  <xsl:for-each select="chapter[@id]">
    <h1> Chapter <xsl:value-of select="@id"/> </h1>
    <xsl:for-each select="part[@id]">
      <h3> Part <xsl:value-of select="@id"/> </h3>
      <i><xsl:value-of select="info"/></i><br/>
      <xsl:for-each select="def[@word]">
        <b><u><xsl:value-of select="@word"/></u></b>:<br/>
      </xsl:for-each>
      <xsl:value-of select="def"/><br/>
      <xsl:value-of select="extra"/><br/>
    </xsl:for-each>
  </xsl:for-each>
</xsl:template>
</xsl:stylesheet>`

And finally, the script to transform all of this into 'HTML':

`use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

my $xml  = XML::LibXML->new();
my $xslt = XML::LibXSLT->new();

# parse the source document and the stylesheet
my $source     = $xml->parse_file('def.xml');
my $style_doc  = $xml->parse_file('def.xsl');
my $stylesheet = $xslt->parse_stylesheet($style_doc);

# apply the stylesheet and print the result
my $results = $stylesheet->transform($source);
print $stylesheet->output_string($results);`

jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
Re: (jeffa) Re: Picking an XML Module by mirod (Canon) on Aug 03, 2003 at 16:42 UTC
> First off, i really think you should change the structure of your XML to something like: def.xml

Why? Why would you pervert a perfectly logical, not to mention practical, document structure, in order for your code to be easier to write? The original format makes sense, why add extra tags everywhere to avoid having to deal with mixed content? Mixed content exists, it's there for a good reason: that's how you write documents.

What happens if you have more than one definition in the section? Would you have this:

`<part id="1">
  <info>infoinfostuffhere</info>
  <def word="someword">some definition of the word</def>
  <extra>more info and notes</extra>
  <extradef word="someword">some definition of another word</extradef>
  <doubleextra>even more info and notes</doubleextra>
</part>`

I don't think it would make sense either!
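For what it's worth, mixed content is not hard to walk with XML::LibXML. The sketch below is only an illustration: the original poster's XML is not shown in this thread, so the `<section>` sample, its contents, and the helper name `mixed_parts` are assumptions.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample: text and a <def> element mixed inside one <section>.
my $xml = <<'XML';
<section id="1">infoinfostuffhere <def word="someword">some definition of the word</def> more info and notes</section>
XML

# Walk the children of a mixed-content element in document order and
# return [kind, text] pairs for text nodes and <def> elements.
sub mixed_parts {
    my ($element) = @_;
    my @parts;
    for my $node ($element->childNodes) {
        if ($node->nodeType == XML_TEXT_NODE) {
            my $text = $node->data;
            $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
            push @parts, ['text', $text] if length $text;
        }
        elsif ($node->nodeType == XML_ELEMENT_NODE && $node->nodeName eq 'def') {
            push @parts, ['def', $node->getAttribute('word') . ': ' . $node->textContent];
        }
    }
    return @parts;
}

my $doc = XML::LibXML->new->parse_string($xml);
print "$_->[0]: $_->[1]\n" for mixed_parts($doc->documentElement);
```

Because the children come back in document order, a second `<def>` in the same section needs no extra wrapper elements.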
(jeffa) 3Re: Picking an XML Module by jeffa (Bishop) on Aug 03, 2003 at 16:47 UTC
Because i am still a newbie at XML. :P Seriously, because i didn't know any better ... i see now why the mixed content is OK to have. ninja-joe ... my apologies. If you do take my advice, feel free to ask more questions ... i personally find XSLT and XPath to be somewhat hard to work with until you get the hang of them. While i was ~~developing~~ hacking out the above code, i found it 'easier' (falsely, of course) to wrap everything instead of dealing with mixed content. mirod++ yet again. ;)

jeffa
Re: Picking an XML Module by CountZero (Bishop) on Aug 03, 2003 at 15:51 UTC
Have a look at "So many ways to Rome", an interesting article listing the pros and cons of the various Perl XML modules. It was presented at the YAPC:EU in Paris and I found it very enlightening.

CountZero
"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
Re: Picking an XML Module by liz (Monsignor) on Aug 03, 2003 at 15:36 UTC
Before you start doing your notes in XML, maybe you should have a look at YAML. And if you still want XML in the end, you can generate that out of the YAML without too much trouble. As far as I know, XML::Parser only suffers from the limitations of the underlying XML library Expat.

Liz
Re: Re: Picking an XML Module by mirod (Canon) on Aug 03, 2003 at 16:02 UTC
Actually the XML shown above contains mixed content (the `def` element in the middle of the text of the `section` element), so YAML would not cut it here. YAML is designed for serialisation of Perl/Python/Ruby/whatever data structures; it is specifically NOT designed to be equivalent to XML. BTW, the one-liner to turn (appropriate) XML into YAML is:

`perl -MXML::Simple -MYAML -e'print Dump( XMLin( "myfile.xml"))'`

(from Stop Using XML Everywhere! Damn It!, that should convince you that I am not an XML fanatic ;--)
Re: Re: Re: Picking an XML Module by liz (Monsignor) on Aug 03, 2003 at 16:10 UTC
Actually, from an information organization point of view, I was wondering why the `def` element was at that location. If you needed to generate a list of definitions out of that XML, you would need an XPath expression like `//def`, which can be very bad performance-wise.

Liz
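To make the XPath point concrete, here is a minimal sketch with XML::LibXML that collects the definitions both ways: with `//def`, which searches the whole tree, and with an explicit path. The sample document, its contents, and the helper name `definitions` are assumptions; whether the difference matters in practice depends on the size of the document, and no timings are claimed here.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document; element names follow the thread, content is made up.
my $xml = <<'XML';
<book>
  <section id="1">intro <def word="alpha">first letter</def> notes</section>
  <section id="2">more <def word="beta">second letter</def> notes</section>
</book>
XML

# Return [word, definition] pairs for every node matched by an XPath expression.
sub definitions {
    my ($doc, $xpath) = @_;
    return map { [ $_->getAttribute('word'), $_->textContent ] } $doc->findnodes($xpath);
}

my $doc = XML::LibXML->new->parse_string($xml);

# '//def' looks at every descendant; '/book/section/def' names the path explicitly.
for my $xpath ('//def', '/book/section/def') {
    print "$xpath\n";
    printf "  %s: %s\n", @$_ for definitions($doc, $xpath);
}
```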
Re: Re: Re: Re: Picking an XML Module by mirod (Canon) on Aug 03, 2003 at 16:33 UTC
Re: Picking an XML Module by mirod (Canon) on Aug 03, 2003 at 16:07 UTC
I don't know what limitations of XML::Parser you are referring to. In any case it is a low-level module, and you should use higher-level ones. XML::DOM is also not a good choice IMHO, the DOM being another low-level standard that does not match what we expect from a high-level language like Perl. XML::Twig (surprise!) and XML::LibXML are the ones I usually recommend.
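As a rough illustration of what "higher-level" means here, this sketch uses XML::Twig handlers to pull every `def` element out of a document. The sample input and the helper name `def_list` are assumptions, not the original poster's data.

```perl
use strict;
use warnings;
use XML::Twig;

# Collect [word, definition] pairs from every <def> element.
sub def_list {
    my ($xml) = @_;
    my @defs;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # called once each <def> element has been completely parsed
            def => sub {
                my ($t, $elt) = @_;
                push @defs, [ $elt->att('word'), $elt->text ];
            },
        },
    );
    $twig->parse($xml);
    return @defs;
}

# Hypothetical mixed-content input.
my $xml = '<section id="1">info <def word="someword">some definition</def> notes</section>';
printf "%s: %s\n", @$_ for def_list($xml);
```

The handler receives a whole element, so there is no text buffering to do by hand.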
Re: Picking an XML Module by vek (Prior) on Aug 03, 2003 at 18:06 UTC
Just out of curiosity, what limitations with XML::Parser are you referring to? In the past, you'd probably hear a lot of people trying to steer you away from `XML::Parser` in favor of an 'actively maintained' module. Well, Matts has picked up the `XML::Parser` reins and released 2.32 and 2.33 just last week in fact. So you can now add `XML::Parser` back onto the 'actively maintained XML parsing modules' list :-)

In the past I actively used `XML::Parser` until the requirements for my project changed. I needed to be able to validate the XML against a DTD so I switched to XML::LibXML. I would have probably stayed with `XML::Parser` otherwise.

-- vek --
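For reference, DTD validation with XML::LibXML can look roughly like the sketch below. The DTD, the sample documents, and the helper name `is_valid_xml` are all assumptions made up for illustration; only the `XML::LibXML::Dtd->parse_string` and `is_valid` calls come from the module itself.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical DTD describing a mixed-content <section>; not from the thread.
my $dtd_text = <<'DTD';
<!ELEMENT section (#PCDATA | def)*>
<!ATTLIST section id CDATA #REQUIRED>
<!ELEMENT def (#PCDATA)>
<!ATTLIST def word CDATA #REQUIRED>
DTD

my $dtd = XML::LibXML::Dtd->parse_string($dtd_text);

# Parse a string and report whether it validates against the given DTD.
sub is_valid_xml {
    my ($xml, $dtd) = @_;
    my $doc = XML::LibXML->new->parse_string($xml);
    return $doc->is_valid($dtd) ? 1 : 0;
}

my $good = '<section id="1">info <def word="someword">a definition</def> notes</section>';
my $bad  = '<section id="1"><extra>not declared in the DTD</extra></section>';

print "good: ", is_valid_xml($good, $dtd), "\n";
print "bad:  ", is_valid_xml($bad,  $dtd), "\n";
```

`is_valid` returns a boolean; `validate` takes the same argument but dies with the validation error, which is handier when the reason for the failure is needed.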
Re: Re: Picking an XML Module by mirod (Canon) on Aug 03, 2003 at 18:55 UTC
The main limitation of XML::Parser is that it is a low-level module: you have to do a lot of work yourself. The best example is probably that you have to buffer the data returned by the character handler, or it will come in several chunks. In general SAX-level handlers are quite a pain to write. And XML::Parser is not even SAX, so you don't get to benefit from the work that is being done at the moment on SAX modules (XML::SAX::Machines or XML::Filter::Dispatcher for example have some very good ideas). OTOH I must say that, antiquated as it is, XML::Parser's interface is a bit more convenient than pure SAX.

But if I compare this to the simplicity of... XML::Simple (which would not work in this case, it does not deal well with mixed content), or to the power of XML::LibXML's XPath engine, I don't think that XML::Parser is a good choice today.

There are also some problems with the way XML::Parser deals with entities (especially in attribute values) that can be annoying if your XML uses them.
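The buffering chore mentioned above looks roughly like this with raw XML::Parser handlers. It is only a sketch: the sample input and the helper name `def_texts` are assumptions, and the entity reference is included because that is one case where the text is typically delivered in more than one piece.

```perl
use strict;
use warnings;
use XML::Parser;

# Collect the full text of every <def> element using Start/Char/End handlers.
sub def_texts {
    my ($xml) = @_;
    my (@defs, $buffer, $in_def);

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $name) = @_;
                if ($name eq 'def') { $in_def = 1; $buffer = ''; }
            },
            # The Char handler may be called several times for one run of text,
            # so append to a buffer instead of treating each call as complete.
            Char => sub {
                my ($expat, $text) = @_;
                $buffer .= $text if $in_def;
            },
            End => sub {
                my ($expat, $name) = @_;
                if ($name eq 'def') { push @defs, $buffer; $in_def = 0; }
            },
        },
    );
    $parser->parse($xml);
    return @defs;
}

# Hypothetical input with an entity inside the definition text.
print "$_\n" for def_texts('<section>see <def word="amp">AT&amp;T style</def> here</section>');
```

All of the state (`$in_def`, `$buffer`) has to be tracked by hand, which is the work the higher-level modules take off your plate.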
Re: Re: Re: Picking an XML Module by vek (Prior) on Aug 03, 2003 at 19:01 UTC
mirod++, nice explanation.

-- vek --
Re: Re: Picking an XML Module by Matts (Deacon) on Aug 03, 2003 at 20:33 UTC
I'm only maintaining XML::Parser so that it can ultimately be deprecated (and so that what bugs there are get fixed). Next release will have a LARGE warning in the documentation about how you shouldn't use this module.