(jeffa) Re: Picking an XML Module
by jeffa (Bishop) on Aug 03, 2003 at 15:49 UTC
This sounds like a good candidate for
XML::LibXML and XML::LibXSLT.
UPDATE:
First off, i really think you should change the structure
of your XML to something like: def.xml
<book>
  <chapter id="1">
    <part id="1">
      <info>infoinfostuffhere</info>
      <def word="someword">some definition of the word</def>
      <extra>more info and notes</extra>
    </part>
    <part id="2">
      <info>infoinfostuffhere</info>
      <def word="someword">some definition of the word</def>
      <extra>more info and notes</extra>
    </part>
  </chapter>
</book>
This will make parsing much easier ... having a bunch of
<section id="foo"> tags is just too general.
Also, be sure and wrap everything you can. Now that we have
some, IMHO, better XML to work with, we can define a
stylesheet: def.xsl
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/book">
    <xsl:for-each select="chapter[@id]">
      <h1> Chapter <xsl:value-of select="@id"/> </h1>
      <xsl:for-each select="part[@id]">
        <h3> Part <xsl:value-of select="@id"/> </h3>
        <i><xsl:value-of select="info"/></i><br/>
        <xsl:for-each select="def[@word]">
          <b><u><xsl:value-of select="@word"/></u></b>:<br/>
        </xsl:for-each>
        <xsl:value-of select="def"/><br/>
        <xsl:value-of select="extra"/><br/>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
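And finally, the script to transform all of this into 'HTML':
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# One parser for the XML files, one XSLT processor for the stylesheet
my $xml = XML::LibXML->new();
my $xslt = XML::LibXSLT->new();

# Parse the source document and the stylesheet (the stylesheet is XML too)
my $source = $xml->parse_file('def.xml');
my $style_doc = $xml->parse_file('def.xsl');

# Compile the stylesheet, apply it, and print the serialized result
my $stylesheet = $xslt->parse_stylesheet($style_doc);
my $results = $stylesheet->transform($source);
print $stylesheet->output_string($results);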
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
"First off, i really think you should change the structure of your XML to something like: def.xml"
Why? Why would you pervert a perfectly logical, not to mention practical, document structure, in order for your code to be easier to write? The original format makes sense, why add extra tags everywhere to avoid having to deal with mixed content? Mixed content exists, it's there for a good reason: that's how you write documents.
What happens if you have more than one definition in the section? Would you have this:
<part id="1">
  <info>infoinfostuffhere</info>
  <def word="someword">some definition of the word</def>
  <extra>more info and notes</extra>
  <extradef word="someword">some definition of another word</extradef>
  <doubleextra>even more info and notes</doubleextra>
</part>
I don't think it would make sense either!
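For what it's worth, mixed content is not hard to walk with XML::LibXML. What follows is only a minimal sketch: the sample section markup and the bracketed output format are made up for illustration, on the assumption that the original document puts def elements in the middle of the section text.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample: a def element sitting in the middle of running text.
my $xml = '<section id="1">Some notes, then '
        . '<def word="someword">a definition</def>'
        . ' and more notes.</section>';

my $doc     = XML::LibXML->new->parse_string($xml);
my $section = $doc->documentElement;

# Walk the children in document order: text nodes and def elements alternate.
my @pieces;
for my $node ($section->childNodes) {
    if ($node->nodeType == XML_ELEMENT_NODE && $node->nodeName eq 'def') {
        # Render a definition as [word: text]
        push @pieces, '[' . $node->getAttribute('word') . ': '
                    . $node->textContent . ']';
    }
    elsif ($node->nodeType == XML_TEXT_NODE) {
        # Plain text between the elements is kept as-is
        push @pieces, $node->nodeValue;
    }
}

my $rendered = join '', @pieces;
print "$rendered\n";
```

Any number of definitions can appear in a section this way without inventing a new tag name for each one.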
Because i am still a newbie at XML. :P
Seriously, because i didn't know any better ... i see now
why the mixed content is OK to have. ninja-joe ... my
apologies. If you do take my advice, feel free to ask more
questions ... i personally find XSLT and XPath to be
somewhat hard to work with until you get the hang of them.
While i was hacking out the
above code, i found it 'easier' (falsely, of course) to
wrap everything instead of dealing with mixed content.
mirod++ yet again. ;)
jeffa
Re: Picking an XML Module
by CountZero (Bishop) on Aug 03, 2003 at 15:51 UTC
Have a look at "So many ways to Rome", an interesting article listing the pros and cons of the various Perl XML modules. It was presented at the YAPC:EU in Paris and I found it very enlightening.
CountZero
"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
Re: Picking an XML Module
by mirod (Canon) on Aug 03, 2003 at 16:07 UTC
I don't know what limitations of XML::Parser you are referring to. In any case it is a low-level module, and you should use higher-level ones. XML::DOM is also not a good choice IMHO, the DOM being another low-level standard that does not match what we expect from a high-level language like Perl.
XML::Twig (surprise!) and XML::LibXML are the ones I usually recommend.
Re: Picking an XML Module
by liz (Monsignor) on Aug 03, 2003 at 15:36 UTC
Before you start doing your notes in XML, maybe you should have a look at YAML. And if you still want XML in the end, you can generate that out of the YAML without too much trouble.
As far as I know, XML::Parser only suffers from the limitations of the underlying XML library Expat.
Liz
Actually the XML shown above contains mixed content (the def element in the middle of the text of the section element), so YAML would not cut it here. YAML is designed for serialisation of Perl/Python/Ruby/whatever data structures; it is specifically NOT designed to be equivalent to XML.
BTW, the one-liner to turn (appropriate) XML into YAML is:
perl -MXML::Simple -MYAML -e'print Dump( XMLin( "myfile.xml"))'
(from Stop Using XML Everywhere! Damn It!, that should convince you that I am not an XML fanatic ;--)
Actually, from an information organization point of view, I was wondering why the def element was at that location. If you needed to generate a list of definitions out of that XML, you would need an XPath expression like //def, which can be very bad performance-wise.
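To make that concrete, here is a rough sketch of pulling the definition list out with XML::LibXML. The sample document and the element nesting (book/section/def) are assumptions for illustration only. The second expression spells out the full path, which avoids scanning every descendant when the nesting is known in advance.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample with def elements embedded in section text.
my $xml = <<'XML';
<book>
  <section id="1">Text <def word="alpha">first letter</def> and more.</section>
  <section id="2">Text <def word="beta">second letter</def> and more.</section>
</book>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# //def looks at every element in the document, wherever it is nested.
my @anywhere = map { $_->getAttribute('word') } $doc->findnodes('//def');

# An anchored path only follows the named steps from the root.
my @anchored = map { $_->getAttribute('word') } $doc->findnodes('/book/section/def');

print "$_\n" for @anywhere;
```

Both expressions return the same nodes for this sample; they only differ in how much of the tree has to be visited.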
Liz
Re: Picking an XML Module
by vek (Prior) on Aug 03, 2003 at 18:06 UTC
Just out of curiosity, what limitations with XML::Parser are you referring to?
In the past, you'd probably hear a lot of people trying to steer you away from XML::Parser in favor of an 'actively maintained' module. Well, matts has picked up the XML::Parser reins and released 2.32 and 2.33 just last week in fact. So you can now add XML::Parser back onto the 'actively maintained XML parsing modules' list :-)
In the past I actively used XML::Parser until the requirements for my project changed. I needed to be able to validate the XML against a DTD so I switched to XML::LibXML. I would have probably stayed with XML::Parser otherwise.
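For anyone curious what DTD validation looks like with XML::LibXML, a minimal sketch follows. The DTD and both sample documents are invented for the example (they are not from my project), and the DTD is kept in the internal subset so the snippet needs no external files.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical DTD: a book holds chapters, and every chapter needs an id.
my $dtd = <<'DTD';
<!DOCTYPE book [
  <!ELEMENT book (chapter+)>
  <!ELEMENT chapter (#PCDATA)>
  <!ATTLIST chapter id CDATA #REQUIRED>
]>
DTD

my $good = $dtd . '<book><chapter id="1">text</chapter></book>';
my $bad  = $dtd . '<book><chapter>no id here</chapter></book>';

my $parser = XML::LibXML->new;
my %valid;
for my $case ([good => $good], [bad => $bad]) {
    my ($name, $xml) = @$case;
    # Parsing alone only checks well-formedness; both samples pass that.
    my $doc = $parser->parse_string($xml);
    # is_valid checks the tree against the DTD carried by the document.
    $valid{$name} = $doc->is_valid ? 1 : 0;
    print "$name: ", ($valid{$name} ? 'valid' : 'invalid'), "\n";
}
```

If an exception is preferred over a boolean, the document object also has a validate method that dies on the first validity error.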
--
vek
--
The main limitation of XML::Parser is that it is a low-level module: you have to do a lot of work yourself. The best example is probably that you have to buffer the data returned by the character handler, or it will come in several chunks. In general SAX-level handlers are quite a pain to write. And XML::Parser is not even SAX, so you don't get to benefit from the work that is being done at the moment on SAX modules (XML::SAX::Machines or XML::Filter::Dispatcher for example have some very good ideas). OTOH I must say that, antiquated as it is, XML::Parser's interface is a bit more convenient than pure SAX.
But if I compare this to the simplicity of... XML::Simple (which would not work in this case, it does not deal well with mixed content), or to the power of XML::LibXML's XPath engine, I don't think that XML::Parser is a good choice today.
There are also some problems with the way XML::Parser deals with entities (especially in attribute values) that can be annoying if your XML uses them.
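The buffering point is easiest to see in code. This is a minimal sketch, not a recommended design: the sample markup and the %text hash are made up for illustration, and resetting the buffer on every start tag only works because this sample has no mixed content.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample; the entity makes it likely that expat delivers
# the text of <info> to the Char handler in more than one call.
my $xml = '<part><info>AT&amp;T notes</info><extra>more info</extra></part>';

my %text;          # element name => complete text content
my $buffer = '';   # text collected since the last start tag

my $parser = XML::Parser->new(
    Handlers => {
        # A new element starts: begin collecting from scratch
        Start => sub { $buffer = '' },
        # May be called several times per text run, so append, never assign
        Char  => sub { my ($expat, $chunk) = @_; $buffer .= $chunk },
        # Only at the end tag is the text known to be complete
        End   => sub {
            my ($expat, $name) = @_;
            $text{$name} = $buffer;
            $buffer = '';
        },
    },
);
$parser->parse($xml);

print "info:  $text{info}\n";
print "extra: $text{extra}\n";
```

Assigning in the Char handler instead of appending would keep only the last chunk, which is exactly the kind of bug that bites people with low-level handlers.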
I'm only maintaining XML::Parser so that it can ultimately be deprecated (and so that what bugs there are get fixed). Next release will have a LARGE warning in the documentation about how you shouldn't use this module.