dsm has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, Please forgive my ignorance - I'm a newbie to Perl (and even newer to XML), and I'm trying to use the XML::Parser module to parse an XML file. The problem is, I'm not quite sure how to do it - I have read the perldoc and the tutorial, but I'm still a little unsure... If I have an XML file which has a root element and then four child elements, how do I get that data out of the file? Do I need a hash if there will only ever be one 'record' in the file? Finally, what do the variables $p, $elt and %atts contain in the XML::Parser Tutorial? Many thanks to anyone who takes the time to respond to this post... :)

Replies are listed 'Best First'.
Re: Help Using XML::Parser module
by edan (Curate) on Jun 11, 2003 at 11:51 UTC

    If you're a newbie to Perl and XML, I wouldn't recommend using XML::Parser (actually, I'm not sure I would recommend it even if you weren't a newbie). Perhaps something like XML::Twig (by our own mirod) will present an easier interface to get you started parsing XML in Perl... there's also XML::Simple, depending on how basic your XML parsing needs are...
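For the case in the question (one root element with four children), reading with XML::Simple might look roughly like the sketch below. The sample XML and its element names are made up for illustration; only XMLin itself comes from the module.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical sample data: a root element with four child elements.
my $xml = q{<record>
  <title>Example Title</title>
  <author>A. Monk</author>
  <year>2003</year>
  <publisher>Example Press</publisher>
</record>};

# XMLin returns a hash reference. By default the root element itself is
# discarded and each child element becomes a key in the hash.
my $record = XMLin($xml);

# Print the four fields in a stable (alphabetical) order.
print "$_: $record->{$_}\n" for sort keys %$record;
```

So even with a single 'record', the data comes back as a hash (reference), which is usually the most convenient shape for named fields anyway.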


      Thanks 3dan, I've decided to use XML::Simple, but I just have one question which you (or perhaps someone else) may be able to answer... When writing XML files using this module, must the data be stored in a hash? Or can you just tell it which parts are the <tags> and which part is the XML bit?

        Yes, if you want to use XML::Simple to output XML then you do need to store the data in a hash. If it's not already in a hash then that might be a roundabout way of achieving your aims. You may also find that the loss of ordering which results from using a hash is a problem. If you don't get the results you want quickly then try another approach; XML::Simple's forte is really the reading side of things. There are one or two modules specifically for writing XML but I have no experience with them; print works for me.
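A minimal sketch of writing a hash out with XML::Simple, assuming a flat record like the one in the original question. The field names and values are invented for the example; RootName and NoAttr are XMLout options (NoAttr asks for child elements rather than attributes).

```perl
use strict;
use warnings;
use XML::Simple qw(XMLout);

# Hypothetical record: the data has to be in a hash before XMLout can use it.
my %record = (
    title     => 'Example Title',
    author    => 'A. Monk',
    year      => 2003,
    publisher => 'Example Press',
);

# RootName names the enclosing element; NoAttr => 1 emits each value as
# a nested element instead of an attribute on the root.
my $xml = XMLout(\%record, RootName => 'record', NoAttr => 1);

print $xml;
```

Because the input is a hash, the order in which the elements were assigned is not preserved, which is the ordering caveat mentioned above.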

Re: Help Using XML::Parser module
by zakb (Pilgrim) on Jun 11, 2003 at 12:00 UTC

    The basic premise for XML::Parser is that you define subroutines which get called when a start tag is found, an end tag is found or character data (i.e. between two tags) is found.
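As a rough sketch of that premise applied to the original question (a root element with four children), the three handlers could cooperate as below. The sample XML, the element names and the %record / $current variables are assumptions made for the example, not something taken from the tutorial.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample data: a root element with four child elements.
my $xml = q{<record>
  <title>Example Title</title>
  <author>A. Monk</author>
  <year>2003</year>
  <publisher>Example Press</publisher>
</record>};

my %record;     # child element name => its text content
my $current;    # name of the child element we are inside, if any

my $parser = XML::Parser->new(
    Handlers => {
        # Called for every start tag.
        Start => sub {
            my ($p, $elt, %atts) = @_;
            return if $elt eq 'record';    # skip the root element itself
            $current = $elt;
            $record{$elt} = '';
        },
        # Called for character data; it may arrive in several pieces,
        # so append rather than assign.
        Char => sub {
            my ($p, $text) = @_;
            $record{$current} .= $text if defined $current;
        },
        # Called for every end tag.
        End => sub {
            my ($p, $elt) = @_;
            undef $current;
        },
    },
);

$parser->parse($xml);

print "$_ => $record{$_}\n" for sort keys %record;
```

Whitespace between the child elements also reaches the Char handler, which is why it only stores text while $current is set.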

    From your question, it looks like you need some help with the subroutines.

    The three variables $p, $elt, %atts in the tutorial are defined as follows:

    sub hdl_start {
        my ($p, $elt, %atts) = @_;
        return unless $elt eq 'message';    # We're only interested in what's said
        $atts{'_str'} = '';
        $message = \%atts;
    }

    This snippet defines a subroutine called hdl_start. It gets called whenever the parser discovers a start tag (like <chatter>). The parser passes its parameters to the subroutine in the array @_. The first line in the subroutine (my ...) then extracts those parameters from the array into the $p, $elt and %atts variables. The documentation for XML::Parser tells us that the parameters are:

    Start (Expat, Element [, Attr, Val [,...]])
    • Expat ($p) is a reference to the underlying Expat XML parser.
    • Element ($elt) is the name of the element found - e.g. chatter
    • Attr, Val (%atts) are attribute / value pairs found as part of the start tag

    So, to use a bit of the XML from the tutorial: <INFO site="" sitename="Perl Monks">

    • $elt would contain 'INFO'
    • %atts would contain
      site => '', sitename => 'Perl Monks'
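To see these values for yourself, a small sketch like the following feeds the INFO start tag quoted above to XML::Parser and records what the Start handler receives. The @seen array and the closing tag added to make the fragment well-formed are my own additions for the demonstration.

```perl
use strict;
use warnings;
use XML::Parser;

my @seen;    # one entry per start tag: [ element name, { attributes } ]

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            # $p is the Expat object, $elt the element name, and the
            # remaining arguments are attribute/value pairs.
            my ($p, $elt, %atts) = @_;
            push @seen, [ $elt, {%atts} ];
        },
    },
);

# The INFO tag from the tutorial, closed so that it parses on its own.
$parser->parse(q{<INFO site="" sitename="Perl Monks">some text</INFO>});

for my $tag (@seen) {
    my ($elt, $atts) = @$tag;
    print "element: $elt\n";
    print "  $_ => '$atts->{$_}'\n" for sort keys %$atts;
}
```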

    I think perhaps you may want to read some of the other Tutorials and may want to invest in the "Learning Perl" (llama) book. Tackling Perl and XML::Parser as your first project may be a little ambitious!

      I've used XML::Parser for several projects (I started before reading the tutorials that said it was too hard), and once you get used to the event-driven model, it's not really difficult at all - especially if you control the XML format, too. For example, putting all the info in attributes can really make life easy.

      --Bob Niederman,

        The problem with XML::Parser is not so much that it is hard to use, although it is quite convoluted in places. The problem is that SAX is a standard alternative to its streaming mode, and that XML::LibXML is a way better alternative to its tree mode. Plus XML::Parser is not actively maintained at the moment, which is always a problem when a new version of Perl, or of expat (the underlying library) comes out.

        So if you need to work in stream mode, your time will be better spent learning SAX, and if you are looking for convenience then XML::Simple, XML::LibXML or XML::Twig (or any of a number of other modules) are certainly better choices.
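For comparison, here is a minimal sketch of the SAX style using XML::SAX, assuming that distribution is installed. The handler package name (MyHandler), the sample XML and the 'names' / 'text' slots are invented for the example; with SAX each event arrives as a hash reference rather than as a flat argument list.

```perl
use strict;
use warnings;

# Hypothetical handler class; XML::SAX::Base supplies default
# (do-nothing) implementations for every event we don't override.
package MyHandler;
use base 'XML::SAX::Base';

sub start_element {
    my ($self, $el) = @_;
    push @{ $self->{names} }, $el->{Name};    # element name
}

sub characters {
    my ($self, $chars) = @_;
    $self->{text} .= $chars->{Data};          # character data
}

package main;
use XML::SAX::ParserFactory;

my $handler = MyHandler->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);

$parser->parse_string(q{<record><title>Example Title</title></record>});

print "elements: @{ $handler->{names} }\n";
print "text: $handler->{text}\n";
```

The handler code does not depend on which SAX parser the factory returns, which is the main attraction of the standard interface.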

Re: Help Using XML::Parser module
by mirod (Canon) on Jun 11, 2003 at 14:30 UTC