Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I've beaten my head against the wall trying to get XML::Parser to chop up my file in the right way. XML::Parser seems to be a good tool for the job since I need to use the Stream mode and it is already installed.

When trying to parse this file:

<?xml version="1.0" encoding="UTF-8"?>
<web property="perl.com" id="12345" date="2004-07-29">
  <level>
    <page name="dodo bird"/>
    <page name="camel"/>
  </level>
  <content>
    <file type="text/*" number="123" bytes="654322"/>
    <file type="image/*" number="23" bytes="7654322"/>
  </content>
  <user>
    <Product>
      <Name>Product1</Name>
      <Quantity>1</Quantity>
    </Product>
    <Product>
      <Name>Product2</Name>
      <Quantity>1</Quantity>
    </Product>
  </user>
</web>


I'm having trouble coercing the structure into a record that I can easily output. For example, I'd like to have a record like $data{ $id }->{level}->{page}->{name}, where name holds both "dodo bird" and "camel".
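
To make the goal concrete, here is a minimal sketch of one possible shape for that record. It assumes the repeated page names are collected into an array reference, which is one reading of "contains both"; the literal values are taken from the sample file.

```perl
use strict;
use warnings;

# Hypothetical target record, keyed by the id attribute of <web>.
# The two <page> names are held in one array reference.
my %data = (
    12345 => {
        level => {
            page => {
                name => [ 'dodo bird', 'camel' ],
            },
        },
    },
);

# Walk the repeated values for one record.
my @names = @{ $data{12345}{level}{page}{name} };
print "$_\n" for @names;
```

With this shape, adding another page is a push onto the same array, so elements with the same name never overwrite each other.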

I've read the tutorial here, and that method works well when there are not multiple elements with the same name. It doesn't work well with this example.

You know when you're trying to think of a movie star's name...and you just can't seem to come up with it? Such is my pain. Any advice is appreciated.

Anon Monk

Re: XML Parsing Woes
by davorg (Chancellor) on Aug 05, 2004 at 14:24 UTC

    If you're trying to build a data structure, then why not use XML::Parser in Tree mode (or XML::Simple)? This will give you a data structure that models the XML document, and your problem then becomes converting one Perl data structure into another.
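
    A minimal sketch of that conversion, assuming XML::Simple is installed and the whole document fits in memory. The ForceArray and KeyAttr settings are a choice made for this illustration (they keep the page elements as a plain list), and the shortened sample and the %data layout are assumptions based on the question.

    ```perl
    use strict;
    use warnings;
    use XML::Simple qw(XMLin);

    # Shortened copy of the sample document from the question.
    my $xml = q{<web property="perl.com" id="12345" date="2004-07-29">
      <level>
        <page name="dodo bird"/>
        <page name="camel"/>
      </level>
    </web>};

    # ForceArray keeps <page> as a list even when only one is present;
    # KeyAttr => [] stops XML::Simple folding the list into a hash by name.
    my $doc = XMLin( $xml, ForceArray => ['page'], KeyAttr => [] );

    # Convert the parsed structure into the record shape asked for.
    my %data;
    $data{ $doc->{id} }{level}{page}{name} =
        [ map { $_->{name} } @{ $doc->{level}{page} } ];

    print join( ', ', @{ $data{12345}{level}{page}{name} } ), "\n";
    ```

    XML::Simple drops the root element by default, which is why the id attribute of web appears at the top level of $doc.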

    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      I actually gave both of those recommendations a try and then quickly realized that the memory footprint would be far too high. The files I'm going to be parsing are on the order of 500MB.
        Then you probably want XML::Twig, which allows you to trigger on just a part of the tree as needed, while the parsing is happening.
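
        A rough sketch of that approach, assuming XML::Twig is installed. The handler paths, the %data layout, and the shortened sample are assumptions made for illustration; for a 500MB file, parsefile with a path would replace the in-memory string.

        ```perl
        use strict;
        use warnings;
        use XML::Twig;

        # Shortened copy of the sample document from the question.
        my $xml = q{<web property="perl.com" id="12345" date="2004-07-29">
          <level>
            <page name="dodo bird"/>
            <page name="camel"/>
          </level>
        </web>};

        my ( %data, $id );

        my $twig = XML::Twig->new(
            # Runs when <web> opens, so the id is known before any child is seen.
            start_tag_handlers => {
                web => sub {
                    my ( $t, $elt ) = @_;
                    $id = $elt->att('id');
                },
            },
            # Runs once for each completed <page> inside <level>.
            twig_handlers => {
                'level/page' => sub {
                    my ( $t, $elt ) = @_;
                    push @{ $data{$id}{level}{page}{name} }, $elt->att('name');
                    $t->purge;    # free what has been parsed so far
                },
            },
        );

        $twig->parse($xml);

        print join( ', ', @{ $data{$id}{level}{page}{name} } ), "\n";
        ```

        Because each handler pushes onto an array and then purges, repeated elements accumulate in the record while the parsed tree itself stays small.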

        But I am confused. You say you want a data structure, yet when you've created such a data structure, you get more than you want. You'll need to decide exactly what you want!

        Or maybe you can build the data structure using DBM::Deep, which keeps most of it out on disk instead of in memory.
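
        A small sketch of the DBM::Deep idea, assuming the module is installed. The file name, the temporary directory, and the record layout are hypothetical and chosen only for this illustration.

        ```perl
        use strict;
        use warnings;
        use DBM::Deep;
        use File::Spec;
        use File::Temp qw(tempdir);

        # Hypothetical location for the on-disk structure.
        my $dir  = tempdir( CLEANUP => 1 );
        my $file = File::Spec->catfile( $dir, 'records.db' );

        my $db = DBM::Deep->new($file);

        # Create the nested record once; it is written to the file.
        $db->{12345} = { level => { page => { name => [] } } };

        # Fetching returns a handle onto the stored array, so pushes go to disk.
        my $names = $db->{12345}{level}{page}{name};
        push @$names, 'dodo bird';
        push @$names, 'camel';

        print scalar( @{ $db->{12345}{level}{page}{name} } ), " names stored\n";
        ```

        The same pushes could be made from inside a parser handler, so the record grows on disk as the file is read.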

        -- Randal L. Schwartz, Perl hacker
        Be sure to read my standard disclaimer if this is a reply.