http://qs1969.pair.com?node_id=481083

jdtoronto has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed monks,

If you are getting sick of my newbie XML questions I ask you to bear with me once more.

I started using XML::Simple because, in one programme I wrote, it did just enough and did it rather well. But now I find that it is so simple that in fact it can't even do simple things. For example, it cannot parse the XML that I will show you and then output the same thing! The documentation agrees with this and says, well, live with it or use something else. I can't live with it, so now I ask your advice as to how I go about doing just exactly what I need to do.

Here is the XML:

<?xml version="1.0" encoding="UTF-8" ?>
<voice_broadcast_info xmlns="http://www.protus.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <SchemaVersion>1.0</SchemaVersion>
  <login_key>
    <user_id>123456</user_id>
    <user_password>123456</user_password>
  </login_key>
  <voice_broadcast_options>
    <billing_code>TEST1000</billing_code>
    <start_hour>0</start_hour>
    <end_hour>23</end_hour>
    <release_date>2003-10-01T23:59:59</release_date>
    <in_house_flag>false</in_house_flag>
    <max_successful>0</max_successful>
    <delivery_type>both</delivery_type>
  </voice_broadcast_options>
  <voice_recipient_list>
    <voice_recipient>
      <voice_recipient_number>14169999999</voice_recipient_number>
      <voice_recipient_name>mE</voice_recipient_name>
      <voice_recipient_reference>bc-voice-1</voice_recipient_reference>
    </voice_recipient>
    <voice_recipient>
      <voice_recipient_number>14169999999</voice_recipient_number>
      <voice_recipient_name>mE - 2</voice_recipient_name>
      <voice_recipient_reference>bc-voice-2</voice_recipient_reference>
    </voice_recipient>
  </voice_recipient_list>
  <voice_answer_file vcast_file_content_type="audio/wav" vcast_file_encoding_type="base64" vcast_file_name="live" vcast_file_extension="wav" delivery_answer_type="voice" vcast_file_desc="TESTABC123"></voice_answer_file>
  <machine_answer_file vcast_file_content_type="audio/wav" vcast_file_encoding_type="base64" vcast_file_name="live" vcast_file_extension="wav" delivery_answer_type="machine" vcast_file_desc="TESTABC123"></machine_answer_file>
</voice_broadcast_info>

What I need to do is parse this to a Perl data structure and then modify it, outputting it in EXACTLY the same format. What I have given you above is merely a sample.

I am quite prepared to generate the entire thing programmatically. I had started out parsing these examples as an easy way of leveraging my effort - using them as templates in other words.

What should I be using in the way of modules?

jdtoronto

Re: Operating on XML, or XML::Simple is too simple!
by mirod (Canon) on Aug 05, 2005 at 06:25 UTC

    IMHO XML::Simple is used when you don't really care about the XML. Either you use it as "just another storage format", or you just load it and use the data. Because you lose some information when you load data using XMLin, it is hard to get exactly what you want with XMLout.

    If you really want to process XML documents you have to use a different kind of module, one which models the XML more faithfully than XML::Simple.

    My recommendation would be to go either with XML::LibXML or with XML::Twig.

    XML::LibXML is based on libxml2, which you have to install first (it should already be installed on most Linux systems). It implements a whole bunch of standards very rigorously: XML, DOM, XPath... While the DOM is a bit of a pain to use (it is very verbose), you might already be familiar with it if you use JavaScript, and the combination DOM + XPath is very powerful.

    XML::Twig (which I wrote, so I might be a little bit biased ;--) is based on XML::Parser and on expat which you might have to install (it is already installed with Activestate Perl and on a lot of *nix systems). It is more perlish and concise than XML::LibXML but it doesn't implement as many standards, and the docs are... huge!

    I would avoid XML::Parser at this point, it is not very well supported and is a bit too low level. You could also use some of the SAX modules, but they are probably not worth the pain unless you like low-level context management.
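
    To make the XML::Twig route concrete, here is a minimal sketch (not a definitive recipe). It assumes a cut-down copy of the job document from the question, and the replacement billing code JOB2005 is made up for illustration. The keep_spaces option asks XML::Twig to keep the original whitespace; details such as the exact form of the XML declaration may still come out normalised, so check the output against what the receiving system expects.

```perl
use strict;
use warnings;
use XML::Twig;

# Cut-down version of the job document from the question.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<voice_broadcast_info xmlns="http://www.protus.com">
  <voice_broadcast_options>
    <billing_code>TEST1000</billing_code>
    <start_hour>0</start_hour>
  </voice_broadcast_options>
</voice_broadcast_info>
XML

# keep_spaces => 1 keeps the whitespace between elements as it was parsed.
my $twig = XML::Twig->new( keep_spaces => 1 );
$twig->parse($xml);

# Find the first <billing_code> element anywhere in the document
# and replace its text. 'JOB2005' is a hypothetical value.
my $code = $twig->first_elt('billing_code');
$code->set_text('JOB2005');

# Serialise the whole document back to a string.
my $out = $twig->sprint;
print $out;
```

    Everything that was not touched (here, start_hour) is written back as it was read, which is the property XMLin/XMLout does not give you.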

Re: Operating on XML, or XML::Simple is too simple!
by graff (Chancellor) on Aug 05, 2005 at 05:26 UTC
    So have you looked at using plain old XML::Parser? Just because you find a module called "XML::Simple", this doesn't mean that the fundamental module it's based on (XML::Parser) has to be really complicated -- and in fact it really is not complicated.

    Clear your mind of XML::Simple, forget whatever preconceptions have arisen about how you think your code should be written, then read the man page for XML::Parser and see if some different approach becomes clear. You might find that it will be easier to use, and an appropriate algorithm might become self-evident.

    (Consider that the man page for XML::Parser is only about one third as long as the one for XML::Simple -- that suggests to me that the latter module may have been misnamed.)

    Your goal is not clearly stated here... (Maybe there's a previous thread where you gave more detail?) What needs to be modified? In what sense should the output be in "exactly" the same format, given that changes of some sort need to be made? (I.e. what specific features of the input format need to be preserved?)

    And are you sure you need to hold the whole xml stream in a Perl data structure? Are there complicated, hierarchical conditions that are needed to control the mods? Or can the changes be based on relatively localized criteria?

    Whatever the task is, XML::Parser will probably be simpler than XML::Simple (which seems to have been tailored to certain types of tasks, and your task might not be among that set).

    Other references cited in earlier replies are worth looking at as well -- but I think looking at the fundamental module first would be worthwhile.

    update: Sorry, I think I may have misunderstood. You are actually looking for a way to take a bunch of new data (from some non-XML source, like a database or flat file) and push them out in the XML format shown in the example -- is that it? You want a template with placeholders where you can plug in fresh values.

    Since you seem to have some repeating elements ("voice_recipient_list" contains one or more "voice_recipients", etc) it's likely that different inputs will have different quantities of these elements. You could try setting up a set of nested templates -- e.g. a template for "voice_recipient" that can be used any number of times to build the "voice_recipient_list", then a template for "voice_broadcast_info" that has a placeholder for the "voice_recipient_list", and so on. But on the whole, it would seem more like you are building up the output from pieces, rather than just tweaking the content of an existing XML stream.
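
    The nested-template idea can be sketched in plain Perl with no XML module at all. This is only an illustration under stated assumptions: the sub names (xml_escape, recipient_xml, recipient_list_xml) and the hash keys are invented here, and only the recipient list is shown, not the whole voice_broadcast_info document.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Inner template: one <voice_recipient> element.
sub recipient_xml {
    my ($r) = @_;
    return sprintf(
        "    <voice_recipient>\n"
      . "      <voice_recipient_number>%s</voice_recipient_number>\n"
      . "      <voice_recipient_name>%s</voice_recipient_name>\n"
      . "      <voice_recipient_reference>%s</voice_recipient_reference>\n"
      . "    </voice_recipient>\n",
        map { xml_escape($_) } @{$r}{qw(number name reference)}
    );
}

# Outer template: the list, built by repeating the inner template.
sub recipient_list_xml {
    my (@recipients) = @_;
    return "  <voice_recipient_list>\n"
         . join( '', map { recipient_xml($_) } @recipients )
         . "  </voice_recipient_list>\n";
}

# Sample data taken from the XML in the question.
my @recipients = (
    { number => '14169999999', name => 'mE',     reference => 'bc-voice-1' },
    { number => '14169999999', name => 'mE - 2', reference => 'bc-voice-2' },
);

my $list = recipient_list_xml(@recipients);
print $list;
```

    The same pattern extends outward: a template for voice_broadcast_info would take the string returned by recipient_list_xml as one of its placeholders.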

      Graff,
      update: Sorry, I think I may have misunderstood. You are actually looking for a way to take a bunch of new data (from some non-XML source, like a database or flat file) and push them out in the XML format shown in the example -- is that it? You want a template with placeholders where you can plug in fresh values.
      Pretty much hits it on the head.

      I don't care about the XML. To me XML is a way of structuring data that is yet another way of structuring data. The idea of XML::Simple converting the XML to a Perl structure which could be modified has worked well for me before. But sadly XML::Simple fails my simple test of simplicity. If I take some XML, parse it and then output it with no other processing, it should be the same.

      That being said, I moved on and asked the question here. I suppose what I really want is the HTML::Template version of XML! The data in the XML example is a job submission. Essentially: send these message files to these recipients. I have installed XML::Twig, and although at first look it is daunting, I suspect that if I persevere it will all work out in the end.

      Many thanks, John jdtoronto

        Would XML::Dumper be closer to what you are looking for? It would not let you read random XML, but it would let you read/write your data structures as XML.

Re: Operating on XML, or XML::Simple is too simple!
by Tanktalus (Canon) on Aug 05, 2005 at 14:48 UTC

    The "Simple" in many module names has two meanings:

    1. The author picked what they think are common cases, or a simple subset of the overall use cases, to support
    2. A simplified API.
    It is quite rare that the author failed to accomplish the first meaning. And each user of the API can debate whether they succeeded at the second.

    Since you don't fall into the simple requirements, you need to use a module with more flexibility. The downside to flexibility is that figuring out which API to call with which parameters usually becomes more difficult.

    Personally, the first time I had to do XML manipulation, I chose XML::Twig and haven't looked back. In that project, I had a base XML file to work from, and then I had to pull in data from other (non-XML) sources, manipulate those, and then enter them into the base XML file. XML::Twig worked wonderfully for this. My coworkers, not being perl programmers, were duly impressed by the speed not only of the implementation, but of the result. Not hugely impressed - they've come to expect this type of stuff from me working in perl ;-) - but still impressed. :-)

    (Oh, and the support from the author of XML::Twig has been pretty impressive on its own ;-})
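
    For what it's worth, the "base XML file plus data from elsewhere" workflow might look roughly like the sketch below. It is an assumption-laden illustration, not the code from that project: the base document is a stripped-down stand-in for the one in the question, and @rows stands in for whatever the non-XML source (database, flat file) would return.

```perl
use strict;
use warnings;
use XML::Twig;

# Stand-in for the base XML file: the list is present but empty.
my $base = <<'XML';
<voice_broadcast_info xmlns="http://www.protus.com">
  <voice_recipient_list/>
</voice_broadcast_info>
XML

# Stand-in for rows fetched from a non-XML source.
my @rows = (
    [ '14169999999', 'mE',     'bc-voice-1' ],
    [ '14169999999', 'mE - 2', 'bc-voice-2' ],
);

my $twig = XML::Twig->new( pretty_print => 'indented' );
$twig->parse($base);

# Locate the empty list and append one <voice_recipient> per row.
my $list = $twig->first_elt('voice_recipient_list');
for my $row (@rows) {
    my ( $number, $name, $ref ) = @$row;
    my $rec = $list->insert_new_elt( last_child => 'voice_recipient' );
    $rec->insert_new_elt( last_child => voice_recipient_number    => $number );
    $rec->insert_new_elt( last_child => voice_recipient_name      => $name );
    $rec->insert_new_elt( last_child => voice_recipient_reference => $ref );
}

my $out = $twig->sprint;
print $out;
```

    insert_new_elt creates the element and pastes it in one step; XML::Twig takes care of escaping the text content on output.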

Re: Operating on XML, or XML::Simple is too simple!
by Anonymous Monk on Aug 05, 2005 at 05:06 UTC
Re: Operating on XML, or XML::Simple is too simple!
by GrandFather (Saint) on Aug 05, 2005 at 05:12 UTC
Re: Operating on XML, or XML::Simple is too simple!
by pg (Canon) on Aug 05, 2005 at 05:55 UTC

    I know that there is a set of XML::DOM modules, but I have never used them. Theoretically, I would think DOM is a better way to deal with XML when you want to modify it, not just parse it. Does anyone have any experience with Perl DOM to share?

      XML::XSLT uses XML::DOM under the hood, and the only complaint I have about it (as would be the same for any DOM implementation) is that if you are dealing with even moderately large documents then you are going to consume a large amount of memory.
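
      If memory is the worry, one alternative to a full in-memory DOM is to process a document chunk by chunk. The sketch below uses XML::Twig's twig_handlers and purge for that; it is an illustration only, using a tiny inline document with values from the original question, and it only reads data rather than rewriting the document.

```perl
use strict;
use warnings;
use XML::Twig;

# Tiny stand-in for what could be a very long recipient list.
my $xml = <<'XML';
<voice_recipient_list>
  <voice_recipient>
    <voice_recipient_reference>bc-voice-1</voice_recipient_reference>
  </voice_recipient>
  <voice_recipient>
    <voice_recipient_reference>bc-voice-2</voice_recipient_reference>
  </voice_recipient>
</voice_recipient_list>
XML

my @refs;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete <voice_recipient> has been parsed.
        voice_recipient => sub {
            my ( $t, $elt ) = @_;
            push @refs, $elt->first_child_text('voice_recipient_reference');
            $t->purge;    # release what has been parsed so far
        },
    },
);
$twig->parse($xml);

print "$_\n" for @refs;
```

      Only one voice_recipient subtree needs to be held at a time, so memory use should stay roughly flat however many recipients there are.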

      /J\

Re: Operating on XML, or XML::Simple is too simple!
by BaldPenguin (Friar) on Aug 05, 2005 at 20:25 UTC
    I'm not sure how much data you are going to change, but I would use XML::LibXML to read in the document, then use XML::LibXML::XPathContext to pull out the node of data I want to play with. The node stays attached inside the document, so I can make the changes and then re-output the document using toString() or whatever.
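
    A rough sketch of that approach, assuming a cut-down copy of the document from the question and a made-up new billing code (JOB2005). One thing to watch: the sample document declares a default namespace, so the XPath needs a prefix registered for it. The prefix p below is an arbitrary choice.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Cut-down version of the job document from the question.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<voice_broadcast_info xmlns="http://www.protus.com">
  <voice_broadcast_options>
    <billing_code>TEST1000</billing_code>
  </voice_broadcast_options>
</voice_broadcast_info>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# Bind a prefix to the document's default namespace so XPath can match.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs( p => 'http://www.protus.com' );

# The node returned is still attached to $doc, so edits change the document.
my ($node) = $xpc->findnodes('//p:voice_broadcast_options/p:billing_code');
$node->removeChildNodes;
$node->appendText('JOB2005');    # hypothetical new value

my $out = $doc->toString;
print $out;
```

    Without the registerNs call, an unprefixed path such as //billing_code would match nothing in this document.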

    Don
    WHITEPAGES.COM | INC
    Everything I've learned in life can be summed up in a small perl script!