in reply to Re: Parsing XML into a Hash
in thread Parsing XML into a Hash

It's not quite like that.
You see, I work for $consulting_company that has a contract with $client. I don't get to pick any fights here. That's up to the Tech Lead (not me) and others. I just get to sit here and make executive decisions about the code like, "Well, since I am controlling all the input and output of this program, and I am controlling all the changes to the XML file, I'll just write a simple handler to accommodate what I know is going to be in the XML file," and be done with it.

Life is much easier when you get enough control over the code you write that you can make decisions like this. Now all I have to do is write the changes to the data back into the XML file. More like an overwrite than an addition to the file. Whee!!! FUN!
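
A minimal sketch of that write-back step, assuming the flat list/item layout and the four tag names from the sample in this thread. The helper names escape_xml and items_to_xml are hypothetical, invented here for illustration, and only the characters &, < and > are escaped:

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that must not appear raw in element text.
sub escape_xml {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: serialize flat hashes as <list><item>...</item></list>.
sub items_to_xml {
    my @items = @_;
    my @tags  = qw/name working uptime downtime/;    # tag names assumed from the sample
    my $xml   = qq{<?xml version="1.0" encoding="UTF-8"?>\n<list>\n};
    for my $item (@items) {
        $xml .= "<item>\n";
        $xml .= sprintf "<%s>%s</%s>\n", $_, escape_xml( $item->{$_} ), $_ for @tags;
        $xml .= "</item>\n";
    }
    return $xml . "</list>\n";
}

my $xml = items_to_xml(
    { name => 'item_name_1', working => 'yes', uptime => 5, downtime => '' },
);
print $xml;
```

Since the whole file is regenerated rather than appended to, one cautious way to do the overwrite is to print this string to a temporary file and rename it over the original once the write has succeeded.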

Re: Re: Re: Parsing XML into a Hash
by graff (Chancellor) on Nov 04, 2003 at 02:04 UTC
    Given the general consensus shown in the other replies, I'm probably putting up flame bait (or at least a downvote magnet), but here goes... You say:

    I just get to sit here and make executive decisions about the code like, "Well, since I am controlling all the input and output of this program, and I am controlling all the changes to the XML file, I'll just write a simple handler to accommodate what I know is going to be in the XML file," and be done with it.

    To which I say "Amen, Brother!" Based on the very tidy and fairly simple XML sample in your original post, I don't see a problem with writing a "tightly-bound" (i.e. ad-hoc) "parser" in a dozen or so lines of perl -- the point being to get the job done with minimal fuss (including, mainly, minimal fuss with the folks who are paying for this job). What this really means is that you just need to be very careful about testing the script that creates this XML stream, to make sure its output always meets the constraints assumed by the downstream "parser" script.

    Assuming that you can manage the quality of the XML stream as it's being created, then something like the following would probably fit the bill for reading that stream:

    open( XML, "source_of_xml.data" ) or die "I died 'cuz $!";
    {
        local $/ = "</item>\n";
        my %item;
        while (<XML>) {   # read one whole <item>...</item> into $_
            for my $tag (qw/name working uptime downtime/) {
                ($item{$tag}) = m{<$tag>(.*?)</$tag>}s;   # (leave off "s",
                            # if tags are always fully contained on one line)
            }
            # now, do what you want with %item...
        }
    }
    So what's wrong with that? If you really are creating the XML stream as well as processing it -- and if the data structure is really as flat as your example makes it out to be -- then you really don't need an XML parsing module.

    In essence, you seem to be using XML simply as a means of "embellishing" (reformatting) a flat table, and there's no need for a hefty, C-compiled module to handle that.
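
For anyone who wants to try the record-separator idea without a data file, here is a self-contained sketch. It is a variation on the snippet above, not a replacement for it: it assumes the XML sits in a string (the sample is the one quoted elsewhere in this thread), reads it through an in-memory filehandle, and collects each item into an array named @items, which is an invention of this sketch.

```perl
use strict;
use warnings;

my $xml = <<'END_XML';
<?xml version="1.0" encoding="UTF-8"?>
<list>
<item>
<name>item_name_1</name>
<working>yes</working>
<uptime>5</uptime>
<downtime></downtime>
</item>
<item>
<name>item_name_2</name>
<working>yes</working>
<uptime>5</uptime>
<downtime></downtime>
</item>
</list>
END_XML

# Open the string as if it were a file, so the reading loop is unchanged.
open( my $fh, '<', \$xml ) or die "Cannot open in-memory handle: $!";

my @items;
{
    local $/ = "</item>\n";    # each read returns one whole <item>...</item>
    while ( my $chunk = <$fh> ) {
        next unless $chunk =~ /<item>/;    # skip the trailing "</list>" chunk
        my %item;
        for my $tag (qw/name working uptime downtime/) {
            ( $item{$tag} ) = $chunk =~ m{<$tag>(.*?)</$tag>}s;
        }
        push @items, {%item};
    }
}
close $fh;

printf "%s: working=%s uptime=%s\n", @{$_}{qw/name working uptime/} for @items;
```

The "next unless" line matters: with the separator set to the closing item tag, the last read returns whatever follows the final item, which here is just the closing list tag.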

      "So what's wrong with that?"

      Murphy will rear 'is ugly head and next thing you know the XML will need to be nested:
      ...
      <name>
        <first_name>Foo</first_name>
        <last_name>Bar</last_name>
      </name>
      ...
      and attributes will be needed:
      ...
      <name part="first">John</name>
      <name part="last">Smith</name>
      ...
      ... and more goodies waiting around the corner for anyone who has never written any kind of compiler and is trying to write a parser. I personally do not like to waste valuable time reinventing a wheel that only takes a few moments to install. You download it, you install it, you write stuff that matters and you don't worry about the issues i pointed out.
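
To make the two failure modes concrete, here is a small sketch that applies the same pattern used by the ad-hoc reader to three inputs. The sub name extract and the three sample strings are made up for this illustration; the nested and attribute cases mirror the fragments above.

```perl
use strict;
use warnings;

# The pattern from the ad-hoc reader, wrapped in a hypothetical helper.
sub extract {
    my ( $tag, $chunk ) = @_;
    my ($value) = $chunk =~ m{<$tag>(.*?)</$tag>}s;
    return $value;    # undef when the pattern does not match
}

my $flat   = "<item>\n<name>item_name_1</name>\n</item>\n";
my $nested = "<item>\n<name><first_name>Foo</first_name><last_name>Bar</last_name></name>\n</item>\n";
my $attrs  = qq{<item>\n<name part="first">John</name>\n<name part="last">Smith</name>\n</item>\n};

for my $case ( [ flat => $flat ], [ nested => $nested ], [ attrs => $attrs ] ) {
    my ( $label, $chunk ) = @$case;
    my $value = extract( name => $chunk );
    printf "%-6s => %s\n", $label, defined $value ? "'$value'" : 'no match';
}
```

The flat case yields the name as expected. The nested case "succeeds" but hands back raw markup instead of a name, and the attribute case does not match at all, because the opening tag is no longer the literal string the pattern looks for.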

      But ... if, and only if, that XML will never get more complicated than originally posted, then your code should suffice. However, a "real" XML parser does not 10 lines of code make, just take a look at XML::SAX::PurePerl. ;)

      UPDATE:
      Now you are just being silly graff. My whole point was that one shouldn't have to worry "as the need arises". Not when all you do is install a CPAN module (i like BrowserUk's suggestion: XML::Parser::Lite). mcogan1966 - you are in for a bumpy ride. Please take BrowserUk's suggestion. If you really are "Lord and Master", then use a module for this. But don't take my word for it (besides, what could i possibly know about projects that never finish because of red tape?) just wait till Murphy gets you. ;)

      jeffa

      L-LL-L--L-LL-L--L-LL-L--
      -R--R-RR-R--R-RR-R--R-RR
      B--B--B--B--B--B--B--B--
      H---H---H---H---H---H---
      (the triplet paradiddle with high-hat)
      
        Nested schmested! Those examples still look like a flat table to me. ;^) ... Seriously, this could just mean changing the names/regex patterns to identify the tags that really matter, including their attributes, as the need arises.

        Granted, the poor soul might someday have to face elements whose attributes and/or content structure show significant variability, or even -- God help him -- recursion. Then he's really not in flat-table-land anymore, and it's time to bring in the right tools (a real parser).
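
As a rough sketch of what "changing the regex patterns to include attributes" could look like, assuming attribute values are always double-quoted, never contain a double quote, and that elements of the same name are never nested inside each other. The helper name extract_with_attrs is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: collect every <$tag ...>text</$tag> in a chunk,
# returning one hash per element with its attributes and its text.
sub extract_with_attrs {
    my ( $tag, $chunk ) = @_;
    my @found;
    while ( $chunk =~ m{<$tag\b([^>]*)>(.*?)</$tag>}gs ) {
        my ( $attr_text, $text ) = ( $1, $2 );    # copy before the inner match clobbers $1/$2
        my %attr;
        # Assumption: attributes look like name="value" with no embedded '"'.
        $attr{$1} = $2 while $attr_text =~ m{([\w:-]+)\s*=\s*"([^"]*)"}g;
        push @found, { attr => \%attr, text => $text };
    }
    return @found;
}

my $chunk = qq{<name part="first">John</name>\n<name part="last">Smith</name>\n};

my %name;
$name{ $_->{attr}{part} } = $_->{text} for extract_with_attrs( name => $chunk );

print "$name{first} $name{last}\n";
```

This handles the attribute example, but each new wrinkle (single-quoted values, entities, comments, CDATA, recursion) needs another patch to the pattern, which is the point at which a real parser becomes the cheaper option.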

3Re: Parsing XML into a Hash
by jeffa (Bishop) on Nov 03, 2003 at 20:12 UTC

    "'I'll just write a simple handler to accommodate what I know is going to be in the XML file,' and be done with it."

    Perhaps you will be done with it ... then again, perhaps the requirements for how the XML file is defined will change on you. Then you get to find what broke and fix ... repeat ad nauseam. I would use a CPAN module or i would simply find another job. These folks sound pretty clueless anyways, but i do wish you and your company the best of luck. ;)

    jeffa

    
      Clueless?
      No, not really. This is a secure site, and they take this very seriously. Besides, after further consideration of the entire thing, it was determined that XML::Simple just wasn't the right answer. I needed more control of the output anyway. Besides, the resulting code is only about 16 lines, including proper spacing of the code and all. So, things work out in the end after all.

        "...it was determined that XML::Simple just wasn't the right answer."

        Well, sometimes XML::Simple is not enough, but in this case i think you didn't spend enough time in research. Here is how i imagine you tested out XML::Simple:
        use XML::Simple;
        use Data::Dumper;

        my $ref = XMLin(\*DATA);
        print Dumper $ref;

        __DATA__
        <?xml version="1.0" encoding="UTF-8"?>
        <list>
          <item>
            <name>item_name_1</name>
            <working>yes</working>
            <uptime>5</uptime>
            <downtime></downtime>
          </item>
          <item>
            <name>item_name_2</name>
            <working>yes</working>
            <uptime>5</uptime>
            <downtime></downtime>
          </item>
        </list>
        And the results?
        $VAR1 = {
          'item' => {
            'item_name_1' => {
              'uptime' => '5',
              'downtime' => {},
              'working' => 'yes'
            },
            'item_name_2' => {
              'uptime' => '5',
              'downtime' => {},
              'working' => 'yes'
            }
          }
        };
        So yes, if this is what you did then i can definitely see how you would prematurely dismiss the module as the "wrong answer". But, if you had RTFM'ed, you would have seen that KeyAttr is taking the name tags and making them hash keys to each row of XML data. By simply changing the constructor's args to:
        my $ref = XMLin(\*DATA,KeyAttr=>[]);
        the results are now:
        $VAR1 = {
          'item' => [
            {
              'uptime' => '5',
              'downtime' => {},
              'working' => 'yes',
              'name' => 'item_name_1'
            },
            {
              'uptime' => '5',
              'downtime' => {},
              'working' => 'yes',
              'name' => 'item_name_2'
            }
          ]
        };
        And look how trivially easy it is to get the parts:
        print $_->{name} for @{$ref->{item}};

        "I needed more control of the output anyway."

        What does that mean? Personally, i don't think you are parsing XML, i think you are parsing something that merely looks like XML. You asked the best way to parse XML, we told you. You then added more requirements, but i still say you are wasting time. Do me a favor, keep track of how many hours you spend maintaining your 16 lines of "parser" code (correcting mistakes, adding new features) and compare that to the one hour it takes to install XML::Simple and read the manual.

        Now please don't get me wrong, i wish you success. I have worked for shops that reinvent every wheel along the way and they all suffer because of it. It is a foolish, egotistical, and wasteful methodology. Listen to Maverick.

        jeffa
