kevyt has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to parse an XML file, but I get an error when I try to read an empty tag. I have looked at the docs and added if statements, but they don't work. Can someone help?
#!/usr/bin/perl
#use warnings;
#use strict;
use XML::DOM;
use LWP::UserAgent;
use LWP::Simple;

open( CONTACTS, "+<contacts2.xml" ) or die( "Error opening contact.xml" );
my $parser = new XML::DOM::Parser;
my $document = $parser->parse( \*CONTACTS );
printlist( $document );
seek( CONTACTS, 0, 0 );
truncate( CONTACTS, 0 );
$document->print( \*CONTACTS );

sub printlist {
    my $document = shift;
    my $root = $document->getElementsByTagName( "ResultSet" )->item( 0 );
    my $contactList = $root->getChildNodes();
    for my $i ( 1 .. $contactList->getLength - 1 ) {
        my $contact = $contactList->item( $i );
        next unless ( $contact->getNodeName eq 'Result' );
        my $Title = $contact->getElementsByTagName( "Title" );
        my $title = $Title->item( 0 )->getChildAtIndex( 0 )->getData();
        my $Address = $contact->getElementsByTagName( "Address" );
        my $address = $Address->item( 0 )->getChildAtIndex( 0 )->getData();
        my $City = $contact->getElementsByTagName( "City" );
        my $city = $City->item( 0 )->getChildAtIndex( 0 )->getData();
        my $State = $contact->getElementsByTagName( "State" );
        my $state = $State->item( 0 )->getChildAtIndex( 0 )->getData();
        my $Phone = $contact->getElementsByTagName( "Phone" );
        my $phone = $Phone->item( 0 )->getChildAtIndex( 0 )->getData();
        my $Lat = $contact->getElementsByTagName( "Latitude" );
        my $lat = $Lat->item( 0 )->getChildAtIndex( 0 )->getData();
        my $Long = $contact->getElementsByTagName( "Longitude" );
        my $long = $Long->item( 0 )->getChildAtIndex( 0 )->getData();
        $BusUrl = $contact->getElementsByTagName( "BusinessUrl" );
        $bus_url = $BusUrl->item( 0 )->getChildAtIndex( 0 )->getData();
        # change phone number format
        $phone =~ s/\(//;
        $phone =~ s/\)//;
        $phone =~ s/-//;
        $phone =~ s/\s+//;
        print( "$title $address $city $state $phone $lat $long $bus_url\n" );
    }
}
This is the input file:
<ResultSet>
  <Result>
    <Title>My Real Estate Agent</Title>
    <Address>74738 Jones Ave</Address>
    <City>Leesburg</City>
    <State>VA</State>
    <Phone>(703) 723-1113</Phone>
    <Latitude>84848894</Latitude>
    <Longitude>23232</Longitude>
    <BusinessUrl>http://www.mcmmcmmc.net</BusinessUrl>
  </Result>
  <Result>
    <Title>Johns Bakery</Title>
    <Address>8484399 heewis Dr</Address>
    <City>Sterling</City>
    <State>VA</State>
    <Phone>(703) 723-1114</Phone>
    <Latitude>8383883</Latitude>
    <Longitude>213123</Longitude>
    <BusinessUrl></BusinessUrl>
  </Result>
  <Result>
    <Title>Kevins flooring</Title>
    <Address>84848484 kevin drive</Address>
    <City>Fairfax</City>
    <State>VA</State>
    <Phone>(703) 723-1115</Phone>
    <Latitude>2321321</Latitude>
    <Longitude>123</Longitude>
    <BusinessUrl>http://www.nothing.com</BusinessUrl>
  </Result>
</ResultSet>
I get an error when it tries to getData for Johns Bakery, because its <BusinessUrl></BusinessUrl> is empty. Error message:
Can't call method "getData" on an undefined value at parse_xml.pl line 74.
These checks do not work:
if( $#BusUrl < 0 )
if( length $contact->getElementsByTagName( "BusinessUrl" ))
if ($BusUrl)
if (defined ( $BusUrl ))
if (defined ( $contact->getElementsByTagName( "BusinessUrl" )))
if ($BusUrl->item( 0 )->getChildAtIndex( 0 )->getLength)
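None of those conditions tests the thing that is actually missing. As far as I can tell from XML::DOM's behaviour: $#BusUrl is the last index of an unrelated array @BusUrl; $BusUrl (and anything built from the bare getElementsByTagName call) is a NodeList object, which is always defined and true; and the last check still calls a method on the undefined first child. This minimal sketch, using a hypothetical one-Result document rather than the real contacts2.xml, prints what those values really are for an empty tag:

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical stand-in for one <Result> from the question, with an empty tag.
my $xml = '<Result><Title>Johns Bakery</Title><BusinessUrl></BusinessUrl></Result>';
my $doc = XML::DOM::Parser->new->parse($xml);
my $contact = $doc->getDocumentElement;

# In scalar context getElementsByTagName returns a NodeList object.
my $BusUrl = $contact->getElementsByTagName('BusinessUrl');
printf "NodeList is true:     %s\n", ($BusUrl ? 'yes' : 'no');   # an object, so always true
printf "elements found:       %d\n", $BusUrl->getLength;          # the tag itself exists

my $elt = $BusUrl->item(0);                                       # the <BusinessUrl> element
printf "element has children: %s\n", ($elt->hasChildNodes ? 'yes' : 'no');
printf "first child defined:  %s\n", (defined $elt->getChildAtIndex(0) ? 'yes' : 'no');
```

The element exists but has no child nodes, so getChildAtIndex(0) is undef, and that undef is what getData is being called on.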

Re: use XML::DOM - what to do with empty tags?
by mirod (Canon) on Jan 03, 2007 at 16:49 UTC

    When you write $bus_url = $BusUrl->item( 0 )->getChildAtIndex( 0 )->getData(); you apply a method (getData) to an undef value ($BusUrl->item( 0 )->getChildAtIndex( 0 )), hence the error. You should first check whether the BusinessUrl element ($BusUrl->item( 0 )) has children, through the appropriately named hasChildNodes method.

    The easiest way would be to add a sub that returns the text of the element, or undef (or the empty string if that is more convenient for you). While you're at it, you could also make that sub resistant to extra whitespace in the elements: if you only look at the first child of the BusinessUrl element, chances are that one day it will be a line break, with the real data on the following line (and in the following child). Using XPath (either by switching to XML::LibXML or by using XML::DOM::XPath, which is a shameless plug) will give you a better chance of making your code less dependent on the formatting of the XML data.
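As a sketch of the XPath route mentioned above, here is the same lookup done with XML::LibXML (a different module from XML::DOM; this assumes a version recent enough to provide load_xml). findvalue() returns the string value of the matched node, which is the empty string for an empty tag (and for a missing one), so no undef check is needed. The sample is a hypothetical cut-down copy of the input, with one URL spread over several lines to show the whitespace problem:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical cut-down input; the second URL is deliberately pretty-printed.
my $xml = <<'XML';
<ResultSet>
  <Result>
    <Title>Johns Bakery</Title>
    <BusinessUrl></BusinessUrl>
  </Result>
  <Result>
    <Title>Kevins flooring</Title>
    <BusinessUrl>
      http://www.nothing.com
    </BusinessUrl>
  </Result>
</ResultSet>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my @urls;
for my $result ($doc->findnodes('/ResultSet/Result')) {
    # String value of the first BusinessUrl child: '' when the tag is empty.
    my $url = $result->findvalue('BusinessUrl');
    $url =~ s/^\s+|\s+$//g;    # drop the line breaks and indentation around the text
    push @urls, $url;
    printf "%-16s [%s]\n", $result->findvalue('Title'), $url;
}
```

Because the XPath string value concatenates all descendant text, the code no longer cares which child node holds the URL.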

    You can use the following (untested) code by calling my $bus_url= field( $contact, 'BusinessUrl'). Note that it is still not great: extra comments in the data, for example, would be returned as part of the content of the element.

    sub field {
        my( $parent, $tag)= @_;
        my @elts= $parent->getElementsByTagName( $tag);
        return unless( @elts);
        my $elt= $elts[0];
        return unless( $elt->hasChildNodes);
        return join( '', map { $_->getData } $elt->getChildNodes);
    }
      Yippee!!!! This works :)

      Thanks for everyone's help :)

      I would like to find good documentation with examples.

      Thanks again for your help.
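mirod's field() above notes that comments would leak into the result. One possible refinement, sketched here as a hypothetical field_text() (not part of XML::DOM), keeps only text and CDATA child nodes, using the node-type constants that XML::DOM exports, trims surrounding whitespace, and returns '' instead of undef so the value can be interpolated under use warnings. It has only been reasoned through against the small sample shown:

```perl
use strict;
use warnings;
use XML::DOM;    # also exports TEXT_NODE, CDATA_SECTION_NODE, ...

# Hypothetical helper: same calling convention as mirod's field(), but it
# ignores comment (and other non-text) children and never returns undef.
sub field_text {
    my ($parent, $tag) = @_;
    my @elts = $parent->getElementsByTagName($tag);   # list context: list of elements
    return '' unless @elts;                           # tag missing altogether
    my $text = join '',
        map  { $_->getData }
        grep { $_->getNodeType == TEXT_NODE
            || $_->getNodeType == CDATA_SECTION_NODE }
        $elts[0]->getChildNodes;                      # empty list for <Tag></Tag>
    $text =~ s/^\s+|\s+$//g;                          # strip surrounding whitespace
    return $text;
}

# Hypothetical sample: an empty tag, a tag with a comment, and no <Fax> at all.
my $xml = '<Result><Title>Johns Bakery</Title><BusinessUrl></BusinessUrl>'
        . '<Phone><!-- office --> (703) 723-1114 </Phone></Result>';
my $contact = XML::DOM::Parser->new->parse($xml)->getDocumentElement;

my $bus_url = field_text($contact, 'BusinessUrl');
my $phone   = field_text($contact, 'Phone');
my $missing = field_text($contact, 'Fax');
print "[$bus_url] [$phone] [$missing]\n";
```

Returning '' rather than undef is a design choice; if you need to tell "empty" from "missing", keep mirod's undef for the missing case.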
Re: use XML::DOM - what to do with empty tags?
by EvanK (Chaplain) on Jan 03, 2007 at 16:23 UTC
    If it's throwing an error, then in addition to the if statements, you should perhaps trap the potential errors in an eval block:
    # declare lexical variable *outside the scope of the eval*
    my $BusUrl;
    # trap potential error
    eval { $BusUrl = $contact->getElementsByTagName( "BusinessUrl" ); };
    # check if tag retrieval failed
    if($@) {
        # assign an empty (but defined) string
        $BusUrl = '';
    }
    $bus_url = $BusUrl->item( 0 )->getChildAtIndex( 0 )->getData();

    __________
    The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it.
    - Terry Pratchett

      Thanks, that makes a lot of sense. I am still getting the error:
      Can't call method "getData" on an undefined value at parse_xml.pl line 87.
      New code:
      my $Long = $contact->getElementsByTagName( "Longitude" );
      my $long = $Long->item(0)->getChildAtIndex(0)->getData();
      # declare lexical variable *outside the scope of the eval*
      my $BusUrl;
      # trap potential error
      eval { $BusUrl = $contact->getElementsByTagName( "BusinessUrl" ); };
      # check if tag retrieval failed
      if($@) {
          # assign an empty (but defined) string
          #$BusUrl = '';
      }
      $bus_url = $BusUrl->item(0)->getChildAtIndex(0)->getData(); # line 87
      # change phone number format
        Ah, the example code I provided just assigns an empty string to $BusUrl, which you later use as an object (my bad). So really, what you'd want to do is something like this:
        if($@) {
            # assign an empty (but defined) string to $bus_url
            $bus_url = '';
        }
        else {
            # assign tag contents to $bus_url
            $bus_url = $BusUrl->item(0)->getChildAtIndex(0)->getData();
        }
        But what you really want to do is follow mirod's advice and use the hasChildNodes method (note that his implementation is much more flexible than the following):
        my $BusUrl = $contact->getElementsByTagName( "BusinessUrl" );
        # hasChildNodes is an element method, so test the first element, not the NodeList
        my $elt = $BusUrl->item( 0 );
        # check if the element exists and has children
        if (defined $elt && $elt->hasChildNodes) {
            $bus_url = $elt->getChildAtIndex( 0 )->getData();
        }
        # otherwise assign an empty string
        else {
            $bus_url = '';
        }

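On the eval approach: as far as I know, getElementsByTagName does not die when a tag is empty or missing (it just returns a NodeList, possibly empty), so an eval around that call never sets $@. The die comes from calling getData on the undefined first child, so that is the call an eval would have to wrap. A minimal sketch, with a hypothetical one-element document:

```perl
use strict;
use warnings;
use XML::DOM;

my $xml = '<Result><BusinessUrl></BusinessUrl></Result>';   # hypothetical sample
my $contact = XML::DOM::Parser->new->parse($xml)->getDocumentElement;

# This call does not die for an empty tag; it returns a NodeList.
my $BusUrl = $contact->getElementsByTagName('BusinessUrl');

# Wrap the whole chain: the die happens at ->getData on the undef first child.
my $bus_url = eval { $BusUrl->item(0)->getChildAtIndex(0)->getData };
my $trapped = $@ ? 1 : 0;                 # remember whether eval caught a die
$bus_url = '' unless defined $bus_url;    # eval yields undef after a trapped die
print "error trapped: ", ($trapped ? 'yes' : 'no'), ", bus_url: [$bus_url]\n";
```

This works, but it also silences any other failure inside the chain, so the explicit hasChildNodes check is usually the clearer choice.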

Re: use XML::DOM - what to do with empty tags?
by shigetsu (Hermit) on Jan 03, 2007 at 16:42 UTC

    Have you verified that $document contains what you expect it to contain?

    The XML::DOM docs state:

    The parsefile() method now also supports URLs, e.g. http://www.erols.com/enno/xsa.xml. It uses LWP to download the file and then calls parse() on the resulting string.

    So we must conclude that parse() does not do the same thing as parsefile().
    Have you tried using parsefile() yet?
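For reference, parse() also accepts an open filehandle (XML::DOM::Parser is a subclass of XML::Parser), so passing \*CONTACTS is probably not the problem by itself. Still, parsefile() plus XML::DOM's printToFile() avoids the open/seek/truncate handling in the question. A sketch, using a hypothetical temporary file in place of contacts2.xml:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::DOM;

# Hypothetical stand-in for contacts2.xml so the sketch runs on its own.
my ($fh, $file) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} '<ResultSet><Result><Title>Johns Bakery</Title>'
          . '<BusinessUrl></BusinessUrl></Result></ResultSet>';
close $fh or die "close: $!";

my $parser   = XML::DOM::Parser->new;
my $document = $parser->parsefile($file);    # dies with a message on malformed XML

my $count = $document->getElementsByTagName('Result')->getLength;
print "Result elements: $count\n";

# Write the (possibly modified) tree back to the same file.
$document->printToFile($file);
$document->dispose;                          # break XML::DOM's circular references
```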

    You could also fire up the debugger (Using the Perl Debugger) and see where it goes wrong.

    What mirod said.
      Yes, the doc works great until I hit an empty tag. I know I will get empty tags so I want to prepare for it :)