Perobl has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to parse XML weather data from NOAA. Everything has been working great until today.

I'm using XML::Simple.

The problem resides here:

my $weatherIconURL = $data->{'data'}->{'parameters'}->{'conditions-icon'}->{'icon-link'}->[0];

Normally the XML for 'icon-link' is formatted like this:

<name>Conditions Icons</name>
<icon-link>http://www.nws.noaa.gov/weather/images/fcicons/sct.jpg</icon-link>
<icon-link>http://www.nws.noaa.gov/weather/images/fcicons/nra30.jpg</icon-link>
</conditions-icon>

My program simply grabs the first URL and uses it to fetch NOAA's weather graphic.

However, I noticed something peculiar today. The XML for some locations is formatted like this:

<name>Conditions Icons</name>
<icon-link xsi:nil="true"/>
<icon-link>http://www.nws.noaa.gov/weather/images/fcicons/nra50.jpg</icon-link>
</conditions-icon>

The problem here is the NIL part of the XML:

<icon-link xsi:nil="true"/>

When this is encountered in the XML, my program responds with the following error:

A URI can't be a HASH reference at /System/Library/Perl/Extras/5.8.8/LWP/Simple.pm line 113

Is there some way that I can ignore <icon-link xsi:nil="true"/>? I'm not sure how to get around this.

TIA!

Re: Problem Parsing XML with Perl
by gmargo (Hermit) on Oct 24, 2009 at 23:50 UTC

    Mega-hack method: Toss offending lines from input before parsing.

    s{<icon-link[^/>]*/>}{}gs;
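    Applied to the XML text before it is handed to XMLin, the substitution might look like the sketch below. The sample XML string is a hypothetical stand-in for $response->result; only gmargo's regex comes from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample of the saved forecast XML; in the real program this
# would be the text returned by $response->result.
my $raw_xml = join "\n",
    '<conditions-icon>',
    '  <name>Conditions Icons</name>',
    '  <icon-link xsi:nil="true"/>',
    '  <icon-link>http://www.nws.noaa.gov/weather/images/fcicons/nra50.jpg</icon-link>',
    '</conditions-icon>',
    '';

# Copy the text, then delete every self-closing <icon-link .../> element
# (the nil placeholders) so only icon-link elements carrying a URL remain.
(my $cleaned_xml = $raw_xml) =~ s{<icon-link[^/>]*/>}{}gs;

print $cleaned_xml;
```

    Two side effects are worth keeping in mind: removing entries shifts the positions, so ->[0] may then be the icon for a later forecast period; and if only one <icon-link> survives, XML::Simple returns it as a plain scalar rather than an array unless ForceArray => ['icon-link'] is set.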

    Or does this occur during the fetch itself (since you referenced LWP/Simple.pm)?
    Is there any code you can share (like basic fetch + parse)?

      Thanks for the help.

      I'll provide some additional information about my project. My Perl program runs every hour and generates a text file that Flash ActionScript uses to automatically update a custom weather widget, which is part of a larger Flash welcome screen on a company intranet.

      My program first connects to NOAA and then loops through multiple locations, retrieving XML weather data for each. This data is saved to individual .XML documents in my Flash source file directory. The program then builds an array from the hash pairs I tell it to collect. Finally, the array is used to create a comma-delimited list of data that is saved as a text file for Flash.
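      The last step described above, turning selected hash values into one comma-delimited line for Flash, might look roughly like this sketch. The location, the field names and their order are all hypothetical; the point is that undefined values (for example a missing icon URL) are mapped to empty strings so the column count stays fixed. Values that themselves contain commas would still need quoting, for example via Text::CSV.

```perl
use strict;
use warnings;

# Hypothetical per-location values; the real program collects these from the
# XML::Simple data structure.
my %forecast = (
    location => 'Springfield',
    high     => 71,
    low      => 52,
    icon     => undef,    # e.g. no usable icon-link for this period
);

# Hypothetical list of keys to export, in the order the Flash side expects.
my @fields = qw(location high low icon);

# Replace undefined values with an empty string so a missing icon neither
# triggers "uninitialized value" warnings nor shifts the columns.
my $line = join ',', map { defined $forecast{$_} ? $forecast{$_} : '' } @fields;

print "$line\n";    # Springfield,71,52,
```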

      The Perl program also retrieves the correct weather graphic (.JPG files) from NOAA for use in the Flash ActionScript. I have Perl do this because of security restrictions in Flash. The image URLs are taken from the XML data.
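      Since the error is raised inside LWP::Simple, one defensive option is to validate the URL right before the download. The sketch below is a hypothetical helper (fetch_icon is not part of the poster's code); it rejects undefined, empty and reference values, which covers the hashref XML::Simple produces for <icon-link xsi:nil="true"/>, and only then calls LWP::Simple's getstore.

```perl
use strict;
use warnings;

# Fetch an icon only when $url is a plain, non-empty string. Anything else,
# such as the hashref XML::Simple builds for <icon-link xsi:nil="true"/>,
# is rejected before it can reach LWP::Simple and trigger
# "A URI can't be a HASH reference".
# Returns 1 on a successful download, 0 otherwise.
sub fetch_icon {
    my ($url, $file) = @_;
    return 0 if !defined($url) || ref($url) || $url eq '';

    # Load LWP::Simple only when there is something to download.
    require LWP::Simple;
    my $status = LWP::Simple::getstore($url, $file);
    return LWP::Simple::is_success($status) ? 1 : 0;
}
```

      In the existing loop it could be used as fetch_icon($weatherIconURL, $iconFile) or warn "No usable icon for $location[$i]\n"; where $iconFile stands for whatever local path the program already writes the .JPG to.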

      My Perl code for connecting to NOAA follows:

      # call NDFgenByDay to retrieve the xml forecast data
      print "\n\nCalling NDFgenByDay to retrieve XML forecast data for ", $location[$i], " ... ";
      my $response = $weather->call(
          SOAP::Data->name($method)
              => SOAP::Data->type(decimal => $latitude[$i])->name('latitude'),
              => SOAP::Data->type(decimal => $longitude[$i])->name('longitude'),
              => SOAP::Data->type(date    => $startDate)->name('startDate'),
              => SOAP::Data->type(integer => $numDays)->name('numDays'),
              => SOAP::Data->type(string  => $format)->name('format')
      );
      print "done.\n";

      My Perl code to save the XML to a file follows:

      my $xml = $filePath . 'Forecast_' . $location[$i] . '.xml';
      open(OUT, '>', $xml) or die "Cannot write $xml: $!";
      print OUT $response->result;
      close(OUT);
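      Before writing the result it may also be worth checking whether the SOAP call itself failed. The sketch below assumes $weather is a SOAP::Lite client, so that $response is a SOAP::SOM object offering fault, faultstring and result; the helper name save_forecast is hypothetical.

```perl
use strict;
use warnings;

# Write the SOAP result to $path, but only when the call did not fault.
# Assumes $response behaves like a SOAP::Lite SOAP::SOM object
# (methods fault, faultstring and result).
# Returns 1 if the file was written, 0 if the call faulted.
sub save_forecast {
    my ($response, $path) = @_;

    if ($response->fault) {
        warn "NDFD request failed: ", $response->faultstring, "\n";
        return 0;
    }

    open my $out, '>', $path or die "Cannot open $path for writing: $!";
    print {$out} $response->result;
    close $out or die "Cannot close $path: $!";
    return 1;
}
```

      Inside the per-location loop it could replace the open/print/close lines as save_forecast($response, $xml) or next; so a failed request for one location does not leave a bogus XML file behind.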

      The URL is retrieved and stored as follows:

      $weatherIconURL = $data->{'data'}->{'parameters'}->{'conditions-icon'}->{'icon-link'}->[0];

      The problem is that a usable 'icon-link' can't be found in certain situations, though this seems to be rare. Still, when it does happen my program isn't set up to handle it. It would be nice if I could simply deal with this situation in the existing code.
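      One way to handle it inside the existing code is a small helper that walks the icon-link list and returns the first entry that is a plain string. The helper name first_icon_url is hypothetical; it relies only on XML::Simple's default behaviour of turning an element with attributes into a hashref and a single element into a plain scalar.

```perl
use strict;
use warnings;

# Return the first icon-link entry that is a plain URL string, skipping the
# hashrefs that XML::Simple builds for elements such as
# <icon-link xsi:nil="true"/>. Returns undef if no usable URL is found.
sub first_icon_url {
    my ($links) = @_;
    return undef unless defined $links;

    # With XML::Simple's default options a lone <icon-link> comes back as a
    # plain scalar rather than an array, so normalise to an arrayref first.
    $links = [$links] unless ref($links) eq 'ARRAY';

    for my $link (@$links) {
        return $link if defined($link) && !ref($link) && length($link);
    }
    return undef;
}
```

      It would be called as first_icon_url($data->{'data'}{'parameters'}{'conditions-icon'}{'icon-link'}), followed by a check for undef before fetching. Note that skipping a nil entry means the URL returned may belong to a later forecast period than the one the first slot describes.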

      Hope this helps to better explain the problem.

Re: Problem Parsing XML with Perl
by almut (Canon) on Oct 25, 2009 at 17:42 UTC
    A URI can't be a HASH reference at /System/Library/Perl/Extras/5.8.8/LWP/Simple.pm line 113

    Is there some way that I can ignore <icon-link xsi:nil="true"/>

    The error message points you to a possible way around the problem: test if ->[0] is a hashref, and if so, skip to the next entry, or skip that case entirely, or whatever you'd consider an appropriate workaround in case of missing data.

    The thing is that XML::Simple creates a hashref instead of a plain string as soon as the XML element has an attribute (such as <icon-link foo="bar" ...>). You can always use Data::Dumper to figure out such things yourself. Consider the following:

    #!/usr/bin/perl

    use XML::Simple;
    use Data::Dumper;

    my $xml_parser = XML::Simple->new();

    for my $icon_link_xml (
        '<icon-link>http://www.nws.noaa.gov/weather/images/fcicons/sct.jpg</icon-link>',
        '<icon-link xsi:nil="true"/>',
        '<icon-link foo="bar"/>',
        '<icon-link foo="bar">http://www.nws.noaa.gov/weather/images/fcicons/sct.jpg</icon-link>',
    ) {

        my $xml = <<"EOXML";
    <?xml version='1.0' ?>
    <dwml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <data>
        <parameters>
          <conditions-icon>
            <name>Conditions Icons</name>
            $icon_link_xml
            <icon-link>http://www.nws.noaa.gov/weather/images/fcicons/nra30.jpg</icon-link>
          </conditions-icon>
        </parameters>
      </data>
    </dwml>
    EOXML

        # print "$xml\n";
        my $data = $xml_parser->XMLin($xml);
        # print Dumper $data;

        my $icon_link = $data->{'data'}{'parameters'}{'conditions-icon'}{'icon-link'};
        print Dumper $icon_link;

        # use the next entry, for example
        my $idx = ref($icon_link->[0]) eq "HASH" ? 1 : 0;
        my $weatherIconURL = $icon_link->[$idx];

        # or skip entirely
        #next if ref($icon_link->[0]) eq "HASH";

        print "=> \$weatherIconURL: $weatherIconURL\n\n";
    }

    __END__
    $VAR1 = [
              'http://www.nws.noaa.gov/weather/images/fcicons/sct.jpg',
              'http://www.nws.noaa.gov/weather/images/fcicons/nra30.jpg'
            ];
    => $weatherIconURL: http://www.nws.noaa.gov/weather/images/fcicons/sct.jpg

    $VAR1 = [
              {
                'xsi:nil' => 'true'
              },
              'http://www.nws.noaa.gov/weather/images/fcicons/nra30.jpg'
            ];
    => $weatherIconURL: http://www.nws.noaa.gov/weather/images/fcicons/nra30.jpg

    $VAR1 = [
              {
                'foo' => 'bar'
              },
              'http://www.nws.noaa.gov/weather/images/fcicons/nra30.jpg'
            ];
    => $weatherIconURL: http://www.nws.noaa.gov/weather/images/fcicons/nra30.jpg

    $VAR1 = [
              {
                'content' => 'http://www.nws.noaa.gov/weather/images/fcicons/sct.jpg',
                'foo' => 'bar'
              },
              'http://www.nws.noaa.gov/weather/images/fcicons/nra30.jpg'
            ];
    => $weatherIconURL: http://www.nws.noaa.gov/weather/images/fcicons/nra30.jpg

    Alternatively, you could pass ForceContent => 1 to XML::Simple->new() (or to XMLin), and then use

    $weatherIconURL = $data->{'data'}{'parameters'}{'conditions-icon'}{'icon-link'}[0]{content};

    In that case, the URL would simply be undefined for <icon-link xsi:nil="true"/>, which you'd have to check for as well, of course, so that later steps don't fail when trying to fetch it.
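    With ForceContent in effect, the undefined check can be folded into a loop that takes the first entry whose content is set. This is a sketch under that assumption; the helper name first_content_url is hypothetical, and the hashref shapes in the comment follow XML::Simple's documented ForceContent behaviour.

```perl
use strict;
use warnings;

# With XML::Simple->new(ForceContent => 1), every <icon-link> with text becomes
# a hashref like { content => 'http://...' }, while <icon-link xsi:nil="true"/>
# becomes { 'xsi:nil' => 'true' } with no 'content' key at all.
# Returns the first non-empty content value, or undef if there is none.
sub first_content_url {
    my ($links) = @_;
    return undef unless defined $links;
    $links = [$links] unless ref($links) eq 'ARRAY';

    for my $link (@$links) {
        next unless ref($link) eq 'HASH';
        my $url = $link->{content};
        return $url if defined($url) && length($url);
    }
    return undef;
}
```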

Re: Problem Parsing XML with Perl
by Anonymous Monk on Oct 25, 2009 at 01:40 UTC
    # my $weatherIconURL = $data->{'data'}->{'parameters'}->{'conditions-icon'}->{'icon-link'}->[0];
    my $weatherIconURL = $data->{'data'}->{'parameters'}->{'conditions-icon'}->{'icon-link'}->[-1];
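    A negative subscript counts back from the end of the array, so [-1] is the last icon-link rather than the first. In a multi-day forecast that is presumably the icon for the final period, and it only avoids the nil entry as long as the nil is not the last element. A small sketch on a hypothetical three-entry list shaped like the Data::Dumper output in this thread:

```perl
use strict;
use warnings;

# Hypothetical icon list, shaped like XML::Simple's output when the first
# period's icon is nil; a real multi-day forecast has one entry per period.
my $icon_link = [
    { 'xsi:nil' => 'true' },
    'http://www.nws.noaa.gov/weather/images/fcicons/nra50.jpg',
    'http://www.nws.noaa.gov/weather/images/fcicons/sct.jpg',
];

my $first = $icon_link->[0];     # the hashref for the nil element
my $last  = $icon_link->[-1];    # negative index counts from the end

print ref($first), "\n";    # HASH
print "$last\n";            # http://www.nws.noaa.gov/weather/images/fcicons/sct.jpg
```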

      Thanks for the suggestion.

      It looks like [-1] retrieves the last URL in the list instead of the first. I tried this, but this morning the problem had vanished from NOAA's XML data, so I can't verify whether or not it solves the problem.

      I thought I could maybe do the following:

      if (exists $data->{'data'}->{'parameters'}->{'conditions-icon'}->{'icon-link'}->[0]) {
          $weatherIconURL = $data->{'data'}->{'parameters'}->{'conditions-icon'}->{'icon-link'}->[0];
          print "Weather Icon: ", $data->{'data'}->{'parameters'}->{'conditions-icon'}->{'icon-link'}->[0], "\n";
      } else {
          $weatherIconURL = $data->{'data'}->{'parameters'}->{'conditions-icon'}->{'icon-link'}->[-1];
          print "Weather Icon: ", $data->{'data'}->{'parameters'}->{'conditions-icon'}->{'icon-link'}->[-1], "\n";
      }

      Do you think this will correctly handle the NIL exceptions?
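      The exists test would not catch the nil case: element 0 is present, it just holds a hashref instead of a string. Testing for a defined, non-reference value does. A minimal sketch on a literal copy of the structure Data::Dumper showed for <icon-link xsi:nil="true"/>; falling back to element 1 follows almut's "use the next entry" idea and is only one reasonable choice.

```perl
use strict;
use warnings;

# Same shape as the Data::Dumper output for <icon-link xsi:nil="true"/>.
my $icon_link = [
    { 'xsi:nil' => 'true' },
    'http://www.nws.noaa.gov/weather/images/fcicons/nra50.jpg',
];

# exists() only asks whether element 0 is present, and it is (it holds a
# hashref), so an exists-based test would still pick the hashref.
my $exists_says = exists $icon_link->[0] ? 'yes' : 'no';

# Checking that element 0 is a defined, non-reference value catches the nil case.
my $weatherIconURL = (defined($icon_link->[0]) && !ref($icon_link->[0]))
    ? $icon_link->[0]
    : $icon_link->[1];

print "exists: $exists_says\n";              # exists: yes
print "Weather Icon: $weatherIconURL\n";
```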