st4k has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone! I have a quick question about using XML::Simple to parse some data from the DOT to get traffic reports and such. I have an XML file that, once parsed and dumped, looks like this:
$VAR1 = {
  'incident-report' => [
    {
      'incident' => {
        'GDOT-INC-260089' => {
          'location' => {
            'link' => {
              'direction' => 'South',
              'hwy'       => '2',
              'content'   => 'Southbound I-75 at DELK ROAD',
              'name'      => 'I-75',
              'id'        => '2078',
              'mile-post' => '261.66'
            },
            'coord' => {
              'y'          => '435177',
              'unit'       => 'meters',
              'x'          => '670359',
              'datum'      => 'NAD83',
              'projection' => 'GA State Plane West'
            },
            'county' => {
              'content' => 'Cobb',
              'id'      => '67'
            },
            'type' => {
              'content' => 'Freeway',
              'id'      => '1'
            },
            'description' => 'Southbound I-75 at Delk Road'
          },
          'status'      => 'active',
          'description' => 'Medium impact, 1 tractor-trailer, Cobb Co.',
          'lanes'       => '2 Right Lanes',
          'level'       => '3',
          'type' => {
            'content' => 'Accident',
            'id'      => '1'
          },
          'cleanup' => {
            'timestamp' => '2003-06-19 17:20:00 EDT',
            'content'   => ' 5:20 PM Today'
          },
          'impact' => {
            'content' => 'Medium',
            'id'      => '2'
          }
        },
and basically I am having trouble forcing the incident into an array so I can call it like this:
print $ref->{incident}->[0]->{location}->{county}->{content} . "\n";
It gives me the error about not being a valid array. Can anyone suggest something that might work to get that 'GDOT-INC-260089' part into an array so that I can loop through them? I want to share the code for this when I get it finished. Maybe someone in Georgia will be able to use it ;-)

2003-06-21 edit ybiC: two-space indents, quasi-vertical alignment of comma-arrows for legibility

Re: XML::Simple parsing :-(
by chip (Curate) on Jun 19, 2003 at 21:33 UTC
    I'll be glad to help, but you should print your data with less indentation so it doesn't scroll off to the right. Printing it as YAML instead of Data::Dumper should make it a lot clearer, too, besides cutting down on the ultra-indent.

    update: Oh, and you should include the original XML and the code you wrote that reads it with XML::Simple. That's quite important, you know. We don't know what you did wrong until we know what you did.

        -- Chip Salzenberg, Free-Floating Agent of Chaos

      Print the structure in YAML for debugging? IMHO, I don't think so.

      The indentation problem of Data::Dumper is fixed with $Data::Dumper::Indent = 1; - not by using some other format that I'll bet most of us don't know. (After all, we know how to read perl hashes!) I don't know YAML, and given that XML is now Lingua Franca, I doubt I'll be learning it.
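To make the `$Data::Dumper::Indent = 1` suggestion concrete, here is a minimal sketch. The `$ref` structure is hypothetical sample data shaped like the dump in the question, and `Sortkeys` is an extra setting added here only to keep the output order stable; it is not something the thread asked for.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample data shaped like the dump in the question.
my $ref = {
    incident => {
        'GDOT-INC-260089' => {
            status   => 'active',
            location => { county => { content => 'Cobb', id => '67' } },
        },
    },
};

# Indent = 1 uses a fixed indent per nesting level instead of
# lining nested data up under the opening bracket.
local $Data::Dumper::Indent = 1;

# Sortkeys (an addition for this sketch) keeps key order stable between runs.
local $Data::Dumper::Sortkeys = 1;

my $dump = Dumper($ref);
print $dump;
```

With the default setting (`Indent = 2`) each nested level is aligned under its parent's opening bracket, which is what pushes deep structures off to the right.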

      (And how would he produce the YAML output, assuming the modules exist, when his problem is that he can't get the XML to parse the way he wants? Another module to learn?)

      --Bob Niederman, http://bob-n.com

        For what it's worth: YAML output looks very much like Data::Dumper output, but it is a lot more compact. In other words, you don't really need to "learn" YAML to read its output. I think that in most cases it simply makes more sense to use YAML than Data::Dumper for debugging.

        BTW, learning enough about YAML to be able to use it for debugging would probably have taken you less time than writing this post (and wouldn't have resulted in --'s ;--).

Re: XML::Simple parsing :-(
by st4k (Novice) on Jun 19, 2003 at 21:40 UTC
    Sorry about that. I can't find a way to edit my original post, though, sorry :-/ Here's my original code that reads in the XML file.
    #!/usr/bin/perl -w
    use strict;
    use XML::Simple;
    use Data::Dumper;

    my $doc;
    open(FH, "inc.xml") || die "Can't open inc.xml";
    sysread(FH, $doc, -s FH);
    close FH;

    my $xs = new XML::Simple;

    # can use forcearray and keeproot
    my $ref = $xs->XMLin("$doc");

    # works
    print $ref->{incident}->{'GDOT-INC-260089'}->{location}->{county}->{content} . "\n";

    # doesnt work
    print $ref->{'incident'}->[0];
      Based on this part of your post:
      # works
      print $ref->{incident}->{'GDOT-INC-260089'}->{location}->{county}->{content} . "\n";

      # doesnt work
      print $ref->{'incident'}->[0];
      and this initial comment:

      basically I am having trouble forcing the incident into an array so I can call it like this:

      print $ref->{incident}->[0]->{location}->{county}->{content} . "\n";

      Why is it important to have the structure be set up as $ref->{incident}->[]->... ? Is it simply that you want to be able to access the incident reports via numeric array index rather than by name?

      Either there is some compelling reason (which you haven't explained yet) why you want a numeric array index at that particular level of the structure, or else you're asking for something that you don't really need.

      If the incident-id values are being created in such a way that they can easily be sorted into the most relevant order (chronological or whatever), then you can iterate over the hash keys in that desired order by sorting them into an array first:

      my @sorted_keys = sort keys %{$ref->{incident}};
      for my $incid ( @sorted_keys ) {
          # do something with $$ref{incident}{$incid} ...
      }
      If necessary, you could use a more elaborate sort function, including one that references the contents of lower layers in the structure (such as date/time, location or whatever), and this might be a good place to apply a Schwartzian Transform.
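A minimal sketch of that more elaborate sort, using a Schwartzian Transform keyed on the cleanup timestamp. The `$ref` data is hypothetical and cut down to the fields used, and the choice of `cleanup.timestamp` as the sort key is an assumption for illustration. Plain string comparison is only good enough here because the timestamps share one format and one time zone.

```perl
use strict;
use warnings;

# Hypothetical sample data: two incidents keyed by id, the way
# XML::Simple folds them by default. Only the fields used below are kept.
my $ref = {
    incident => {
        'GDOT-INC-260089' => {
            cleanup  => { timestamp => '2003-06-19 17:20:00 EDT' },
            location => { county => { content => 'Cobb' } },
        },
        'GDOT-INC-252421' => {
            cleanup  => { timestamp => '2003-06-20 00:35:00 EDT' },
            location => { county => { content => 'Pickens' } },
        },
    },
};

my $incidents = $ref->{incident};

# Schwartzian Transform: pair each id with its sort key once,
# sort on that key, then strip the key off again.
my @by_cleanup =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  { [ $_, $incidents->{$_}{cleanup}{timestamp} ] }
    keys %$incidents;

# Build a plain array of incident hashes, keeping the id inside each one,
# so the incidents can also be reached by numeric index.
my @list = map { +{ id => $_, %{ $incidents->{$_} } } } @by_cleanup;

for my $i (0 .. $#list) {
    print "$i: $list[$i]{id} $list[$i]{location}{county}{content}\n";
}
```

This gives numeric access such as `$list[0]{location}{county}{content}` without changing how the XML is parsed.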
      and here is the original xml:
      <?xml version="1.0" encoding="UTF-8"?>
      <!-- Incident report from the Georgia DOT's Navigator ITS. -->
      <!-- Visit us online at http://georgia-navigator.com/. -->
      <!-- Raw data available at http://georgia-navigator.com/data/. -->
      <!-- Copyright (c) 2002 Georgia DOT. All rights reserved. -->
      <incident-report timestamp="2003-06-19 15:56:03 EDT">
        <incident id="GDOT-INC-252421" level="3" status="active">
          <type id="8">Incident</type>
          <description>High impact, Roadway Damage, Pickens Co.</description>
          <impact id="3">High</impact>
          <cleanup timestamp="2003-06-20 00:35:00 EDT">12:35 AM Tomorrow</cleanup>
          <lanes>All Lanes</lanes>
          <location>
            <type id="6">Arterial</type>
            <county id="227">Pickens</county>
            <coord projection="GA State Plane West" datum="NAD83" unit="meters" x="670707" y="494686" />
            <description>Southbound SR 53 AT MARBLE HILL(SINKHOLE -MP 25.3)(PICKENS CO)</description>
          </location>
        </incident>
        Please read the documentation on XML::Simple, hm? Save us all a lot of time. It's a good module if you know how to use it.

        According to that documentation, you probably want the ForceArray option:

            my $ref = $xs->XMLin($doc, ForceArray => [ 'incident' ]);

        PS: You don't need quotes on $doc when you're passing it to the subroutine.

            -- Chip Salzenberg, Free-Floating Agent of Chaos
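One caveat worth sketching: XML::Simple's default `KeyAttr` is `['name', 'key', 'id']`, so elements carrying an `id` attribute are folded into a hash keyed on that id, and `ForceArray` on its own does not undo that. The sketch below, which assumes a cut-down and hypothetical version of the feed, passes `KeyAttr => []` as well to switch the folding off and get a real array.

```perl
use strict;
use warnings;
use XML::Simple;

# Cut-down, hypothetical version of the incident feed.
my $doc = <<'XML';
<incident-report timestamp="2003-06-19 15:56:03 EDT">
  <incident id="GDOT-INC-252421" level="3" status="active">
    <location><county id="227">Pickens</county></location>
  </incident>
  <incident id="GDOT-INC-260089" level="3" status="active">
    <location><county id="67">Cobb</county></location>
  </incident>
</incident-report>
XML

my $ref = XMLin(
    $doc,
    ForceArray => ['incident'],   # always an array ref, even for one incident
    KeyAttr    => [],             # turn off folding on name/key/id
);

# Incidents arrive in document order and keep their id as a normal field.
for my $inc (@{ $ref->{incident} }) {
    print "$inc->{id}: $inc->{location}{county}{content}\n";
}
```

With these options the access from the original question, `$ref->{incident}->[0]->{location}->{county}->{content}`, should work as written.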