While you could use normal nested Perl data structures to deal with this, XML is becoming en vogue and as a result we have to be just as fashionable. Actually, this isn't true: we can always use Gisle Aas' Data::XMLDumper to convert between XML and nested Perl data structures. But for the purposes of this tutorial, we will act as if that module doesn't exist.
So without further ado, I present the XML document detailing the (far too windy) part of the world I currently live in (and will be escaping from as soon as Christmas is here):
    <border_list>
      <pair><city>mountain view</city><city>sunnyvale</city></pair>
      <pair><city>mountain view</city><city>palo alto</city></pair>
      <pair><city>menlo park</city><city>palo alto</city></pair>
      <pair><city>atherton</city><city>menlo park</city></pair>
      <pair><city>atherton</city><city>redwood city</city></pair>
      <pair><city>san carlos</city><city>redwood city</city></pair>
      <pair><city>san carlos</city><city>belmont</city></pair>
      <pair><city>hillesdale</city><city>belmont</city></pair>
      <pair><city>hillesdale</city><city>san mateo</city></pair>
    </border_list>
Ok, so now what
So, now that I have shown the data, it is time to grok it, munge it, eat it for breakfast as a meal replacement and basically bring it to its knees to do our bidding.

Program One: find all cities next to menlo park
Ok, here is a program to grok this XML-base for all cities next to menlo park:

    use XML::Twig;

    my $t = XML::Twig->new(PrettyPrint => 'record');
    $t->parsefile('adj.xml');
    my $root = $t->root;

    # @pair has all the pairs of adjacent cities in it
    my @pair = $root->children;

    # target city we are looking for
    my $city = 'menlo park';

    # this routine takes a search text and a list of XML elements and
    # searches them for the text
    sub candidate_generator {
        my ($search_text, @data) = @_;
        grep { grep { $_->text eq $search_text } $_->children } @data;
    }

    # take the entire XML-base and search for records which have our
    # target city in them
    my @adj = candidate_generator($city,@pair);

    # print them out in a human-readable form
    map { $_->print } @adj;

and here is the pretty output:
    <pair>
      <city>menlo park</city>
      <city>palo alto</city>
    </pair>
    <pair>
      <city>atherton</city>
      <city>menlo park</city>
    </pair>
all done
The program was documented, so it should make sense, but let's take a closer look at candidate_generator().

    # this routine takes a search text and a list of XML elements and
    # searches them for the text
    sub candidate_generator {
        my ($search_text, @data) = @_;
        grep { grep { $_->text eq $search_text } $_->children } @data;
    }

It consists of two nested greps and hence can be a little confusing. Depending on the way you think, you might want to think about the outer grep and then the inner grep, or vice versa. It is only fitting that I discuss both methods of program comprehension.
Let's do top-down first. The outer grep is basically saying: take all the XML records and only return the ones which satisfy the inner search criterion. The inner search criterion takes each individual XML record and looks at each of its children, where each child is a city, and examines its text for equality with the text being searched for, or concretely speaking, menlo park.
Ok, now bottom up. The innermost expression is $_->text eq $search_text and what this does is take an XML element and get its text and compare it to a normal Perl string. So if $elt was an XML::Twig::Elt representing

    <city>boise</city>

then $elt->text would be boise. Now we work out a bit more. And a bit more out is grep { YADAYADA } $_->children. So here we take advantage of the fact that the XML is structured so that neighboring cities are both children of the pair element, e.g.:

    <pair><city>mountain view</city><city>sunnyvale</city></pair>

and we are just checking to see if either child is the text we are looking for. And now we finally make it to the outer grep and the first sentence in the top-down description says what that is doing.
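If the two nested greps still feel slippery, it may help to watch the same shape run on plain Perl data, with no XML involved. The sketch below is only an illustration, not part of the tutorial's program: the @pairs array and the candidate_pairs() name are made up for this example, each array ref stands in for one pair element, and @$pair plays the role of $_->children. A pair is kept whenever the inner grep finds at least one matching city.

```perl
use strict;
use warnings;

# Hypothetical stand-in data: each array ref mimics one <pair> record,
# holding the text of its two <city> children.
my @pairs = (
    [ 'mountain view', 'sunnyvale'    ],
    [ 'mountain view', 'palo alto'    ],
    [ 'menlo park',    'palo alto'    ],
    [ 'atherton',      'menlo park'   ],
    [ 'atherton',      'redwood city' ],
);

# Same shape as candidate_generator(): the outer grep walks the pairs,
# the inner grep walks the cities inside one pair.
sub candidate_pairs {
    my ($search_text, @data) = @_;
    return grep {
        my $pair = $_;                        # the current "record"
        grep { $_ eq $search_text } @$pair;   # does any city match?
    } @data;
}

my @adj = candidate_pairs('menlo park', @pairs);

# prints:
#   menlo park / palo alto
#   atherton / menlo park
print join(' / ', @$_), "\n" for @adj;
```

Naming the outer element $pair before entering the inner grep is purely for readability; the original gets away without it because $_->children is evaluated before the inner grep rebinds $_.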